= Cluster Installation =
XDS can be run in cluster mode using any command-line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF or SLURM; we implemented Grid Engine. It is a distributed resource management system that monitors the CPU and memory usage of the available computing resources and schedules each job to the least loaded computer. Grid Engine was chosen for its high scalability, cost effectiveness, ease of maintenance and high throughput. It was developed by Sun Microsystems (Sun Grid Engine, SGE), later acquired by Oracle and subsequently by UNIVA. The latest versions became closed source, but the older ones are open source and are supplied with many Linux distributions, including Redhat/CentOS 6.x. There are also the open-source forks [http://gridscheduler.sourceforge.net/ Open Grid Scheduler] and [https://arc.liv.ac.uk/trac/SGE Son of Gridengine].

== XDS Cluster setup ==

To set up XDS in cluster mode, the ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to detect the gridengine environment and send jobs to different machines. Example scripts are given below; they need to be adapted to your environment. Note the ''qsub'' command, which submits forkcolspot_job/forkintegrate_job to grid engine.

<pre>
#forkcolspot

ntask=$1   #total number of jobs
maxcpu=$2  #maximum number of processors used by each job
           #maxcpu=1: use 'mcolspot'     (single processor)
           #maxcpu>1: use 'mcolspot_par' (openmp version)

pids=""    #list of background process IDs
itask=1
echo "MAX CPU $maxcpu"

#check whether this host is a gridengine submit host
submitnodes=`qconf -sh 2> /dev/null`
thishost=`hostname`
isgrid=0
for node in $submitnodes ; do
   if [ "$node" == "$thishost" ]
   then
      isgrid=1
      echo "Grid Engine environment detected"
   fi
done

while test $itask -le $ntask
do
   if [ $maxcpu -gt 1 ]
   #original (non-cluster) version:
   # then echo "$itask" | mcolspot_par &
   # else echo "$itask" | mcolspot &
   then
      if [ $isgrid -eq 1 ]
      then
         #submit job to grid engine
         qsub -sync y -V -l h_rt=0:20:00 -cwd \
              forkcolspot_job \
              $itask &

      #else echo "$itask" | qrsh -V -cwd "mcolspot" &
      else echo "$itask" | mcolspot_par &
      fi
   else echo "$itask" | mcolspot &
   fi
   pids="$pids $!" #append id of the background process just started

   itask=`expr $itask + 1`
done
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill
wait               #wait for all background processes issued by this shell
rm -f mcolspot.tmp #this temporary file was generated by xds
rm -rf fork*job*   #clean up grid engine job output files
</pre>
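For reference, these fork scripts are not normally called by hand: XDS itself invokes ''forkcolspot'' with the two arguments documented at the top of the script. A quick manual test with assumed example values would look like:

<pre>
# hypothetical manual test: 4 COLSPOT jobs, up to 8 processors each
./forkcolspot 4 8
</pre>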

----

<pre>
#!/bin/csh
#forkcolspot_job

echo $1
set itask=$1
set host=`uname -a | awk '{print $2}'`
echo $itask $host >> jobs.log
echo $itask | mcolspot_par
</pre>

----

<pre>
#forkintegrate

fframe=$1  #id number of the first image
ni=$2      #number of images in the data set
ntask=$3   #total number of jobs
niba0=$4   #minimum number of images in a batch
maxcpu=$5  #maximum number of processors used by each job
           #maxcpu=1: use 'mintegrate'     (single processor)
           #maxcpu>1: use 'mintegrate_par' (openmp version)

minitask=$(($ni / $ntask)) #minimum number of images in a job
mtask=$(($ni % $ntask))    #number of jobs with minitask+1 images
pids=""                    #list of background process IDs
nba=0
litask=0
itask=1

#check whether this host is a gridengine submit host
submitnodes=`qconf -sh 2> /dev/null`
thishost=`hostname`
isgrid=0
for node in $submitnodes ; do
   if [ "$node" == "$thishost" ]
   then
      isgrid=1
      echo "Grid Engine environment detected"
   fi
done

while test $itask -le $ntask
do
   if [ $itask -gt $mtask ]
   then nitask=$minitask
   else nitask=$(($minitask + 1))
   fi
   fitask=`expr $litask + 1`
   litask=`expr $litask + $nitask`
   if [ $nitask -lt $niba0 ]
   then n=$nitask
   else n=$niba0
   fi
   if [ $n -lt 1 ]
   then n=1
   fi
   nbatask=$(($nitask / $n))
   nba=`expr $nba + $nbatask`
   image1=$(($fframe + $fitask - 1)) #id number of the first image

   if [ $maxcpu -gt 1 ]
   then
      if [ $isgrid -eq 1 ]
      then
         #submit job to grid engine
         qsub -sync y -V -l h_rt=0:20:00 -cwd \
              forkintegrate_job \
              $image1 $nitask $itask $nbatask &
      #else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &
      else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &
      fi
   else echo "$image1 $nitask $itask $nbatask" | mintegrate &
   fi
   pids="$pids $!" #append id of the background process just started

   itask=`expr $itask + 1`
done
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill
wait                 #wait for all background processes issued by this shell
rm -f mintegrate.tmp #this temporary file was generated by mintegrate
rm -rf fork*job*     #clean up grid engine job output files
</pre>

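To make the batching arithmetic concrete, here is a small sketch with assumed inputs (100 images split over 8 jobs; the numbers are examples only):

<pre>
# worked example with assumed inputs: ni=100 images, ntask=8 jobs
ni=100; ntask=8
minitask=$(($ni / $ntask)) # 100/8 = 12 images minimum per job
mtask=$(($ni % $ntask))    # 100%8 = 4 jobs receive minitask+1 = 13 images
echo "$mtask jobs with $(($minitask + 1)) images, $(($ntask - $mtask)) jobs with $minitask images"
# -> 4 jobs with 13 images, 4 jobs with 12 images (4*13 + 4*12 = 100)
</pre>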
<pre>
#!/bin/csh
#forkintegrate_job

set image1=$1
set nitask=$2
set itask=$3
set nbatask=$4

set host=`uname -a | awk '{print $2}'`
echo $image1 $nitask $itask $nbatask $host >> jobs.log
echo $image1 $nitask $itask $nbatask | mintegrate_par
</pre>

== Grid Engine Installation ==

Grid Engine consists of a master daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs the job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using commands such as ''qsub'', or through the DRMAA (http://www.drmaa.org/) C, Java or IDL bindings from any application that wants to run XDS.

[[File:Gridengine arch1.png]]

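The basic submission workflow that the fork scripts above automate looks like this from any submit host (''myjob.sh'' is a hypothetical job script):

<pre>
qsub -cwd -V myjob.sh   # submit myjob.sh, running in the current directory
qstat -f                # show queue instances and jobs on each execution host
</pre>
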
The Redhat/CentOS Linux distribution comes with rpms for installing Grid Engine; administrative privileges are needed to install them. Install the gridengine rpms on all the nodes using the following command. The default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved access the storage (via NFS or another cluster file system) where the data is stored, and that authentication is done through a protocol like LDAP.'''
<pre>
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon

root@ws1:/home 2> rpm -qa | grep gridengine

gridengine-qmaster-6.2u5-10.el6.4.x86_64
gridengine-qmon-6.2u5-10.el6.4.x86_64
gridengine-execd-6.2u5-10.el6.4.x86_64
gridengine-6.2u5-10.el6.4.x86_64
</pre>

By default the gridengine installation directory is /usr/share/gridengine; its contents are shown below.

<pre>
root@ws1:/home 3> cd /usr/share/gridengine

root@ws1:/home 4> ls
bin   default  hadoop    install_execd    lib  my_configuration.conf  qmon  utilbin
ckpt  doc      inst_sge  install_qmaster  mpi  pvm                    util
</pre>
Let's say ''ws1'' is the ''sgemaster'' node; the master is installed using install_qmaster.

==== Installing sgemaster ====

<pre>
root@ws1:/usr/share/gridengine 5> ./install_qmaster
</pre>

Most of the answers are yes/no or just pressing enter. The following things need to be decided before installation:

* The admin user is root.
* The following important environment variables are written to /usr/share/gridengine/default/common/settings.csh, which should be sourced (see the example after this list):
** $SGE_ROOT=/usr/share/gridengine
** $SGE_QMASTER_PORT=6444
** $SGE_EXECD_PORT=6445
** $SGE_CELL=default
* The JMX MBean server is not used.
* The spooling method used is ''classic''.
* There is an option to give an administrative email address, which is very useful: whenever there is a problem, gridengine will send error messages to that address.
* Have ready a file containing the admin and submit hosts, or enter all the hosts manually separated by spaces; use fully qualified DNS names.
* In this installation a shadow host is not used.
* After the shadow host step, make sure the allhosts group and the all.q queue are created, otherwise the sge_execd installation will have problems.
* Scheduler Tuning was selected as 'Max'. This has a disadvantage: gridengine schedules immediately without taking the load into account, so successive job submissions go to the same host until all of its slots are filled. Selecting 'Normal' takes the load into account, but adds an overhead of a few seconds per scheduled job.
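
For example, to pick up these variables in the shell (the installer writes both a csh and an sh version of the settings file):

<pre>
# csh/tcsh:
source /usr/share/gridengine/default/common/settings.csh
# sh/bash:
. /usr/share/gridengine/default/common/settings.sh
</pre>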

After the installation finishes, the configuration files are automatically written to the directory /usr/share/gridengine/default, since the cell name selected was 'default'. This directory can be shared over NFS; otherwise copy it to every host used in the cluster, as sketched below.

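A minimal sketch of the copy step, assuming ''ws2'' is one of the other hosts and the directory is not NFS-shared:

<pre>
rsync -a /usr/share/gridengine/default/ ws2:/usr/share/gridengine/default/
</pre>
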
==== Installing sge_execd ====

On each execution node, install the execution daemon using the following command:
<pre>
root@ws2:/usr/share/gridengine 5> ./install_execd
</pre>

The input is almost all just pressing return if you have already copied the 'default' directory to this node.

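Afterwards the new execution host should be registered with the qmaster; one way to check:

<pre>
qhost   # should now list ws2 together with its load and memory figures
</pre>
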
==== Submit nodes ====

Install the Grid Engine rpms also on all the submit nodes which use ''qsub'', and copy the default directory /usr/share/gridengine/default to them. Remember that in the case of XDS all the execution nodes in the cluster also act as submit nodes.

Use the ''qconf'' command to see which hosts are submit hosts and which are not, and add any missing ones manually, as shown below.

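For instance (''ws3.mydomain'' is a hypothetical hostname):

<pre>
qconf -ss              # list the current submit hosts
qconf -as ws3.mydomain # add ws3.mydomain as a submit host
</pre>
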
== Restarting Grid Engine ==

When grid engine is installed for the first time, the /etc/init.d/sgemaster and /etc/init.d/sge_execd services are installed automatically.
If you want to restart sgemaster, first make sure all the sge_execd daemons are stopped. You can do this with the following commands:
<pre>
service sge_execd stop
service sgemaster stop
</pre>
and for starting (start sgemaster before the execution daemons):
<pre>
service sgemaster start
service sge_execd start
</pre>
Whenever the workstations need to be rebooted, make sure the sgemaster workstation is started first. To have the services restarted automatically at startup, make sure they are enabled with chkconfig:
<pre>
chkconfig sgemaster on
chkconfig sge_execd on
</pre>
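Putting this together, a full restart of the cluster follows this order (run as root on the hosts indicated):

<pre>
# 1. on every execution node:  service sge_execd stop
# 2. on the master (ws1):      service sgemaster stop
# 3. on the master (ws1):      service sgemaster start
# 4. on every execution node:  service sge_execd start
</pre>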

== Son of Gridengine ==

rpms are available at this link:

http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/

By default these rpms install into the single directory /opt/sge instead of scattering files across /usr/bin, /usr/share/gridengine and /var/spool/gridengine as the CentOS packages do.

The default shell for Son of Gridengine is /bin/sh, which on these systems is /bin/bash.
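
A minimal sketch of adjusting the environment for this layout, assuming Son of Gridengine keeps the usual settings files under its cell directory:

<pre>
# csh/tcsh:
source /opt/sge/default/common/settings.csh
# sh/bash:
. /opt/sge/default/common/settings.sh
</pre>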
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved access the storage (using NFS or other cluster file systems) where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Lets say ''ws1'' is ''sgemaster'' node, it will installed using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3194Cluster Installation2015-06-08T18:20:58Z<p>Spothineni: /* XDS Cluster setup */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment. Observe there is ''qsub'' command which submits forkcolspot_job/forkintegrate_job to grid engine.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved access the storage (using NFS or other cluster file systems) where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Lets say ''ws1'' is ''sgemaster'' node, it will installed using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3193Cluster Installation2015-06-08T18:14:01Z<p>Spothineni: /* XDS Cluster setup */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment. Observe there is ''qsub'' command which submits forkcolspot_job/forkintegrate_job to grid engine.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved access the storage (using NFS or other cluster file systems) where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Lets say ''ws1'' is ''sgemaster'' node, it will installed using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3192Cluster Installation2015-06-08T18:11:44Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved access the storage (using NFS or other cluster file systems) where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Lets say ''ws1'' is ''sgemaster'' node, it will installed using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3191Cluster Installation2015-06-08T18:10:04Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Lets say ''ws1'' is ''sgemaster'' node, it will installed using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3190Cluster Installation2015-06-08T18:09:07Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using commands such as qsub, or via the DRMAA C, Java or IDL bindings from any application that wants to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
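<br />
A trivial round trip from any submit host can be used to verify that this machinery works (a sketch; ''-b y'' tells qsub to treat the argument as a binary rather than a job script):<br />
<pre><br />
qsub -b y -cwd -N testjob hostname   #submit /bin/hostname as a test job<br />
qstat                                #watch the queue until the job disappears<br />
</pre><br />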
<br />
The Redhat/CentOS Linux distribution comes with RPMs for installing Grid Engine. One needs administrative privileges to install. Install the gridengine RPMs on all the nodes using the following command. The default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved can access the storage where the data is stored, and that authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@ws1:/home 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@ws1:/home 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default the gridengine installation directory is /usr/share/gridengine; its contents are shown below.<br />
<br />
<pre><br />
root@ws1:/home 3> cd /usr/share/gridengine<br />
<br />
root@ws1:/home 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
Let's say ''ws1'' is the ''sgemaster'' node; it will be installed using install_qmaster.<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@ws1:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or pressing Enter. The following things need to be decided before installation:<br />
<br />
* Admin user is root<br />
* The following important environment variables are written to /usr/share/gridengine/default/common/settings.csh, which should be sourced from each user's shell startup files (see the sketch after this list).<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give an administrative email address, which is very useful: whenever there is any problem, gridengine will send error messages to that address.<br />
* Have ready a file containing the admin and submit hosts, or manually enter all the hosts separated by spaces; use the full DNS names of the hosts. <br />
* In this installation a shadow host is not used. <br />
* After the shadow host step, make sure the allhosts group and the all.q queue are created; otherwise the sge_execd installation will have problems. <br />
* Scheduler Tuning was selected as 'Max'. This has a disadvantage: gridengine schedules immediately without taking the load into account, so successive job submissions go to the same host until all the slots of that machine are filled. Selecting 'Normal' takes the load into account, but adds an overhead of a few seconds to job scheduling.<br />
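<br />
The settings file mentioned in the list above is meant to be sourced from each user's shell startup files; settings.sh is the Bourne-shell counterpart that the installer writes next to settings.csh (a sketch):<br />
<pre><br />
#csh/tcsh users, e.g. in ~/.cshrc<br />
source /usr/share/gridengine/default/common/settings.csh<br />
<br />
#sh/bash users, e.g. in ~/.bashrc<br />
. /usr/share/gridengine/default/common/settings.sh<br />
</pre><br />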
<br />
<br />
After finishing the installation, the configuration files are automatically written to the directory /usr/share/gridengine/default, since the cell name selected is 'default'. This directory can be chosen as a shared directory over NFS; otherwise copy this directory to every host used in the cluster.<br />
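<br />
If NFS is not used, the cell directory can be copied to each host, for example (the host name ws2 is hypothetical):<br />
<pre><br />
rsync -a /usr/share/gridengine/default/ ws2:/usr/share/gridengine/default/<br />
</pre><br />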
<br />
==== Installing sge_execd ====<br />
<br />
On each execution node, install the execution daemon using the following command:<br />
<pre><br />
root@ws2:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
The input is mostly pressing Return if you have already copied the 'default' directory to this node.<br />
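<br />
Once sge_execd is running, the new node should be visible from any admin host, for example:<br />
<pre><br />
qconf -sel   #list the registered execution hosts<br />
qhost        #show architecture and load of every execution host<br />
</pre><br />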
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine is installed for the first time, the /etc/init.d/sgemaster and /etc/init.d/sge_execd services are installed automatically.<br />
If you want to restart sgemaster, make sure all the sge_execd daemons are stopped first. You can do this with the following commands:<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
For starting:<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
Whenever the workstations need to be restarted, make sure the sgemaster workstation is started first. To have the services start automatically at boot, make sure chkconfig is on:<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
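<br />
To verify that the services are registered for the boot runlevels:<br />
<pre><br />
chkconfig --list sgemaster<br />
chkconfig --list sge_execd<br />
</pre><br />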
<br />
== Son of Gridengine ==<br />
<br />
RPMs are available from this link:<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
By default these RPMs install into the single directory /opt/sge, instead of scattering files across /usr/bin, /usr/share/gridengine and /usr/spool/gridengine as the distribution packages do.<br />
<br />
Default shell for Son of Gridengine is /bin/sh, which on Redhat/CentOS is /bin/bash.</div>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which sgemaster node install using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3188Cluster Installation2015-06-08T17:19:00Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which sgemaster node install using install_qmaster<br />
<br />
==== Installing sgemaster ====<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
<br />
==== Installing sge_execd ====<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3187Cluster Installation2015-06-08T17:17:07Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which sgemaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3186Cluster Installation2015-06-08T17:13:59Z<p>Spothineni: /* Son of Gridengine */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering files (by default) to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3185Cluster Installation2015-06-08T17:12:33Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used in the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
For starting, bring up the qmaster first:<br />
<pre><br />
service sgemaster start<br />
service sge_execd start<br />
</pre><br />
Whenever the workstations need to be rebooted, make sure the sgemaster workstation is started first. To have the services start automatically at boot, make sure chkconfig is on:<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
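Putting the ordering rules together, a full restart of the cluster might look like the following sketch, with bl1upper as the qmaster host and bl1ws1 as an execution host:<br />
<pre><br />
# 1. on every execution host (e.g. bl1ws1)<br />
service sge_execd stop<br />
<br />
# 2. on the qmaster host (bl1upper)<br />
service sgemaster stop<br />
service sgemaster start<br />
<br />
# 3. again on every execution host<br />
service sge_execd start<br />
</pre><br />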
<br />
== Son of Gridengine ==<br />
<br />
Rpms are available at this link:<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
By default these rpms install into the single directory /opt/sge instead of scattering files (by default) across /usr/bin, /usr/share/gridengine and /usr/spool/gridengine.<br />
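A minimal install sketch; the rpm filenames below are illustrative, and the exact names should be taken from the release directory linked above:<br />
<pre><br />
# download the rpms matching your distribution, then e.g.:<br />
yum install gridengine-8.1.8-1.el6.x86_64.rpm \<br />
            gridengine-qmaster-8.1.8-1.el6.x86_64.rpm \<br />
            gridengine-execd-8.1.8-1.el6.x86_64.rpm<br />
</pre><br />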
<br />
The default shell for Son of Gridengine is /bin/sh, which is /bin/bash.</div>Spothineni
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3183Cluster Installation2015-06-08T17:09:47Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
* Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH. (*)<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3182Cluster Installation2015-06-08T17:06:43Z<p>Spothineni: /* XDS Cluster setup */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
*Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3181Cluster Installation2015-06-08T17:06:20Z<p>Spothineni: /* XDS Cluster setup */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
'''qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &'''<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
'''qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &'''<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
*Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3180Cluster Installation2015-06-08T17:05:41Z<p>Spothineni: /* XDS Cluster setup */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, ''forkcolspot'' and ''forkintegrate'' scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
*Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3179Cluster Installation2015-06-08T17:04:46Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image of this job<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
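#submit the job to Grid Engine: -sync y blocks until the job ends, -V exports the<br />
#environment, -l h_rt sets a 20 min run-time limit, -cwd uses the current directory<br />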
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
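To illustrate the job/batch arithmetic in ''forkintegrate'' above, here is a small worked example (the numbers are illustrative, not from the original):<br />
<br />
<pre><br />
# worked example of the forkintegrate split (illustrative numbers)<br />
fframe=1; ni=360; ntask=8; niba0=5<br />
minitask=$(($ni / $ntask))    # 45 images per job; mtask=360%8=0, so no job gets an extra image<br />
n=$niba0                      # 45 >= 5, so each batch holds niba0=5 images<br />
nbatask=$(($minitask / $n))   # 9 batches per job<br />
echo "each of the $ntask jobs integrates $minitask images in $nbatask batches"<br />
# job 1 therefore processes images 1-45, job 2 images 46-90, and so on<br />
</pre><br />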
<br />
<pre><br />
#!/bin/csh<br />
#forkintegrate_job<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master-node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using commands such as qsub, or via the DRMAA C, Java or IDL bindings from any application that wants to run XDS.<br />
<br />
[[File:Gridengine arch1.png]]<br />
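For example, a trivial job can be handed to sgemaster straight from a shell on any submit host (a minimal sketch; the job name is arbitrary):<br />
<br />
<pre><br />
# qsub reads the job script from stdin when no script file is given<br />
echo 'hostname' | qsub -cwd -N hello<br />
qstat    # shows the job while sgemaster schedules it to an sge_execd node<br />
</pre><br />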
<br />
The RedHat/CentOS Linux distributions come with RPMs for installing Grid Engine. Administrative privileges are needed to install. Install the gridengine RPMs on all the nodes using the following command. The default shell for Grid Engine is /bin/csh. '''It is assumed that all the workstations involved can access the storage where the data are kept, and that authentication is done through a protocol like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default the gridengine installation directory is /usr/share/gridengine; its contents are shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper, which is the qmaster node, install using install_qmaster:<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or simply pressing Enter. The following things need to be decided before the installation:<br />
<br />
* The admin user is root.<br />
* The following important environment variables are written to /usr/share/gridengine/default/common/settings.csh, which should be sourced so that the Grid Engine commands are on the $PATH (see the sketch after this list):<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* The JMX MBean server is not used.<br />
* The spooling method used is ''classic''.<br />
* There is an option to give an administrative email address, which is very useful: whenever there is a problem, gridengine will send error messages to that address.<br />
* Have ready a file containing the admin and submit hosts, or enter all the hosts manually, separated by spaces; use the full DNS names of the hosts.<br />
* In this installation no shadow host is used.<br />
* After the shadow-host step, make sure the allhosts group and all.q are created; otherwise the sge_execd installation will have problems.<br />
* Scheduler tuning was selected as 'Max'. This has a disadvantage: gridengine schedules immediately without taking the load into account, so successive job submissions go to the same host until all the slots of that machine are filled. Selecting 'Normal' takes the load into account, but adds an overhead of a few seconds to job scheduling.<br />
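The settings file mentioned in the list above can be sourced as follows (a sketch; Grid Engine writes a Bourne-shell variant settings.sh next to the csh one):<br />
<br />
<pre><br />
# csh/tcsh (the Grid Engine default shell):<br />
source /usr/share/gridengine/default/common/settings.csh<br />
# sh/bash equivalent:<br />
. /usr/share/gridengine/default/common/settings.sh<br />
</pre><br />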
<br />
<br />
After finishing the installation, the configuration files are automatically written to the directory /usr/share/gridengine/default, since the cell name selected is 'default'. This directory can be shared over NFS; otherwise copy it to every host used in the cluster.<br />
<br />
On each execution node, install the execution daemon using the following command:<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
The input consists almost entirely of pressing Return if you have already copied the 'default' directory to this node.<br />
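Once qmaster and at least one execution daemon are running, the setup can be checked with the standard Grid Engine commands (a minimal sketch):<br />
<br />
<pre><br />
source /usr/share/gridengine/default/common/settings.csh<br />
qhost                        # list execution hosts with load and memory<br />
qconf -sh                    # list the configured submit hosts<br />
echo 'hostname' | qsub -cwd -N testjob<br />
qstat                        # the job should be scheduled to one of the hosts<br />
</pre><br />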
<br />
== Restarting Grid Engine ==<br />
<br />
When Grid Engine is installed for the first time, the /etc/init.d/sgemaster and /etc/init.d/sge_execd services are installed automatically.<br />
If you want to restart sgemaster, first make sure all the sge_execd daemons are stopped. You can do this with the following commands:<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
and for starting:<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
Whenever the workstations need to be restarted, make sure the sgemaster workstation is started first. To have the services started automatically at boot, make sure chkconfig is on:<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
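To confirm that the services are registered for the usual runlevels (SysV init as on CentOS 6, which is assumed throughout):<br />
<br />
<pre><br />
chkconfig --list sgemaster<br />
chkconfig --list sge_execd<br />
</pre><br />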
<br />
== Son of Gridengine ==<br />
<br />
RPMs are available at this link:<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
By default these RPMs install into the single directory /opt/sge instead of scattering files (as the distribution packages do) across /usr/bin, /usr/share/gridengine and /usr/spool/gridengine.<br />
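For example, after downloading the RPMs for the local architecture from the link above (a sketch; the exact file names depend on the release):<br />
<br />
<pre><br />
yum localinstall gridengine-8.1.8-*.rpm<br />
</pre><br />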
<br />
The default shell for Son of Gridengine is /bin/sh, which is /bin/bash.
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named ''sgemaster'' which schedules jobs to execution nodes. On each execution node a daemon named ''sge_execd'' runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
[[File:Gridengine arch1.png]]<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter. Following things need to be decided before installation<br />
<br />
* Admin user is root<br />
*Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
** $SGE_ROOT=/usr/share/gridengine<br />
** $SGE_QMASTER_PORT=6444<br />
** $SGE_EXECD_PORT=6445<br />
** $SGE_CELL=default<br />
<br />
* JMX MBean server not used<br />
* Spooling method used is ''classic''<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=File:Gridengine_arch1.png&diff=3177File:Gridengine arch1.png2015-06-08T17:03:34Z<p>Spothineni: </p>
<hr />
<div></div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3176Cluster Installation2015-06-08T16:53:56Z<p>Spothineni: /* Grid Engine Installation */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh. '''It is assumed the all the workstations involved access the storage where the data is stored and authentication is done through protocols like LDAP.'''<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
Following things need to be decided before installation<br />
<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3175Cluster Installation2015-06-08T16:52:26Z<p>Spothineni: /* Restarting sgemaster */</p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
Following things need to be decided before installation<br />
<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
<br />
== Restarting Grid Engine ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3174Cluster Installation2015-06-08T16:51:53Z<p>Spothineni: </p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre><br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Following important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
Following things need to be decided before installation<br />
<br />
* There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
* Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. <br />
* In this installation shadow host is not used. <br />
* After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. <br />
* Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling.<br />
<br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
<br />
== Restarting sgemaster ==<br />
<br />
When grid engine installed first time /etc/init.d/sgemaster and /etc/init.d/sge_execd services are automatically installed.<br />
If you want to restart sgemaster make sure all the sge_execd deamons are stoped. You can do this by following commands<br />
<pre><br />
service sge_execd stop<br />
service sgemaster stop<br />
</pre><br />
for starting<br />
<pre><br />
service sge_execd start<br />
service sgemaster start<br />
</pre><br />
When ever work stations need to be restarted make sure sgemaster work station started first. To keep the services restarted automatically during the startup make sure chkconfig is on.<br />
<pre><br />
chkconfig sgemaster on<br />
chkconfig sge_execd on<br />
</pre><br />
<br />
== Son of Gridengine ==<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3173Cluster Installation2015-06-08T16:43:02Z<p>Spothineni: </p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
== XDS Cluster setup ==<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<pre><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</pre><br />
<br />
----<br />
<br />
<pre><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</pre><br />
<br />
----<br />
<br />
<br />
<pre><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</pre><br />
<br />
<pre><br />
#forkintegrate_job<br />
<br />
#!/bin/csh<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</pre><br />
<br />
<br />
== Grid Engine Installation ==<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
<pre><br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
</pre><br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
<pre><br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
</pre><br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
<pre<br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
</pre><br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Follwoing important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. In this installation shadow host is not used. After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling. <br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<pre><br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
</pre><br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
== Son of Gridengine ==<br />
<br />
RPMs are available at this link:<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
By default these rpms install into the single directory /opt/sge instead of scattering files across /usr/bin, /usr/share/gridengine and /usr/spool/gridengine.<br />
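<br />
Installation is then a matter of downloading the rpms and installing them locally; the file names below follow the 8.1.8 release naming and may need adjusting to what the download page actually offers:<br />
<br />
<pre><br />
yum localinstall gridengine-8.1.8-1.el6.x86_64.rpm \<br />
    gridengine-qmaster-8.1.8-1.el6.x86_64.rpm \<br />
    gridengine-execd-8.1.8-1.el6.x86_64.rpm<br />
</pre><br />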
<br />
The default shell for Son of Gridengine is /bin/sh, which on Redhat/CentOS is /bin/bash.</div>
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*</nowiki><br />
</code><br />
<br />
<code><br />
<nowiki>#forkintegrate_job<br />
<br />
#!/bin/bash<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par</nowiki><br />
</code><br />
<br />
<br />
'''Grid Engine Installation'''<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
[@<br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
@]<br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
[@<br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
@]<br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
[@<br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
@]<br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Follwoing important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. In this installation shadow host is not used. After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling. <br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<br />
[@<br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
@]<br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
'''Son of Gridengine'''<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3165Cluster Installation2015-06-08T16:24:04Z<p>Spothineni: </p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems (Sun Grid Engine, SGE) and later acquired by Oracle and subsequently acquired by UNIVA. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
<br />
XDS Cluster setup<br />
<br />
In order to setup XDS in cluster mode, forkcolspot and forkintegrate scripts need to be changed to access the gridengine environment and send jobs to different machines. Example scripts are below, need to be changed according to the environment.<br />
<br />
<code><br />
#forkcolspot<br />
<br />
ntask=$1 #total number of jobs<br />
maxcpu=$2 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mcolspot' (single processor)<br />
#maxcpu>1: use 'mcolspot_par' (openmp version)<br />
<br />
pids="" #list of background process ID's<br />
itask=1<br />
echo "MAX CPU $maxcpu $image1"<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $maxcpu -gt 1 ]<br />
# then echo "$itask" | mcolspot_par &<br />
# else echo "$itask" | mcolspot &<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkcolspot_job \<br />
$itask &<br />
<br />
#else echo "$itask" | qrsh -V -cwd "mcolspot" &<br />
else echo "$itask" | mcolspot_par &<br />
fi <br />
else echo "$itask" | mcolspot & <br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mcolspot.tmp #this temporary file was generated by xds<br />
rm -rf fork*job*<br />
</code><br />
<br />
----<br />
<br />
<code><br />
#forkcolspot_job<br />
<br />
#!/bin/csh<br />
<br />
echo $1<br />
set itask=$1<br />
<br />
echo $itask | mcolspot_par<br />
</code><br />
<br />
----<br />
<br />
<br />
<code><br />
#forkintegate<br />
<br />
fframe=$1 #id number of the first image<br />
ni=$2 #number of images in the data set<br />
ntask=$3 #total number of jobs<br />
niba0=$4 #minimum number of images in a batch<br />
maxcpu=$5 #maximum number of processors used by each job<br />
#maxcpu=1: use 'mintegrate' (single processor)<br />
#maxcpu>1: use 'mintegrate_par' (openmp version)<br />
<br />
minitask=$(($ni / $ntask)) #minimum number of images in a job<br />
mtask=$(($ni % $ntask)) #number of jobs with minitask+1 images<br />
pids="" #list of background process ID's<br />
nba=0<br />
litask=0<br />
itask=1<br />
<br />
#Sudhir check for gridengine submit host<br />
submitnodes=`qconf -sh 2> /dev/null`<br />
thishost=`hostname`<br />
isgrid=0<br />
for node in $submitnodes ; do<br />
if [ "$node" == "$thishost" ]<br />
then<br />
isgrid=1<br />
echo "Grid Engine environment detected"<br />
fi<br />
done<br />
<br />
while test $itask -le $ntask<br />
do<br />
if [ $itask -gt $mtask ]<br />
then nitask=$minitask<br />
else nitask=$(($minitask + 1))<br />
fi<br />
fitask=`expr $litask + 1`<br />
litask=`expr $litask + $nitask`<br />
if [ $nitask -lt $niba0 ]<br />
then n=$nitask<br />
else n=$niba0<br />
fi<br />
if [ $n -lt 1 ]<br />
then n=1<br />
fi<br />
nbatask=$(($nitask / $n))<br />
nba=`expr $nba + $nbatask`<br />
image1=$(($fframe + $fitask - 1)) #id number of the first image<br />
<br />
if [ $maxcpu -gt 1 ]<br />
then <br />
if [ $isgrid -eq 1 ]<br />
then<br />
qsub -sync y -V -l h_rt=0:20:00 -cwd \<br />
forkintegrate_job \<br />
$image1 $nitask $itask $nbatask &<br />
#else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate_par &<br />
fi<br />
else echo "$image1 $nitask $itask $nbatask" | mintegrate &<br />
fi<br />
pids="$pids $!" #append id of the background process just started<br />
<br />
itask=`expr $itask + 1`<br />
done<br />
trap "kill -15 $pids" 2 15 # 2:Control-C; 15:kill<br />
wait #wait for all background processes issued by this shell<br />
rm -f mintegrate.tmp #this temporary file was generated by mintegrate<br />
rm -rf fork*job*<br />
</code><br />
<br />
<code><br />
#forkintegrate_job<br />
<br />
#!/bin/bash<br />
<br />
set image1=$1<br />
set nitask=$2<br />
set itask=$3<br />
set nbatask=$4<br />
<br />
set host=`uname -a | awk '{print $2}'`<br />
echo $image1 $nitask $itask $nbatask $host >> jobs.log<br />
echo $image1 $nitask $itask $nbatask | mintegrate_par<br />
</code><br />
<br />
<br />
'''Grid Engine Installation'''<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
[@<br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
@]<br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
[@<br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
@]<br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
[@<br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
@]<br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Follwoing important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. In this installation shadow host is not used. After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling. <br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<br />
[@<br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
@]<br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
'''Son of Gridengine'''<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3164Cluster Installation2015-06-08T16:17:35Z<p>Spothineni: </p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems'^TM^' (Sun Grid Engine, SGE) and later acquired by Oracle'^TM^' and subsequently acquired by UNIVATM. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
'''Grid Engine Installation'''<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
[@<br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
@]<br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
[@<br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
@]<br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
[@<br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
@]<br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Follwoing important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. In this installation shadow host is not used. After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling. <br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<br />
[@<br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
@]<br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
'''Son of Gridengine'''<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothinenihttps://wiki.uni-konstanz.de/xds/index.php?title=Cluster_Installation&diff=3163Cluster Installation2015-06-08T16:16:02Z<p>Spothineni: </p>
<hr />
<div>XDS can be run in cluster mode using any command line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, SLURM etc. We implemented Grid Engine. It is a distributed resource management system which monitors the CPU and memory usage of the available computing resources and schedules the job to the least used computer. Grid Engine was chosen due to its high scalability, cost effectiveness, ease of maintenance and high throughput. Grid Engine was developed by Sun Microsystems'^TM^' (Sun Grid Engine, SGE) and later acquired by Oracle'^TM^' and subsequently acquired by UNIVA'^TM^'. The latest versions became closed source, but the older ones are open source supplied with many Linux distributions including Redhat/CentOS 6.x. There is also open source Open Grid Scheduler [[http://gridscheduler.sourceforge.net/]], Son of Gridengine [[https://arc.liv.ac.uk/trac/SGE ]]<br />
<br />
Grid Engine consists of a master node daemon named sgemaster which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs a job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster using command such as qsub or using DRMAA C, JAVA or IDL bindings from any applications want to run XDS.<br />
<br />
'''Grid Engine Installation'''<br />
<br />
Redhas/CentOS Linux distribution comes with rpms for installing Grid Engine. One need to have administrative privileges to install. Install gridengine rpms on all the nodes using following command, Default shell for Grid Engine is /bin/csh<br />
[@<br />
root@sudhir:/home/spothineni 1> yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon<br />
<br />
root@sudhir:/home/spothineni 2> rpm -qa | grep gridengine<br />
<br />
gridengine-qmaster-6.2u5-10.el6.4.x86_64<br />
gridengine-qmon-6.2u5-10.el6.4.x86_64<br />
gridengine-execd-6.2u5-10.el6.4.x86_64<br />
gridengine-6.2u5-10.el6.4.x86_64<br />
@]<br />
<br />
By default gridengine installation directory /usr/share/gridengine, contents shown below.<br />
<br />
[@<br />
root@sudhir:/home/spothineni 3> cd /usr/share/gridengine<br />
<br />
root@sudhir:/home/spothineni 4> ls<br />
bin default hadoop install_execd lib my_configuration.conf qmon utilbin<br />
ckpt doc inst_sge install_qmaster mpi pvm util<br />
@]<br />
<br />
On bl1upper which qmaster node install using install_qmaster<br />
<br />
[@<br />
root@bl1upper:/usr/share/gridengine 5>./install_qmaster<br />
@]<br />
<br />
Most of the answers are yes/no or typing enter.<br />
<br />
Follwoing important environment variables are written to /usr/share/gridengine/default/common/settings.csh which should be in the $PATH.<br />
<br />
<br />
$SGE_ROOT=/usr/share/gridengine<br />
$SGE_QMASTER_PORT=6444<br />
$SGE_EXECD_PORT=6445<br />
$SGE_CELL=default<br />
<br />
There is an option to give administrative email which is very useful, when ever there is any problem gridengine will send error messages to email.<br />
Ready with a file contains admin and submit hosts or you can manually enter all the hosts separated by space, use full DNS names of hosts. In this installation shadow host is not used. After the shadow host step make sure allhosts group and all.q are created otherwise installation sge_execd will have problems. Scheduler Tuning selected as 'Max', it has disadvantage, gridengine immediately schedules with out assuming the load, this will cause successive job submissions will go to same host until all the slots are filled for that machine. Selecting 'Normal' will assume the load but there is overhead of few sec. extra time for job scheduling. <br />
<br />
After finishing the installation the configuration files are automatically written to the directory /usr/share/gridengine/default since the cell name selected is 'default'. This directory can be choosen as a shared directory over NFS. Otherwise copy this directory to every host used int the cluster.<br />
<br />
On execution node install execution daemon using following command<br />
<br />
[@<br />
root@bl1ws1:/usr/share/gridengine 5>./install_execd<br />
@]<br />
<br />
the input is almost typing return if you already copied the 'default' directory to this node.<br />
<br />
'''Son of Gridengine'''<br />
<br />
rpms available in this link<br />
<br />
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.8/<br />
<br />
by defualt these rpms install in single directory /opt/sge instead of scattering (by default) files to /usr/bin, /usr/share/gridengine, /usr/spool/gridengine<br />
<br />
Default shell for Son of Gridengine is /bin/sh which is /bin/bash</div>Spothineni