r8 - 20 Sep 2007 - 17:11:22 - SivaSakti

SGE at SCS

Description

The primary job submission queue is managed by the Sun Grid Engine (SGE). You request the resources your job needs, and SGE finds the least loaded node(s) and starts the job there. What follows is a quick start, taken from the Rocks Cluster documentation and modified only slightly for the SCS environment. The original Rocks documentation can be found at http://www.rocksclusters.org/roll-documentation/sge/4.2.1/. For more extensive SGE documentation, check out the official SGE web site at http://gridengine.sunsource.net/documentation.html.

Examples


Serial SGE job

Batch jobs are submitted to SGE via scripts. Here is an example of a serial job script, sleep.sh. It prints the date, runs the sleep command for 10 seconds, and prints the date again.

[sysadm1@frontend-0 sysadm1]$ cat sleep.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
date
sleep 10
date

Note: Entries which start with #$ will be treated as SGE options.

  • -cwd means to execute the job from the current working directory.
  • -j y means to merge the standard error stream into the standard output stream instead of having two separate error and output streams.
  • -S /bin/bash specifies the interpreting shell for this job to be the Bash shell.
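As a quick check of which options a script embeds, the #$ lines can be pulled out with sed. This is only a sketch: the function name list_sge_options is made up for illustration and is not part of SGE.

```shell
#!/bin/bash
# list_sge_options is a hypothetical helper, not an SGE command.
# It prints the options embedded in a job script, one per line,
# by keeping only the lines that begin with "#$" and stripping that prefix.
list_sge_options() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Usage (assumes sleep.sh is in the current directory):
#   list_sge_options sleep.sh
```

For the sleep.sh script above this would list -cwd, -j y and -S /bin/bash, each on its own line.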

To submit this serial job script, you should use the qsub command.

[sysadm1@frontend-0 sysadm1]$ qsub sleep.sh
your job 16 ("sleep.sh") has been submitted
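If a wrapper script needs the job id for later qstat or qdel calls, it can be parsed from the confirmation line that qsub prints. The sketch below assumes the message format shown above; the wording may differ between SGE versions, and the helper name job_id_from_qsub is hypothetical.

```shell
#!/bin/bash
# job_id_from_qsub is a hypothetical helper, not an SGE command.
# It reads a qsub confirmation line on stdin, for example
#   your job 16 ("sleep.sh") has been submitted
# and prints the numeric job id. It prints nothing when the line
# does not match the assumed format.
job_id_from_qsub() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# Usage (requires a working SGE installation):
#   jid=$(qsub sleep.sh | job_id_from_qsub)
#   echo "submitted job $jid"
```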

Parallel SGE job

For a parallel MPI job, take a look at this script, Hello.sh, which runs a hello world program in parallel using an executable file named Hello. Note that the job script must pass the SGE variable $NSLOTS, which holds the number of slots SGE granted to the job, to mpirun. Here are sample scripts for using SGE on Phoenix with each of the MPI distributions: openmpi, lam, and mpich2.

Parallel SGE script for openmpi

[sysadm1@frontend-0 sysadm1]$ . /usr/common/clusters/phoenix/openmpi.sh
[sysadm1@frontend-0 sysadm1]$ cat Hello.sh
#!/bin/bash

#Run a script before and during SGE submit 
#to guarantee that your environment is set up
#for openmpi
 
. /usr/common/clusters/phoenix/openmpi.sh

# Specify all environment variables active within the qsub
# utility to be exported to the context of the job.
#$ -V

# specify openmpi with N nodes
# -pe openmpi_2 2
# -pe openmpi_4 4
# -pe openmpi_8 8

# we choose 4
#$ -pe openmpi_4 4

# use current directory
#$ -cwd

export PATH=/usr/local/openmpi/gcc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/gcc/lib:$LD_LIBRARY_PATH

mpirun -np $NSLOTS Hello



Also, make sure that Hello.cc or Hello.f90 was compiled with the compiler that corresponds to the MPI distribution set up in the batch submission script. Submit the script as follows:

[sysadm1@frontend-0 sysadm1]$ qsub Hello.sh
your job 17 ("Hello.sh") has been submitted
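SGE sets $NSLOTS in the job's environment, so a script that is also run by hand for testing may want a fallback. The following is a minimal sketch under that assumption; mpirun_command is a hypothetical helper that only builds the command line and does not launch anything.

```shell
#!/bin/bash
# mpirun_command is a hypothetical helper for illustration.
# It prints the mpirun command line for the given executable,
# using NSLOTS when SGE has set it and 1 process otherwise.
mpirun_command() {
    local slots="${NSLOTS:-1}"
    printf 'mpirun -np %s %s\n' "$slots" "$1"
}

# Usage inside a job script:
#   eval "$(mpirun_command Hello)"
```

Outside of SGE the fallback gives a single-process run, which is usually enough to check that the executable starts.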

Checking job status


To monitor jobs under SGE, use the qstat command. When executed with no arguments, it will display a summarized list of jobs.

[sysadm1@frontend-0 sysadm1]$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     20     0 Hello1.sh   sysadm1      t     12/23/2003 23:22:09 frontend-0 MASTER
     21     0 Hello2.sh   sysadm1      t     12/23/2003 23:22:09 frontend-0 MASTER
     22     0 Hello3.sh   sysadm1      qw    12/23/2003 23:22:06

Use qstat -f to display a more detailed list of jobs within SGE.
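The summary listing can be post-processed with standard tools. As a sketch, the function below counts jobs per state, assuming the column layout shown above (job id in column 1, state in column 5); other SGE versions or the qstat -f format may lay the output out differently. The name count_job_states is made up for this example.

```shell
#!/bin/bash
# count_job_states is a hypothetical helper, not an SGE command.
# It reads plain "qstat" output on stdin and prints "<state> <count>"
# for each job state. Header and separator lines are skipped because
# their first field is not a number.
count_job_states() {
    awk '$1 ~ /^[0-9]+$/ { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage (requires a working SGE installation):
#   qstat | count_job_states
```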

You can also use qstat to query the status of a single job, given its job id. For this, use the -j N option, where N is the job id.

[sysadm1@frontend-0 sysadm1]$ qsub Hello.sh
your job 28 ("Hello.sh") has been submitted
[sysadm1@frontend-0 sysadm1]$ qstat -j 28
job_number:                 28
exec_file:                  job_scripts/28
submission_time:            Wed Dec 24 01:00:59 2003
owner:                      sysadm1
uid:                        502
group:                      sysadm1
gid:                        502
sge_o_home:                 /home/sysadm1
sge_o_log_name:             sysadm1
sge_o_path:                 /opt/sge/bin/glinux:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/ganglia/bin:/opt/maui/bin:/opt/OpenPBS/bin:/opt/OpenPBS/sbin:/opt/rocks/bin:/opt/rocks/sbin:/home/sysadm1/bin
sge_o_mail:                 /var/spool/mail/sysadm1
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/sysadm1
sge_o_host:                 frontend-0
account:                    sge
cwd:                        /home/sysadm1
path_aliases:               /tmp_mnt/ * * /
merge:                      y
mail_list:                  sysadm1@frontend-0.public
notify:                     FALSE
job_name:                   Hello.sh
shell_list:                 /bin/bash
script_file:                Hello.sh
parallel environment:  mpich range: 1
scheduling info:            queue "comp-pvfs-0-1.q" dropped because it is temporarily not available
                            queue "comp-pvfs-0-2.q" dropped because it is temporarily not available
                            queue "comp-pvfs-0-0.q" dropped because it is temporarily not available
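When only one field of the qstat -j report is needed, it can be extracted by its label. This is a sketch that assumes the "label: value" layout shown above; qstat_field is a hypothetical helper name.

```shell
#!/bin/bash
# qstat_field is a hypothetical helper, not an SGE command.
# It reads "qstat -j N" output on stdin and prints the value of the
# field whose label (without the trailing colon) is given as $1.
# Only the text up to the first colon is removed, so values that
# contain colons themselves (such as sge_o_path) are kept whole.
qstat_field() {
    awk -v key="$1:" 'index($0, key) == 1 { sub(/^[^:]*:[ \t]*/, ""); print }'
}

# Usage (requires a working SGE installation):
#   qstat -j 28 | qstat_field job_name
```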

Managing a job


If you need to delete an already submitted job, you can use qdel with its job id. Here's an example of deleting a fluent job under SGE:

[sysadm1@frontend-0 sysadm1]$ qsub fluent.sh
your job 31 ("fluent.sh") has been submitted
[sysadm1@frontend-0 sysadm1]$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     31     0 fluent.sh  sysadm1      t     12/24/2003 01:10:28 comp-pvfs- MASTER
[sysadm1@frontend-0 sysadm1]$ qdel 31
sysadm1 has registered the job 31 for deletion
[sysadm1@frontend-0 sysadm1]$ qstat
[sysadm1@frontend-0 sysadm1]$
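In the transcript above, the second qstat prints nothing because the job has already left the queue. A script can make the same check by looking for the job id in the first column of the listing. The sketch below assumes the plain qstat layout shown above; job_listed is a hypothetical helper name.

```shell
#!/bin/bash
# job_listed is a hypothetical helper, not an SGE command.
# It reads plain "qstat" output on stdin and returns success (0)
# if a job with the id given as $1 appears in the first column,
# and failure (1) otherwise, including when the listing is empty.
job_listed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Usage (requires a working SGE installation): wait for job 31 to disappear.
#   qdel 31
#   while qstat | job_listed 31; do sleep 5; done
```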

Troubleshooting job submission problems


Phoenix queues

Open issues

