Sun Grid Engine Examples

From Biowiki

This page is for examples of how people like to run jobs on the cluster. Please post your favorite command-line parameter combinations, wrappers, etc., because an example is worth a thousand words.

---

Submitting a shell script

Let's start simple. If you want to annotate an alignment using xrate (or xgram, etc.), write a shell script like this:

#!/bin/bash
xrate --grammar someGrammar.eg -log 6 inputAlignment.stk

Let's save this shell script as /mnt/nfs/users/JoeSample/simpleJob.bash. One minor note: the shebang (the first line, which names the interpreter that should run the script) is ignored by SGE on this cluster. So don't count on it working, but it's good practice to put it there anyway.

Now submit that job to SGE like this:

ssh sheridan
# you should now be in your home dir, /mnt/nfs/users/JoeSample
chmod a+x simpleJob.bash
mkdir Logs
qsub -cwd -o Logs/simpleJob.out -e Logs/simpleJob.err simpleJob.bash

The above will submit your shell script as an SGE job, causing xrate to annotate "inputAlignment.stk" (remember that xrate writes the annotation to standard out and the logging to standard error). You can check the job status with the qstat command.

By default, SGE will run the submitted job using your default shell (for everyone except Ian, that shell is /bin/bash). Because of the -cwd option ("current working directory"), the script is executed in the directory you submitted it from; without -cwd, SGE runs the job in your home directory.

The -o and -e options specify the files to which standard out and standard error will be saved. Relative paths given to -o and -e are interpreted relative to the working directory set by -cwd. In this example, the xrate annotation goes to standard out and is therefore saved in ~/Logs/simpleJob.out, while the logging information goes to ~/Logs/simpleJob.err.
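The steps above (make sure the log directory exists, then submit with -cwd, -o and -e) can be wrapped in a small bash function so each job gets its own pair of log files. This is a minimal sketch, not a cluster-provided tool: `submit_logged` is a hypothetical name, and the QSUB variable (also hypothetical) lets you substitute `echo` to print the arguments instead of really submitting.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script with stdout/stderr saved to
# Logs/<name>.out and Logs/<name>.err, relative to the submission directory.
# Set QSUB=echo for a dry run that only prints the qsub arguments.
submit_logged() {
    local script=$1
    local name
    name=$(basename "$script")
    name=${name%.*}                # simpleJob.bash -> simpleJob
    mkdir -p Logs                  # make sure the log directory exists before the job runs
    ${QSUB:-qsub} -cwd -o "Logs/$name.out" -e "Logs/$name.err" "$script"
}
```

Usage from your home directory: `submit_logged simpleJob.bash`, or `QSUB=echo submit_logged simpleJob.bash` to see what would be submitted.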

If you don't like Bash, you can specify a different shell to interpret your job. For example, here is how you would submit and run your SGE job using sh:

qsub -S /bin/sh -cwd -o Logs/simpleJob.out -e Logs/simpleJob.err simpleJob.sh

The only thing that's different is the -S option. Remember, the shebang in the shell script is ignored.

CAVEAT: if you use the -S option, your job will not run under your default shell, so you will lose the environment set up in your .bash_profile (and potentially in all profiles, including the master /etc/profile). For example, your customized $PATH will not be available. So use the -S option with caution!
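One way to make a lost $PATH fail loudly rather than mysteriously is to check, at the top of the job script, that the programs the job needs can be found. A minimal sketch, written in plain POSIX sh so it also works under -S /bin/sh; `require_cmds` is a hypothetical name, and JOB_ID is the variable SGE sets inside a running job (it is unset outside SGE). Alternatively, qsub's -v PATH option (used in Ian's example below) or -V (export your whole submission environment) can pass your environment to the job explicitly.

```shell
# Hypothetical guard for the top of a job script (POSIX sh, so it also
# works when the job is started with -S /bin/sh).
# Prints one message per missing command; returns non-zero if any is missing.
require_cmds() {
    _missing=0
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "job ${JOB_ID:-(not under SGE)}: '$_cmd' not found on PATH=$PATH" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Example use in a job script (xrate as in the example above):
#   require_cmds xrate || exit 1
#   xrate --grammar someGrammar.eg -log 6 inputAlignment.stk
```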

---

Submitting a Perl program

This is exactly like submitting a shell script, except that you have to tell SGE to run the submitted file with the Perl interpreter instead of the default shell. This is done using the -S option:

qsub -S /usr/bin/perl -cwd -o log.out -e log.err perlProgram.pl

CAVEAT: as with shell scripts, using -S means your job will not run under your default shell, so you will lose the environment set up in your .bash_profile (and potentially in all profiles, including the master /etc/profile). For example, your customized $PATH will not be available. So use the -S option with caution!

---

More complex (and realistic) workflow

Ian Holmes (4/25/06) says: Here is an example that shows how to do guided training. I generally use the following options for qsub:

QSUB = qsub -cwd -v PATH -b y

"-cwd" means "execute in current working directory"; "-v PATH" means "use my path"; and "-b y" means "allow binary commands as well as scripts".

Here's how I'd run xgram on the cluster. This trains grammar "rind-seed" on alignment "alignment" and saves the trained grammar to a file "rind-trained". (NB these examples are all ripped from Makefiles, hence the GNU Make syntax for variables. The "$(shell pwd)" is GNU Make-ese for the current directory.)

MYPATH = $(shell pwd)
$(QSUB) -o $(MYPATH)/xgram.job.output -e $(MYPATH)/xgram.job.error \
 xgram $(MYPATH)/alignment -g $(MYPATH)/rind-seed -t $(MYPATH)/rind-trained \
 | fields 2 > xgram.rind.qsub.jid

The `fields 2` command extracts field 2 from the qsub output (actually the third word, since the fields script counts from zero). This is the job ID, and it is saved to a file "xgram.rind.qsub.jid".
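If the fields script isn't available to you, awk can do the same job. The sketch below defines a hypothetical `field` function that, like `fields`, counts from zero. The sample qsub message in the comments is illustrative; check what your qsub version actually prints. Newer SGE releases also accept `qsub -terse`, which prints only the job ID.

```shell
# Hypothetical stand-in for 'fields N': print the Nth whitespace-separated
# word of each input line, counting from zero (so 'field 2' prints awk's $3).
field() {
    awk -v n="$1" '{ print $(n + 1) }'
}

# qsub typically reports something like:
#   Your job 12345 ("xgram") has been submitted
# so field 2 of that line is the job ID, e.g.:
#   qsub -cwd simpleJob.bash | field 2 > simpleJob.qsub.jid
```

Inside a Makefile recipe, write the awk program as `'{ print $$3 }'`, since Make treats a single `$` as the start of a variable reference.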

Here's how to submit a perl job that takes the output of that last xgram job, replaces every occurrence of "rind" with "irrev", and prints the result to standard out (which we ask SGE to redirect to a file "irrev-seed"). The -hold_jid argument asks SGE to defer execution until the xgram job has finished:

$(QSUB) -hold_jid `cat xgram.rind.qsub.jid` -o $(MYPATH)/irrev-seed -e $(MYPATH)/perl.job.error \
 perl -pe s/rind/irrev/g $(MYPATH)/rind-trained \
 | fields 2 > perl.rind2irrev.qsub.jid

Again, we save the ID of the queued perl job, this time to a file "perl.rind2irrev.qsub.jid".

Finally, here's how to submit a second xgram job, to be executed after the perl job finishes (which is itself contingent on the completion of the first xgram job), that trains the grammar "irrev-seed" and saves it to "irrev-trained":

$(QSUB) -hold_jid `cat perl.rind2irrev.qsub.jid` -o $(MYPATH)/xgram.job.output -e $(MYPATH)/xgram.job.error \
 xgram $(MYPATH)/alignment -g $(MYPATH)/irrev-seed -t $(MYPATH)/irrev-trained \
 | fields 2 > xgram-irrev.qsub.jid

NB out of habit, and for consistency, we save the job ID again, to "xgram-irrev.qsub.jid".

The practice of saving job IDs to files for explicit dependency tracking is clumsy; qmake would no doubt be better.
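Outside a Makefile, a single shell script can keep the job IDs in variables instead of files. The sketch below chains the same three jobs with -hold_jid. The function names are hypothetical, the qsub options are the ones from Ian's QSUB setting, the job ID is taken as the third word of qsub's output (as with `fields 2`), and the QSUB variable can point at a stub for a dry run.

```shell
#!/bin/bash
# Hypothetical helper: submit a command with Ian's usual options and print
# only the job ID (third word of "Your job 12345 (...) has been submitted").
submit_jid() {
    local out
    out=$(${QSUB:-qsub} -cwd -v PATH -b y "$@") || return 1
    printf '%s\n' "$out" | awk '{ print $3 }'
}

# The rind -> irrev training chain from the Makefile example, with job IDs
# held in variables and $PWD in place of $(shell pwd).
train_rind_then_irrev() {
    local dir=$PWD xgram_jid perl_jid irrev_jid

    # 1. Train rind-seed on the alignment.
    xgram_jid=$(submit_jid -o "$dir/xgram.job.output" -e "$dir/xgram.job.error" \
        xgram "$dir/alignment" -g "$dir/rind-seed" -t "$dir/rind-trained") || return 1

    # 2. Once step 1 is done, rename rind -> irrev; stdout becomes irrev-seed.
    perl_jid=$(submit_jid -hold_jid "$xgram_jid" -o "$dir/irrev-seed" -e "$dir/perl.job.error" \
        perl -pe 's/rind/irrev/g' "$dir/rind-trained") || return 1

    # 3. Once step 2 is done, train irrev-seed.
    irrev_jid=$(submit_jid -hold_jid "$perl_jid" -o "$dir/xgram.job.output" -e "$dir/xgram.job.error" \
        xgram "$dir/alignment" -g "$dir/irrev-seed" -t "$dir/irrev-trained") || return 1

    echo "$xgram_jid $perl_jid $irrev_jid"
}
```

In a Makefile each recipe line runs in its own shell, so variables don't survive from one line to the next; that is one reason to fall back on files there.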

Note also that I've saved a bunch of stuff to files I don't really need, sometimes re-using filenames (e.g. "xgram.job.output" and "xgram.job.error"). If you don't do this, SGE creates its own output files with cryptic names (of the form jobname.oJOBID and jobname.eJOBID) all over the place. It seems better to tell it pre-emptively where to put the data you don't want than to have it pollute the filesystem like that. I guess you could also just tell it to put stuff you don't want in "/dev/null".