Sun Grid Engine


---

Sun Grid Engine Installation and Configuration Guide (and Some Notes on Our Setup)

DISCLAIMER

This is the page for installing and configuring Sun Grid Engine for the first time. It also has some notes about our general setup on the Babylon Cluster. For notes on using or administering SGE, go here:

How To Use Sun Grid Engine

How To Administer Sun Grid Engine

A SHORTCUT - if you just want to add new exec node(s) to an existing SGE setup

If you already have an SGE master set up and are just looking to add an exec node (or many), there is a much terser procedure written up here: Installing CentOS On Cluster Via NFS (look at the "Configure SGE" section at the bottom).

THE GOAL

The goal is to install Sun Grid Engine 6.0 update 7, obtained from the Sun Grid Engine open source project site, on our Babylon Cluster (which I nicknamed BABYLON) so that we have a batch job submission system - the user just submits a bunch of jobs to SGE on the head node, and SGE takes care of scheduling and dispatching them to the computation nodes.

The problem is that the Sun documentation for how to install and configure SGE is not very clear on some things, so I will describe here, for the sake of future generations, how to get it to work on our setup (which is somewhat different from how Sun intended it, but works nevertheless). Please note that you should still read the installation instructions from Sun to get a more complete picture of the installation process, and because this writeup assumes you have at least tried to familiarize yourself with it somewhat. Also please note that this works for our setup only and your mileage may vary.

If you agree, disagree, want to comment on this writeup, etc., please drop Andrew Uzilov a line.

OUR SETUP

We have one machine used as the head node, whose task is to schedule and dispatch jobs to other nodes, and from which jobs can be submitted and administered. This is called the Master Host, the Administration Host, and the Submit Host in the SGE documentation, and will be the only node in our cluster to have these designations (SGE allows you to have these three tasks performed by different nodes, but we will just make a single node do all this work). This node will not do any computations itself. Note that we do not have a Shadow Master Host, so if our primary Master Host (head node) goes down, we are screwed, as there is no backup node to kick in and handle job scheduling and execution. TODO: Obviously this is a problem, so I will add a Shadow Master Host soon.

We have 10 machines used to execute the batch jobs (I will refer to these as exec node 1 through exec node 10) - these are called Execution Hosts in the SGE documentation, and they must also be declared as Administration Hosts on the Master Host (head node). You cannot submit or administer jobs from these machines. As a matter of fact, there is not even a reason why a user (other than the sys admin) should ever log into these machines, because all jobs should be submitted and administered on the head node and all output of those jobs should be written to our RAID array node over the NFS.

CAVEATS

If a problem occurs during the installation, SGE has a tendency to dump the install log (with a description of what went wrong) to /tmp/ - and sometimes it won't tell you about it, just running along without printing any error to standard out!

So, if anything seems to go wrong, look in:

ls -lt /tmp/

and see if anything was recently added there by SGE.

INSTALLATION STEP -1: OPEN HOLES IN FIREWALLS

Remember, you need to keep the SGE qmaster and execd ports (536 and 537 in our setup) open on all nodes. With iptables, you would open them with something like this:

iptables -I RH-Firewall-1-INPUT 3 -p tcp --dport 536 -j ACCEPT

iptables -I RH-Firewall-1-INPUT 3 -p tcp --dport 537 -j ACCEPT

Note that this inserts the rules in between existing rules (at position 3 in the RH-Firewall-1-INPUT chain). Note also that this config is rather lax - the source IP and the interface are not checked, but they should be.
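A somewhat tighter sketch, assuming a hypothetical eth1 cluster interface and a 192.168.1.0/24 cluster subnet (substitute your own):

  iptables -I RH-Firewall-1-INPUT 3 -i eth1 -s 192.168.1.0/24 -p tcp --dport 536 -j ACCEPT
  iptables -I RH-Firewall-1-INPUT 3 -i eth1 -s 192.168.1.0/24 -p tcp --dport 537 -j ACCEPT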

You must make sure to save the firewall config to disk so that it comes up again on reboot! On CentOS, it is done like this:

service iptables save

To test if your SGE nodes can communicate with each other, it is very handy to use SGE's qping. For example, let's say I am on the master host and I want to see if I can communicate with the exec host:

qping EXEC_HOST_NAME 537 execd 1

outputs:

03/08/2007 11:51:33 endpoint EXEC_HOST_NAME/execd/1 at port 537 is up since 1582 seconds
03/08/2007 11:51:34 endpoint EXEC_HOST_NAME/execd/1 at port 537 is up since 1583 seconds
...

Hit Ctrl+C to make it stop.

Likewise, to see from the exec host if you can reach the master host:

qping MASTER_HOST_NAME 536 qmaster 1

INSTALLATION STEP 0: CREATE THE sgeadmin USER

Create a user account named sgeadmin on the head node and all the exec nodes, with the group name also being sgeadmin (the group name is probably not important, as long as it is the same on all the nodes). Make sure the user IDs and group IDs for this account are the same across all those nodes (this consistency actually is very important).
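A minimal sketch of doing this consistently, assuming a hypothetical UID/GID of 1001 that is free on every node (run the same commands on the head node and each exec node):

  groupadd -g 1001 sgeadmin
  useradd -u 1001 -g sgeadmin -m sgeadmin
  id sgeadmin     # should print the same uid/gid on every node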

INSTALLATION STEP 1: PREPARE THE FILES

Obtain sge-6.0u7-common.tar.gz and sge-6.0u7-bin-lx24-amd64.tar.gz (or whatever the latest stable release is) from the Sun Grid Engine open source project site. The first file is a set of common files you will need on any platform, the second is the set of binaries pre-compiled for 64-bit Opterons.

Before going any further, I must warn that we will do a local installation - that is, each node with SGE will have its own copy of the SGE binaries and its own local spool directory. This is to minimize NFS traffic, as the NFS will probably be used pretty intensively already for writing output of SGE jobs to the RAID node and for other things.

Go through each node on which you want to put SGE (in our case, the head node and all the exec nodes) and unpack the first file into the desired SGE directory (in our case, /opt/sge/), then the second file into the same directory. The second file may overwrite some stuff unpacked from the first file, but that is OK. This directory is referred to as the SGE root directory by the Sun documentation, and its path should be stored in $SGE_ROOT (see later). Make sure this path is the same on all nodes! I would also change ownership on the directory and all the files to sgeadmin.

Because you may have many nodes, the easiest thing to do is (1) set up ssh-agent so that root can log into the exec nodes from the head node without a password, and (2) put the two above tarballs on an NFS directory that is accessible by all the exec nodes. Then, as root, execute something like this from the head node (assuming you're using Bash):

$ for i in execnode1 execnode2 ... execnode10 ; do echo "LOGGING INTO ${i}..." ; ssh $i 'cd /opt/ ; mkdir sge ; cd sge ; cp /mnt/nfs/sge-6.0u7-common.tar.gz . ; tar xvfz sge-6.0u7-common.tar.gz ; cp /mnt/nfs/sge-6.0u7-bin-lx24-amd64.tar.gz . ; tar xvfz sge-6.0u7-bin-lx24-amd64.tar.gz ; chown -R sgeadmin:sgeadmin ../sge' | cat ; done

The cat at the end is so that anything written to standard error or out on the exec nodes will get written to our local console on the head node.
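For part (1), a minimal ssh-agent setup might look something like this (ssh-copy-id will prompt for each exec node's root password once; adjust key type and options to taste):

  ssh-keygen -t rsa                                            # generate a key pair for root (do this once)
  for i in execnode1 execnode2 ... execnode10 ; do ssh-copy-id root@$i ; done
  eval $(ssh-agent)                                            # start the agent in the current shell
  ssh-add                                                      # load the key; type the passphrase once
  ssh execnode1 hostname                                       # should now succeed without a password prompt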

INSTALLATION STEP 2: PREPARE THE MASTER/SUBMIT/ADMINISTRATION HOST

Prepare the Configuration File

For this, we are going to use a configuration file based on the template in /opt/sge/util/install_modules/inst_template.conf. SGE provides automated installation scripts that will read options you set in your configuration file and perform an installation using them. Make a copy of the template and fill out the options according to the comments in the template (which are actually somewhat helpful). I would recommend putting your final configuration file on the NFS, as the installation scripts for all the exec nodes will need it as well.
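For example, starting from the template (using the paths from our setup):

  cp /opt/sge/util/install_modules/inst_template.conf /mnt/nfs/babylon_configuration.conf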

Note that not all options in this file are going to be used - for example, we don't care about the "remove" options because those only apply during uninstallation, or the Windows, CSP or Shadow Master Host options because we are not using those in our configuration (yet). Also, since all of our nodes are on the same subnet, we can omit the Default Domain Name option. For this same reason, our cell name is default, since we just have a simple cluster. TODO: it would be nice to have Admin Mail set up. Also, the setting of SCHEDD_CONF may not be optimal, I need to play around with this.

That said, here are the contents of our config file, babylon_configuration.conf, minus the comments:

  SGE_ROOT="/opt/sge"
  SGE_QMASTER_PORT="<your favorite port here>"
  SGE_EXECD_PORT="<another favorite port here>"
  CELL_NAME="default"
  ADMIN_USER="sgeadmin"
  QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"
  EXECD_SPOOL_DIR="/opt/sge/default/spool/execd"
  GID_RANGE="20000-21000"
  SPOOLING_METHOD="berkeleydb"
  DB_SPOOLING_SERVER="none"
  DB_SPOOLING_DIR="/opt/sge/default/spooldb"
  ADMIN_HOST_LIST="<head node>"
  SUBMIT_HOST_LIST="<head node>"
  EXEC_HOST_LIST="<execnode1 execnode2 ... execnode10>"
  EXECD_SPOOL_DIR_LOCAL="/opt/sge/default/spool/execd"
  HOSTNAME_RESOLVING="true"
  SHELL_NAME="ssh"
  COPY_COMMAND="scp"
  DEFAULT_DOMAIN=""
  ADMIN_MAIL="none"
  ADD_TO_RC="true"
  SET_FILE_PERMS="true"
  RESCHEDULE_JOBS="wait"
  SCHEDD_CONF="2"
  # all options below are irrelevant in our setup
  SHADOW_HOST=""
  EXEC_HOST_LIST_RM=""
  REMOVE_RC="false"
  WINDOWS_SUPPORT="false"
  WIN_ADMIN_NAME="Administrator"
  WIN_DOMAIN_ACCESS="false"
  CSP_RECREATE="true"
  CSP_COPY_CERTS="false"
  CSP_COUNTRY_CODE="DE"
  CSP_STATE="Germany"
  CSP_LOCATION="Building"
  CSP_ORGA="Organisation"
  CSP_ORGA_UNIT="Organisation_unit"
  CSP_MAIL_ADDRESS="name@yourdomain.com"

Prepare to Install

First, log into the head node as root.

Second, set your $SGE_ROOT environment variable and export it:

$ export SGE_ROOT=/opt/sge

Third, add the ports you entered in the configuration file for SGE_QMASTER_PORT and SGE_EXECD_PORT (which should be two different ports not used by any service on your system) to your /etc/services file. SGE documentation recommends ports 536 and 537 (but you can use any port unoccupied by a service), so you should tack onto the end of /etc/services something like:

  # SUN GRID ENGINE
  sge_qmaster     536/tcp        # for Sun Grid Engine (SGE) qmaster daemon
  sge_execd       537/tcp        # for Sun Grid Engine (SGE) exec daemon

Naturally, communication over these ports must be unhindered by any firewall (iptables, ipchains, and the like), otherwise the SGE daemons on the various nodes will fail to reach each other. Use qping to test this - there's a description in the "Troubleshooting" section on a different page, here. N.B.: normally the sge_execd daemon on the exec host opens the TCP connection to the sge_qmaster daemon on the master host, which means that it is the firewall on the master host that must accept incoming TCP connections on the sge_qmaster port. BUT... for the purposes of the installation, you should probably make sure that the exec hosts have those ports open too, just in case. You can always figure out how to tighten the firewall without breaking SGE after you actually get it running.

Fourth (applicable only if you tried an installation already) is to remove all traces of the prior installation. That is, delete your cell directory (in our case, /opt/sge/default) and make sure there are no SGE daemons running:

$ ps -ef | grep sge

which, if any are running, will look something like this on the head node:

  sgeadmin 21146	  1  0 Jan23 ?		  00:00:10 /opt/sge/bin/lx24-amd64/sge_qmaster
  sgeadmin 21165	  1  0 Jan23 ?		  00:00:09 /opt/sge/bin/lx24-amd64/sge_schedd

So if you see them, kill them.
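For example, using the PIDs from the ps output above (substitute whatever PIDs you actually see):

  kill 21146 21165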

Install

Now that you've done everything above, execute the inst_sge script on the head node with the parameters -m (install Master Host, which is also the implied Submit and Administration Host), -x (install Execution Hosts), and -auto (read settings from the configuration file). In our case, this will be:

$ ./inst_sge -m -x -auto /mnt/nfs/babylon_configuration.conf

If everything went "correctly", the directory /opt/sge/default will be generated, in which some stuff will be placed, such as the BerkeleyDB info and the local spool directories. The sge_qmaster and sge_schedd daemons should also be started, so check with:

$ ps -ef | grep sge

Check the /opt/sge/default/spool/qmaster directory, where two install logs should be placed - one listing the install trace for the head node, one listing the install trace for the exec nodes. Ideally, there should be no errors in the former log, except that I got this one:

  error: unrecognized characters after the attribute values in line 6: "load_adjustment_decay_time"
  error reading in file

which, as far as I can tell, has not affected anything (and I should also note I never adjusted that parameter, which is in /opt/sge/util/install_modules/inst_schedd_{normal,high,max}.conf, nor any others, prior to the installation, so it's odd this is even coming up).

The latter log file is a different story, and will most likely (if your case is anything like mine) bomb out, leaving a trace like:

  configuration <execnode1> not defined
  <execnode1>
  remote installation on host <execnode1>
  <execnode1> added to administrative host list
  configuration <execnode2> not defined
  /bin/sh: line 1: /opt/sge/default/common/settings.sh: No such file or directory
  <execnode2>
  remote installation on host <execnode2>
  <execnode2> added to administrative host list
  ...

This is OK. As far as I can tell, SGE's intention was to log into the exec nodes using passwordless ssh (which can be set up using ssh-agent and would have saved you a lot of time in Step 1) and configure them automatically. Unfortunately, I found no way to get this to work, and to this day I am left wondering whether this was even an intended feature, or whether the manual just makes it seem that way. Regardless, what's important is that your exec nodes have been added as Administrative Hosts on the head node, which means the head node now knows about the exec nodes, even though they don't know about the head node yet.

Additionally, a script called /opt/sge/default/common/settings.sh (there is also a C shell variant in there) will be generated. You should probably source it before you do anything else, since it sets the environment variables SGE needs to run. If you ever reboot the head node, you may need to re-source that script and restart the daemons for SGE to work, although the installation script seems to imply that this will be done for you automatically (as long as you installed as root).
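So after a head node reboot, a manual recovery might look something like this (the sgemaster init script name is an assumption - check what the installer actually put in /etc/init.d/):

  . /opt/sge/default/common/settings.sh
  service sgemaster start      # should start sge_qmaster and sge_schedd
  ps -ef | grep sge            # verify the daemons came up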

Finally, I would do:

$ chown -R sgeadmin:sgeadmin /opt/sge

just to make sure root doesn't own anything that the user sgeadmin might need to change, but I'm not entirely sure if that is strictly necessary (couldn't hurt, though).

INSTALLATION STEP 3: PREPARE THE EXECUTION HOSTS

For this, we are going to use the same configuration file as in Step 2, so if you haven't prepared it, do so and put it on the NFS where all the exec nodes can read it (for us, it will be /mnt/nfs/babylon_configuration.conf). Then, after you have finished installing SGE on the head node as described in Step 2, log into it, make a tarball of the entire /opt/sge/default directory, and put that on the NFS as well.
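For example (default.tar is the name I use in the install loop further down):

  cd /opt/sge
  tar cvf /mnt/nfs/default.tar default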

While this doesn't seem to be strictly required for the exec nodes, it may be helpful to add the TCP ports used by SGE to their /etc/services files. This is simple if you are logged in as root on a machine that has ssh-agent set up. Let's say our SGE services are in services.tail on the NFS, so that:

$ cat /mnt/nfs/services.tail

  # SUN GRID ENGINE
  sge_qmaster     536/tcp        # for Sun Grid Engine (SGE) qmaster daemon
  sge_execd       537/tcp        # for Sun Grid Engine (SGE) exec daemon

Now we can just use:

for i in execnode1 execnode2 ... execnode10 ; do echo "LOGGING INTO ${i}..." ; ssh $i 'cat /mnt/nfs/services.tail >> /etc/services' ; done

and all /etc/services are updated in one fell swoop.

To install an SGE exec node, you must be logged in as root on that node (because you need to launch the sge_execd daemon as root to bind sockets with port numbers less than 1024... unless you made your SGE port numbers greater than that, in which case, more power to you). Make sure that the exec node has the /opt/sge directory and files as we prepared in Step 1. Copy and unpack the /opt/sge/default tarball into that directory. Now, run:

$ ./inst_sge -x -noremote -auto /mnt/nfs/babylon_configuration.conf

which will install the exec node using parameters from our configuration file that is on the NFS. After completion of this, the sge_execd daemon should be running and an install log should be present in /opt/sge/default/spool/qmaster of the exec node telling you everything went well (or if it didn't). The node can now be used as an exec node! The -noremote option prevents the script from logging into the other exec nodes specified in the configuration script and trying to configure them, since that seems to be failing for us anyway.

If the daemons aren't running, look in /tmp/ - that is where SGE dumps the install log when an install screws up. If the install is successful, the log will be in /opt/sge/default/spool/qmaster/ instead.

Additionally, I would run /opt/sge/default/common/settings.sh (assuming you're in a Bash shell), but this doesn't seem to be necessary. Another step that may not be necessary, but that I would do just in case, is to chown the /opt/sge/default directory and its contents to sgeadmin as the owner and group.

Of course, you can use the magic of ssh-agent to automate the entire process! So, for example, after I configured the head node and made sure all the exec node /etc/services files are updated with the SGE port info, I would accomplish everything I said above by running this from the head node as root:

$ for i in execnode1 execnode2 ... execnode10 ; do echo "LOGGING INTO ${i}..." ; ssh $i 'cd /opt/sge ; cp /mnt/nfs/default.tar . ; tar xvf default.tar ; ./inst_sge -x -noremote -auto /mnt/nfs/babylon_configuration.conf ; chown -R sgeadmin:sgeadmin default ; rm default.tar ; . default/common/settings.sh ; cat default/spool/qmaster/install_${HOSTNAME}* ; ps -ef | grep sge' 2>&1 | cat > install_log.$i ; done

Note that this creates a local install log containing everything that got written to standard error and out, the contents of the exec hosts install log, and a check to see if the sge_execd daemon is running. Also note that, personally, I kept getting this error:

  ./util/install_modules/inst_execd.sh: line 239: [: : integer expression expected

but that didn't seem to affect anything.

INSTALLATION STEP 4: TEST YOUR INSTALLATION

Obviously, all the daemons should be running: sge_qmaster and sge_schedd on the head node and sge_execd on the exec nodes - we should have made sure of that in Steps 2 and 3. Now, let's log onto the head node as sgeadmin and see if we can use the system. This is easy and fun to do using one of the sample scripts provided with SGE. First, we should run:

$ . /opt/sge/default/common/settings.sh

so that the appropriate environment variables get added and so that we can invoke SGE commands such as qconf, qsub, qstat and so on from anywhere. Now, let's submit 100 simple jobs just for fun:

$ for i in $(seq 1 100) ; do qsub /opt/sge/examples/jobs/simple.sh ; done

and then use:

$ qstat

to see the queue and which machines are executing which jobs. The sample script above just prints out the date, waits 20 seconds, then prints out the date again, so you can find the output in /home/sgeadmin of each exec node. You can check it by the magic of ssh-agent, or alternately, write your own test script that writes to the NFS, or even more alternately, use the -o and -e parameters to qsub to redirect your script's standard out and standard error output to a specified path/file. Read the Sun documentation for more info - the man pages should have been installed on all nodes if everything above went correctly.
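For example, to redirect a test job's output to the NFS instead of the exec nodes' home directories (the /mnt/nfs/sge_test directory is just a hypothetical destination):

  mkdir -p /mnt/nfs/sge_test
  qsub -o /mnt/nfs/sge_test/ -e /mnt/nfs/sge_test/ /opt/sge/examples/jobs/simple.sh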

If any of your exec hosts are NOT in the queue, this is a problem, as they were not installed properly. Good luck with that. You might want to use qconf to make sure they are registered as Administration Hosts on the head node, and check that the daemons are running and that everything has been done consistently for all nodes.
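Some qconf queries that are handy for this kind of checking (run them on the head node):

  qconf -sh      # list Administration Hosts
  qconf -ss      # list Submit Hosts
  qconf -sel     # list Execution Hosts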

INSTALLATION STEP 5: WHAT NOW?

You may want to change some of the SGE default settings. For example, the following shells are invoked by SGE as login shells by default: sh, ksh, csh, and tcsh. Note that bash is NOT in that list, which is annoying, because personally, I use bash a lot. To fix this, run this on the master node:

$ qconf -mconf global

and edit the line "login_shells" to contain bash (or whatever shell you want... or don't want). Note these are global settings.
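After the edit, the relevant line of the global configuration should look something like this:

  login_shells                 sh,ksh,csh,tcsh,bash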

Issues with rebooting the node

Ideally, the SGE install process should have added things to your /etc/init.d/ and /etc/rc.d/ so that sge_qmaster, sge_execd, etc. start up correctly on reboot and so that you can use the handy service SGE_SERVICE {start,stop,restart} command in CentOS.
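To check whether those boot-time hooks actually exist, something like the following should work on CentOS (the sgeexecd and sgemaster service names are what the installer typically registers - check /etc/init.d/ to confirm):

  chkconfig --list | grep -i sge     # are the SGE services registered with init at all?
  chkconfig sgeexecd on              # on an exec node; use sgemaster on the head node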

However, I have found that sometimes sge_execd doesn't start up as it should after reboot of exec hosts. I'm not exactly sure why, but you will want to test this! Make sure you test that SGE daemon(s) come back up after reboot! If it doesn't, you may have to hack around and launch the appropriate services in /opt/sge/bin/ in some other way.

Oddly, I have found the following "magic" to sometimes work:

  1. Install SGE exec host and reboot.
  2. Discover that SGE daemons do not start up as they should (ps aux | grep sge).
  3. Start the exec daemon manually (su - ; service sgeexecd start).
  4. Reboot (shutdown -r now).
  5. If the SGE daemons still do not start up as they should, go back to Step 3 - after doing that enough times, it seems to kick in. It also helps to try various combinations of starting/stopping/restarting with service sgeexecd stop, etc., or rebooting the node after you have stopped the daemon. Witchery, I tell you!

I think all of the above applies to master hosts as well.

INSTALLATION STEP 6: SUPPORT YOUR LOCAL PUB

'nuff said.

SGE "To Do" List

NB: this is now summarized under the SGE subsection of Sys Admin Tasks, and this page will probably cease being maintained.

We are tapping only a fraction of SGE's features, but as I learn the system more, the pages (Sun Grid Engine, How To Use Sun Grid Engine, and How To Administer Sun Grid Engine) will grow. Some particular things to look at are:

  • qmake
  • adding spare machines in the lab as an SGE queue
  • scheduling and spooling optimizations
  • setting up user notification e-mails so that users are notified when their jobs encounter problems (would be very, very useful)
  • policy management (making the sharing of cluster resources more fair, not just on a "first come, first served" basis as it is now)
  • installing a shadow master host
  • making the Macs in the lab submit and administrative hosts (so people don't have to log into sheridan all the time, they can just submit jobs from the Macs directly)

Of course any help in figuring stuff out is appreciated...

---

COMMENTS (QUESTIONS, NOTES, SUGGESTIONS, ETC.)

* hello... any ideas as to how to set-up sge on abaqus environment file abaqus_v6.env, i have been playing around with the settings for couple of days but, no joy... any sugestions will be much appreciated... inde. igamage@kwltd.com - TWiki Guest - 18 May 2006 13:51:05

  • thanx man... i used ur HOWTO and it worked a treat... THANX!!! - TWiki Guest - 23 Apr 2006 18:48:21
  • NB the occasional dropped job, due e.g. to head-node overload (see below), is yet another good reason to use "make" for workflow control - Ian Holmes - 01 Apr 2006 21:21:49
  • It would be cool to get qmake working... - Ian Holmes - 01 Apr 2006 21:20:04
  • I think I just broke Sun Grid Engine on sheridan (update: it now seems to be back up again, but here's what I did, just for reference). I tried submitting 4500 dummy jobs to see if it could handle the load, before trying a real analysis involving thousands of jobs. Around job 1200 it broke with "Unable to run job: failed receiving gdi request". Then I couldn't get either "qstat" or "qsub" to respond: qstat gave me a similar error, "failed receiving gdi request". After a while, qstat came back up and the job queue had cleared. These sorts of too-many-jobs issues seem to be the almost-exclusive provenance of genomics analyses, and are possibly due to overloading of the head node (I was running other stuff on sheridan at the time). See e.g. http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=13757 - Ian Holmes - 01 Apr 2006 21:17:45
  • Thanks Andrew, very .. very helpful for somebody down under trying to set it up. Note your comments on NFS :-)) s l o w !! Once I have it running may add something to user notes. May send you an email at that time. Meanwhile don't let the Andrew world implode :-)) P.S I usually visit Prof. David Brillinger in the Stats Dept when I am over your way. - TWiki Guest - 09 Mar 2006 02:23:08
  • nice Andrew! -- Ian Holmes - 26 Jan 2006 04:08:41

---

-- Started by: Andrew Uzilov - 24 Jan 2006