Installing CentOS on Cluster
- 1 Installing CentOS on the Babylon Cluster
- 1.1 Step 1: Obtain the OS and Make Bootable Installation Discs
- 1.2 Step 2: Install the OS
- 1.3 Step 3: Before plugging the machine into any kind of network, set up appropriate firewalls and other safety measures
- 1.4 Step 4: Perform the updates
- 1.5 Step 5: Mount the NFS
- 1.6 Step 6: Other things that might need to be set up
Installing CentOS on the Babylon Cluster
This describes how to install CentOS 4.2 (this probably applies to all 4.x versions) from scratch on the cluster's compute nodes - that is, the dual-CPU ones that will be the execution hosts for Sun Grid Engine. This will also work for the single-CPU nodes, and might even work for the quad-CPU high-memory node. Configuring the RAID node, however, is a little trickier, and I mention below which additional steps need to be taken to do it.
Please note that the RAID node now has CentOS 4.3 on it, but these instructions were written for the 4.2 installation. Shouldn't be any different though...
Step 1: Obtain the OS and Make Bootable Installation Discs
Get the ISO disc image files from a mirror run by our friends at LBL:
wget http://altruistic.lbl.gov/mirrors/centos/4.2/isos/x86_64/CentOS-4.2-x86_64-bin1of4.iso
wget http://altruistic.lbl.gov/mirrors/centos/4.2/isos/x86_64/CentOS-4.2-x86_64-bin2of4.iso
wget http://altruistic.lbl.gov/mirrors/centos/4.2/isos/x86_64/CentOS-4.2-x86_64-bin3of4.iso
wget http://altruistic.lbl.gov/mirrors/centos/4.2/isos/x86_64/CentOS-4.2-x86_64-bin4of4.iso
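Before burning, it's worth verifying that the ISOs downloaded intact. Assuming the mirror publishes an md5sum.txt file alongside the images (most CentOS mirrors do - check the same directory), something like this should work:

$ wget http://altruistic.lbl.gov/mirrors/centos/4.2/isos/x86_64/md5sum.txt
$ md5sum -c md5sum.txt

Any image that doesn't report OK should be re-downloaded before you waste a disc on it.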
Burn them to CD; for example, on kosh (at this time, the only node with a CD/DVD burner), you would use (as root):
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin1of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin2of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin3of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin4of4.iso
where the -dev parameter info is obtained from:

$ cdrecord -scanbus

except that we need to prepend ATA: to the 3-digit bus address of our burner, because cdrecord otherwise fails to find the correct driver and won't work.
ADDITIONAL STEPS FOR RAID NODE ONLY: obtain the 3ware 9550SX RAID controller drivers for CentOS 4.2 from the manufacturer (the filename is CentOS4.2-installdiskx86_64.ZIP, at the bottom of the page), unzip them using the unzip command, and put them on a floppy disk. (Later note on 6/16/06: there are also drivers for CentOS 4.3 on that same page, but when I tried installing CentOS 4.3 on the RAID recently, I just selected the generic drivers for 3ware 9xxx controllers that came with the OS... so far they've been working fine, so I don't yet know which is the better way - manufacturer drivers or CentOS drivers.)
Step 2: Install the OS
Insert Disc 1 that we burned above into the drive and reboot to start the installation. Unless you are installing a RAID, press Enter when prompted to select an installation type (i.e. graphical or command line).
ADDITIONAL STEPS FOR RAID NODE ONLY: if you chose to use the manufacturer drivers and have them on the floppy, hit F2 when prompted to select an installation type (i.e. graphical or command line), then type in:

linux dd

at the command prompt and hit Enter. Then:
- Choose Yes button to "Do you have a driver disk";
- Select fd0, hit Enter, make sure disk is inserted, hit Enter again;
- Make sure that a message pops up that 3ware drivers are being read from floppy;
- Say No to more driver disks.
Alternatively, you can use the drivers that come with CentOS (I did this for version 4.3 and they seem to work fine; I don't remember if they're offered in 4.2 or not). You will be told at some point that no drives can be found and asked to select drivers for a controller card; choose the generic 3ware 9xxx controller card drivers.
BACK TO INSTALLATION STEPS FOR ALL NODES:
The anaconda installer will ask you to test your media, which you can do if you're paranoid. It will then try to detect the hardware, which it should manage with no problems (it detected the motherboard, video card, hard drive, mouse, and keyboard without issue, at least).
You will then go to a graphical interface that will direct you through the rest of the installation. Accept the default settings everywhere, unless specifically stated otherwise below. Here are some of the non-obvious choices you will have to make (use common sense for whatever isn't described below):
- Choose Custom for the installation type.
- Choose Automatic Partitioning, then Remove all partitions on this system to make everything nice and clean. The partition table displayed afterwards should split /dev/hda into two partitions: /dev/hda1, mounted at /boot, and /dev/hda2, which holds the VolGroup00 LVM volume group. VolGroup00 should additionally be split into two logical volumes, LogVol00 and LogVol01, used for / and swap, respectively. (SPECIAL NOTE FOR RAID NODE INSTALLATION ONLY: instead of hda, it should say sda, and the LogVol00 partition size will be whatever you configured in the RAID BIOS or with tw_cli previously - see Cluster RAID for more info on that.)
- Ask me (Andrew Uzilov) personally for what networking settings and firewall settings to use. I ain't putting them up on an Internet-visible page.
- When selecting what packages to install, use common sense, as it would take too long to explain everything here. Some things we definitely do need (in addition to the defaults already selected) are listed below. (Note that you will obviously want a much terser installation on a gateway/router node, so if you're installing one of those, do not install the server, database, and development packages below.)
- Everything in Editors and Engineering and Scientific
- Whatever looks necessary in Server Configuration Tools (the defaults are OK)
- Everything in Web Server (this is necessary for GBrowse)
- Everything in PostgreSQL Database and MySQL Database (once again for GBrowse)
- Do not install anything in the Servers section except what's mentioned above
- Most of Developer Tools (particularly compilers and Subversion)
- Select System Tools (use your judgement) and everything in Compatibility Arch Support in the System section
- NOTE FOR RAID NODE INSTALL ONLY: formatting the RAID will take a very long time (around an hour, if not more).
Step 3: Before plugging the machine into any kind of network, set up appropriate firewalls and other safety measures
See Andrew Uzilov for how to do this, because I can't put it up here. After you set up the firewalls, don't forget to save the iptables rules before rebooting! Otherwise, when you reboot the machine, it will be wide open! Save them as follows:
$ /sbin/service iptables save
which will write the configuration to the files /etc/sysconfig/iptables and /etc/sysconfig/iptables-config that are visible to root only.
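To double-check what actually got saved, you can list the active rules and compare them against the saved file (run as root; the exact output will of course depend on the rules you set up):

$ /sbin/iptables -L -n -v
$ cat /etc/sysconfig/iptables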
Note that you should set up your /etc/hosts.allow and /etc/hosts.deny files at this point, if you choose. A good starting point for /etc/hosts.deny should be:

ALL: ALL

which is a rule that will deny access to your machine via anything using these files (or, more precisely, anything built with TCP wrappers, but I won't get into that here...), such as sshd or nfsd, the SSH and NFS server-side daemons. Of course, no one can SSH into this machine now, unless you add rules to /etc/hosts.allow stating what you want done, such as:

sshd: ALL

which will allow anyone to SSH in, so make sure your firewall is tight.
Lastly, you may also want to make sure that the ethX (where X is an integer) network devices are configured to start at boot/init time, like this:
$ /sbin/chkconfig --list network
Runlevels 2 through 5 should have the network service on; otherwise you will not have a network connection when you reboot the machine. Run chkconfig by itself to get syntax help on how to activate/deactivate what services start at what runlevels - it's simple.
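For example, to turn the network service on for runlevels 2 through 5 and then confirm the change (run as root):

$ /sbin/chkconfig --level 2345 network on
$ /sbin/chkconfig --list network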
N.B.: The network service brings up the network interfaces using scripts in /etc/sysconfig/network-scripts/, which you can actually hack if you want your network devices to do something special when they are activated, restarted, whatever. Remember, if you change any settings pertaining to networking in /etc/sysconfig/, you should restart the networking service:
$ service network restart
Step 4: Perform the updates
This can be done with one simple command, run as root:
$ yum update
If there were any kernel updates, you have to reboot the machine (something that yum won't tell you to do, but trust me... it's a good idea).
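One way to tell whether a kernel update happened is to compare the running kernel version against the most recently installed kernel package; if they differ, reboot:

$ uname -r
$ rpm -q --last kernel | head -1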
Two VERY IMPORTANT things that you want to update from source IMMEDIATELY (instead of from the RPMs that yum and up2date use, since those tend to be behind the times) are OpenSSH and OpenSSL (which also means updating zlib). This can be a bit tricky, but is chronicled here:
Step 5: Mount the NFS
In the example below, we will mount the directory /home/ on the NFS server lorien to the local directory /mnt/nfs/ (if you don't like this, you can mount it to any local directory, but all the cluster nodes currently have it in /mnt/nfs/). First, make sure the directory /mnt/nfs/ exists. Then, add the following to /etc/fstab:
lorien:/home/ /mnt/nfs/ nfs defaults 0 0
You can actually replace lorien with a hardcoded IP address. TODO: using the defaults options for the NFS seems to be suboptimal; I'm going to tweak this and see if we can speed up the NFS over RAID. Then run:
$ mount -a
to mount all the filesystems in /etc/fstab, including the NFS you just added. From now on, it will mount automatically at boot/init time.
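You can verify that the share actually mounted with something like:

$ mount | grep nfs
$ df -h /mnt/nfs

If the mount hangs or fails, check that the NFS server is exporting the directory to this machine and that your firewall and hosts.allow rules permit NFS traffic.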
Step 6: Other things that might need to be set up
These pages contain information on how to set up things other than bare-bones CentOS. Most importantly, I would recommend going through Step 2 of Linux Gateway And Router to set up a firewall on the freshly installed node (and, for that matter, that page also covers how to set up a gateway/router). Even if your gateway is firewalled, it's always a good idea to have more layers of security.
These things might also be necessary:
- JBrowse.InstallTileRendering (if you want to render tiles on the node)
- Sun Grid Engine (if you want to use Sun Grid Engine on the node)
- FOR RAID ONLY: Cluster RAID (for how to configure NFS on a RAID and how to maintain the RAID)
- FOR RAID ONLY: Legato Backup System (for how to set up Legato backups of the RAID contents)
-- Andrew Uzilov - 17 Feb 2006