This page describes how to install CentOS 4.2 (this probably applies to all 4.x versions) from scratch on the cluster's compute nodes, that is, the dual-CPU ones that will be the execution hosts for SunGridEngine. This will also work for the single-CPU nodes, and might even work for the quad-CPU high-memory node. Configuring the RAID node, however, is a little trickier, and I mention below which additional steps it needs.
Please note that the RAID node now has CentOS 4.3 on it, but these instructions were written for the 4.2 installation. It shouldn't be any different, though...
Step 1: Obtain the OS and Make Bootable Installation Discs
Get the ISO disc image files from a mirror run by our friends at LBL:
Burn them to CD; for example, on kosh (at this time, the only node with a CD/DVD burner), you would use (as root):
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin1of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin2of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin3of4.iso
cdrecord -v -dev=ATA:1,1,0 -data CentOS-4.2-x86_64-bin4of4.iso
where the -dev parameter info is obtained from:
except that we need to prepend ATA: to the three-number bus address of our burner (1,1,0 here), because cdrecord otherwise fails to find the correct driver and won't work.
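The four burn commands differ only in the disc number, so they can be generated in a loop. The sketch below is a dry run: it only prints the commands so they can be checked before being run as root. The function name burn_cmds is made up for illustration, and the default device address is simply the one used above on kosh; on another machine the address would come from a bus scan (presumably `cdrecord -scanbus`, which is an assumption about the command referred to above).

```shell
# Dry run: print one cdrecord command per installation disc.
# burn_cmds is a hypothetical helper; $1 is the burner address
# (default: the address used on kosh above).
burn_cmds() {
    dev="${1:-ATA:1,1,0}"
    for n in 1 2 3 4; do
        echo "cdrecord -v -dev=${dev} -data CentOS-4.2-x86_64-bin${n}of4.iso"
    done
}

burn_cmds
```

Run each printed line by hand, swapping in a blank disc between them.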
ADDITIONAL STEPS FOR RAID NODE ONLY:
obtain the 3ware 9550SX RAID controller drivers for CentOS 4.2 from the manufacturer (the filename is CentOS4.2-installdiskx86_64.ZIP, at the bottom of the page), unzip them using the unzip command, and put them on a floppy disk. (Later note on 6/16/06: there are also drivers for CentOS 4.3 on that same page, but when I tried installing CentOS 4.3 on the RAID recently, I just selected the generic drivers for 3ware 9xxx controllers that came with the OS. So far they've been working fine, so I don't yet know which is the better way: manufacturer drivers or CentOS drivers.)
Step 2: Install the OS
Insert Disc 1 that we burned above into the drive and reboot to start the installation. Unless you are installing a RAID, press Enter when prompted to select an installation type (i.e. graphical or command line).
ADDITIONAL STEPS FOR RAID NODE ONLY:
if you chose to use the manufacturer drivers and have them on the floppy, hit F2 when prompted to select an installation type (i.e. graphical or command line), then type in:
at the command prompt and hit Enter. After that:
- Choose the Yes button at "Do you have a driver disk";
- Select fd0, hit Enter, make sure the disk is inserted, hit Enter again;
- Make sure that a message pops up saying that the 3ware drivers are being read from the floppy;
- Say No to more driver disks.
Alternatively, you can use the drivers that come with CentOS (I did this for version 4.3 and they seem to work fine; I don't remember whether they're offered in 4.2 or not). You will be told at some point that no drives can be found and asked to select drivers for a controller card. Choose the generic 3ware 9xxx controller card drivers.
BACK TO INSTALLATION STEPS FOR ALL NODES:
The installer will ask you to test your media, which you can do if you're paranoid. It will then try to detect the hardware, which it should manage with no problems (it did, at least, for the motherboard, video card, hard drive, mouse and keyboard).
You will then go to a graphical interface that will direct you through the rest of the installation. Accept the default settings everywhere, unless specifically stated otherwise below. Here are some of the non-obvious choices you will have to make (use common sense for whatever isn't described below):
- Choose Custom for the installation type.
- Choose Automatic Partitioning, then Remove all partitions on this system to make everything nice and clean. The partition table displayed afterwards should split /dev/hda into two partitions: /dev/hda1, which is mounted at /boot, and /dev/hda2, which is assigned to the VolGroup00 LVM volume group. VolGroup00 should in turn be split into two logical volumes, LogVol00 and LogVol01, used for / and swap, respectively. (SPECIAL NOTE FOR RAID NODE INSTALLATION ONLY: instead of hda, it should say sda, and the LogVol00 size will be whatever you configured in the RAID BIOS or with tw_cli previously; see ClusterRAID for more info on that.)
- Ask me (AndrewUzilov) personally for what networking settings and firewall settings to use. I ain't putting them up on an Internet-visible page.
- When selecting what packages to install, use common sense, as it would take too long to explain everything here. Note that you obviously will want a much terser installation on a gateway/router node, so if you're installing that, do not install the server, database, and development stuff below. Some things we definitely do need, in addition to the defaults already selected, are:
  - Everything in Editors and Engineering and Scientific
  - Whatever looks necessary in Server Configuration Tools (the defaults are OK)
  - Everything in Web Server (this is necessary for GBrowse)
  - Everything in PostgreSQL Database and MySQL Database (once again for GBrowse)
  - Do not install anything in the Servers section except what's mentioned above
  - Most of Developer Tools (particularly compilers and Subversion)
  - Select System Tools (use your judgement) and everything in Compatibility Arch Support in the System section
- NOTE FOR RAID NODE INSTALL ONLY: formatting the RAID will take a very long time (around an hour, if not more).
Step 3: Before plugging the machine into any kind of network, set up appropriate firewalls and other safety measures
for how to do this, because I can't put it up here. After you set up the firewalls, don't forget to save the iptables rules before rebooting! Otherwise, when you reboot the machine, it will be wide open! Save them as follows:
$ /sbin/service iptables save
which will write the configuration to the file /etc/sysconfig/iptables, which is readable by root only.
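After saving, it can be worth a quick look at what actually ended up in the file. The sketch below prints the default policy of each built-in chain from a saved rules file, assuming the file is in the usual iptables-save format (chain headers such as ":INPUT DROP [0:0]"). The helper name chain_policies is hypothetical. Note that an ACCEPT policy does not by itself mean the machine is open, since the rules in the chain may still reject traffic; this only catches the case where nothing sensible was saved.

```shell
# Hypothetical helper: list the default policy of the built-in chains
# found in a saved rules file ($1), one "CHAIN POLICY" pair per line.
# Assumes iptables-save format, e.g. ":INPUT DROP [0:0]".
chain_policies() {
    awk '/^:(INPUT|FORWARD|OUTPUT) / { sub(/^:/, "", $1); print $1, $2 }' "$1"
}

# Example use, as root (left commented out on purpose):
#   chain_policies /etc/sysconfig/iptables
```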
Note that you should set up your /etc/hosts.allow and /etc/hosts.deny files at this point, if you choose. A good starting point for /etc/hosts.deny is a single deny-all rule, which will deny access to your machine via anything using these files (that is, anything using TCP wrappers, but I won't get into that here...), such as the SSH daemon (sshd) and the NFS server-side daemons. Of course, no one can SSH into this machine then, unless you add rules to /etc/hosts.allow stating what you want done, such as a rule that will allow anyone to SSH in, in which case make sure your firewall is tight.
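As a concrete illustration of the two files, here is a minimal sketch. The rule text is an assumption based on the description above ("ALL: ALL" as the deny-all rule, "sshd: ALL" to let anyone reach SSH), not a copy of the original rules, and stage_wrappers is a made-up helper. It writes the files to a scratch directory rather than to /etc, so they can be reviewed first.

```shell
# Hypothetical helper: write candidate TCP wrappers files into a scratch
# directory ($1) so they can be reviewed before being copied to /etc as root.
# Both rules are assumptions based on the description above.
stage_wrappers() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # Deny every wrapped service to every client by default.
    printf 'ALL: ALL\n' > "$dir/hosts.deny"
    # Let any client reach sshd (hosts.allow is consulted before hosts.deny).
    printf 'sshd: ALL\n' > "$dir/hosts.allow"
}

stage_wrappers "${TMPDIR:-/tmp}/wrappers-staging"
```

Because hosts.allow is checked first, the sshd rule wins over the deny-all rule, and everything else using TCP wrappers stays blocked.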
Lastly, you may also want to make sure that the ethX (where X is an integer) network devices are configured to start at boot/init time, like this:
$ /sbin/chkconfig --list network
Runlevels 2 through 5 should have the network service on, otherwise you will not have a network connection when you reboot the machine. Run chkconfig by itself to get syntax help on how to activate/deactivate which services start at which runlevels; it's simple.
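The check can also be scripted. The sketch below assumes the usual one-line output of chkconfig --list for a service (for example "network 0:off 1:off 2:on 3:on 4:on 5:on 6:off") and tests that runlevels 2 through 5 are all on. The function name runlevels_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the chkconfig --list line in $1
# shows the service as "on" in runlevels 2, 3, 4 and 5.
runlevels_ok() {
    line="$1"
    for lvl in 2 3 4 5; do
        case "$line" in
            *"${lvl}:on"*) ;;       # this runlevel is on, keep checking
            *) return 1 ;;          # off or missing: fail
        esac
    done
    return 0
}

# Example use, as root (left commented out on purpose):
#   runlevels_ok "$(/sbin/chkconfig --list network)" \
#       || /sbin/chkconfig --level 2345 network on
```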
The network service brings up the network interfaces using scripts in /etc/sysconfig/network-scripts/, which you can actually hack if you want your network devices to do something special when they are activated, restarted, whatever. Remember, if you change any settings pertaining to networking in /etc/sysconfig/, you should restart the networking service:
$ service network restart
Step 4: Perform the updates
This can be done with one simple command, run as root:
$ yum update
If there were any kernel updates, you have to reboot the machine (something that yum won't tell you to do, but trust me... it's a good idea).
One VERY IMPORTANT THING that you want to update from source IMMEDIATELY (instead of the RPMs that yum uses, since they tend to be behind the times) is OpenSSH and OpenSSL (which also means updating zlib). This can be a bit tricky, but is chronicled here:
Step 5: Mount the NFS
In the example below, we will mount the directory /home/ on the NFS server lorien to the local directory /mnt/nfs/ (if you don't like this, you can mount it to any local directory, but all the cluster nodes currently have it in /mnt/nfs/). First, make sure the directory /mnt/nfs/ exists. Then, add the following to /etc/fstab:
lorien:/home/ /mnt/nfs/ nfs defaults 0 0
You can actually replace lorien with a hardcoded IP address. TODO: the default NFS mount settings seem to be suboptimal; I'm going to tweak this and see if we can speed up the NFS over RAID. Finally, run:
$ mount -a
to mount all the filesystems in /etc/fstab, including the NFS you just added. From now on, it will mount automatically at boot/init time.
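When repeating this on many nodes, it is easy to append the same fstab line twice. The sketch below adds a line only if that exact line is not already there. The helper name add_fstab_entry is made up, and the file path is a parameter so that it can be tried on a scratch copy before touching /etc/fstab.

```shell
# Hypothetical helper: append an entry ($2) to an fstab-style file ($1)
# only if that exact line is not already present.
add_fstab_entry() {
    fstab="$1"
    entry="$2"
    grep -qxF -- "$entry" "$fstab" 2>/dev/null || printf '%s\n' "$entry" >> "$fstab"
}

# Example use, as root (left commented out on purpose):
#   mkdir -p /mnt/nfs
#   add_fstab_entry /etc/fstab "<the lorien line from above>"
#   mount -a
```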
Step 6: Other things that might need to be set up
These pages contain information on how to set up things other than bare-bones CentOS. Most importantly, I would recommend going through Step 2
to set up a firewall on the freshly installed node (or a gateway... and for that matter, it covers how to set up a gateway/router, too). Even if your gateway is firewalled, it's always a good idea to have more layers of security.
These things might also be necessary:
- FOR RAID ONLY: ClusterRAID (for how to configure NFS on a RAID and how to maintain the RAID)
- 17 Feb 2006