Cluster NFS

From Biowiki
Revision as of 15:41, 4 March 2008 by Lars Barquist (talk | contribs) (Imported from TWiki)


see also: Cluster NFSBenchmarks

The Network File System (NFS) for the Babylon Cluster

The NFS server is currently lorien, our Cluster RAID node.

Its /home/ directory should be mounted on /mnt/nfs/ of all the lab machines and the cluster Sun Grid Engine nodes, for consistency. I have recently started a trend (to save typing) of symlinking /nfs/ to /mnt/nfs/, which is now in effect on all the cluster nodes.

IH notes (3/14/2007): The directory structure on the NFS is a bit of a mess. Some projects are scattered over multiple locations (e.g. try ls -l ~yam/fly). At the top level, links to user directories are mixed up with project and source code directories (ls -l /nfs). The filesystem is a spiderweb of symlinks; reorganizing things would probably break too much to be worth it at this point, but we need to isolate the damage. For example, is there a reason why the user dirs are all linked from /nfs to /nfs/users, and will this need to happen for future users? (This seems particularly nasty if it continues.)

Also, I think that the references to giles (above) can now be deleted. (as, presumably, can the symlinks to /mnt/clusternfs)

AU adds: giles reference removed

Regarding the rest... waay back in the day, all user home directories used to be in lorien:/home/ (which is mounted on /nfs/). Then, we decided that lorien:/home/ should house "data", "projects", and "src"... so we moved all the user home directories to lorien:/home/users/, but you symlinked them to preserve backwards compatibility. Likewise, notice that things like "dart" and "db", which are now in "src" and "data" subdirs (but used to be in /nfs/), are still symlinked - once again, for backwards compatibility with the Old Way.

Yeah, I agree those symlinks are a mess, and I'd like to pull them. The directory /nfs/ should contain ONLY the following:

/nfs/data/
/nfs/projects/
/nfs/src/
/nfs/tmp/
/nfs/users/

and that's it. If you give the word, I will make it so (and announce to everyone in case anyone is still relying on this backwards compatibility - I know I'm not).

And yeah, users that have been added since we instituted this "new" dir structure (e.g. avinash, mitch) aren't using those symlinks - they are just in /nfs/users/ and that's it. So to answer your question... this symlinking will not happen for future users, and hasn't happened for quite some time.

IH mumbles: doh! my fault then. I don't mind software directories (like "dart") being symlinked, or relatively stable data directories, but users & analysis projects shouldn't be there. Let's talk about this... we should delete the extra symlinks soon.

(Followup: Deprecated Symlinks)

This doesn't totally resolve all the above issues -- the data directories are still a mess -- but the clutter in /nfs is the most important thing.

Misc. notes

How to tell what NFS version(s) and protocol(s) are available on some NFS server:

/usr/sbin/rpcinfo -p <host name or IP address> | grep nfs  # look in the "vers" and "proto" columns
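As a rough sketch, the two columns of interest can be pulled out with awk. The helper name nfs_versions is made up for this page, and it assumes the usual five-column rpcinfo -p layout (program, vers, proto, port, service):

```shell
# nfs_versions: read `rpcinfo -p` output on stdin and print one
# "version protocol" pair per line for the nfs service.
# Assumes the usual 5-column layout: program vers proto port service.
nfs_versions() {
    awk '$5 == "nfs" { print "v" $2, $3 }' | sort -u
}

# Usage (lorien is this page's NFS server):
#   /usr/sbin/rpcinfo -p lorien | nfs_versions
```

Each output line names one version/protocol combination the server offers, which is easier to scan than the raw table.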

Your client NFS options are given by:


If some option is unspecified, it is the default (default client settings on Debian are: NFS version 2, UDP, rsize = wsize = 1024, timeo = 7; on Mac OS X Panther they are: NFS version 3, async, I don't know the others).
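The command that originally followed "Your client NFS options are given by:" is not preserved on this page. One way to see the options in effect on a Linux client, offered as an assumption rather than as the original author's method, is to read /proc/mounts. The helper name nfs_mount_options is hypothetical:

```shell
# nfs_mount_options: print "mountpoint options" for every NFS mount
# listed in a mounts-format file (default: /proc/mounts on Linux).
# Fields in that file: device mountpoint fstype options dump pass.
nfs_mount_options() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Usage on a Linux client:
#   nfs_mount_options
```

This does not apply to Mac boxes, which have no /proc/mounts.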

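On the NFS server, the showmount command reports what is exported and who has it mounted: showmount -e prints the export list, and showmount -a lists client:directory pairs for current mounts (as recorded by mountd, so the list can be stale).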


OS X Automounting

I'm sure this has already been figured out, but here's the 'correct' way to do it under OS X (Leopard and later; older versions use a legacy application):
  • Open /Applications/Utilities/Directory Utility
  • Click 'Show Advanced Settings'
  • Click on 'Mounts' in toolbar
  • unlock to make changes (requires root password)
  • Add the nfs mount (URL: nfs://lorien/nfs, Mount Directory: /nfs)

Now the NFS will automount on boot.


NFS v3

$ su -

$ mount -t nfs lorien:/home/ /mnt/nfs/  # Linux boxes

$ mount_nfs lorien:/home/ /mnt/nfs/  # Mac boxes

TODO: write how to add to /etc/fstab for automatically mounting on Linux boxes (via mount -a or at startup).

TODO: add NFS optimization options... whenever I find which ones actually work.

NFS v4

Make sure that the NFS server has the fsid=0 option in /etc/exports, e.g.:


If you have just added this option, make sure to re-export by running:

exportfs -ra
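
The example export line is missing from this page, so rather than guess at it, here is a small check that reports whether any non-comment line of an exports file carries fsid=0. The function name has_fsid0 is made up for illustration, and the pattern only recognises options written inside a parenthesised option list:

```shell
# has_fsid0: succeed if some non-comment line in the given exports
# file (default /etc/exports) sets fsid=0 inside an option list.
has_fsid0() {
    grep -v '^[[:space:]]*#' "${1:-/etc/exports}" |
        grep -Eq '[(,]fsid=0[,)]'
}

# Usage on the NFS server:
#   has_fsid0 && echo "fsid=0 is set" || echo "fsid=0 is missing"
```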

The clients get mounted by:

$ mount -t nfs4 lorien:/ /mnt/nfs/  # Linux boxes

Note that we do not specify the /home/ directory on lorien explicitly, but just give it the root /. This works because the fsid=0 option in the server's /etc/exports line marks that export as the root of the NFSv4 pseudo-filesystem, so lorien:/ resolves to it on the client.


To unmount the NFS, run as root:

$ su -

$ umount /mnt/nfs/

or, failing that, like this:

$ umount -l /mnt/nfs/  # lazy unmount for Linux boxes, circumvents the "busy filesystem" error

$ umount -f /mnt/nfs/  # force unmount for Mac OS X because they have no lazy unmount, circumvents... well, pretty much everything
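
The choice between the two fallbacks depends only on the operating system, so it can be sketched as a small helper. The name umount_fallback is hypothetical; it keys off the string printed by uname -s (Linux, or Darwin on Mac OS X):

```shell
# umount_fallback: print the fallback unmount command for a mount
# point, given the OS name as reported by `uname -s`.
umount_fallback() {
    os=$1
    dir=$2
    case "$os" in
        Linux)  echo "umount -l $dir" ;;   # lazy unmount
        Darwin) echo "umount -f $dir" ;;   # forced unmount (Mac OS X)
        *)      echo "no known fallback for $os" >&2; return 1 ;;
    esac
}

# Usage (as root): try a plain unmount first, then the fallback.
#   umount /mnt/nfs/ || $(umount_fallback "$(uname -s)" /mnt/nfs/)
```

The helper only prints the command, so it can be inspected before anything is actually unmounted.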


If you're having NFS troubles (the most common one is that your machine hangs anytime you do an ls on an NFS directory):

  • Verify you can actually reach the NFS server:

$ ping lorien

  • If you can't, this is a network problem that has nothing to do with the NFS. If you can, try unmounting and remounting the NFS as described above.
  • If that fails, verify you have permission to mount the NFS by checking to make sure your machine's or your network's IP is in /etc/hosts.allow (relevant rules/daemons/services are: portmap, lockd, mountd, rquotad, statd) and /etc/exports on the NFS server. Also see /var/log/messages to see what errors are logged when you try to mount (running tail -f /var/log/messages on the NFS server in one terminal as you try a mount in another terminal is useful for watching a "live" message log).
  • Check to make sure permissions in /etc/exports are loaded correctly:

$ su -

$ exportfs

  • If the permissions from the above command don't match what's in the /etc/exports file, or just for the fun of it, try reloading them from the file:

$ exportfs -ra

  • If you verified that you have permissions, unmount the NFS, restart NFS services on the NFS server, then remount again. Restarting commands are:

$ su -

$ service portmap restart

$ service nfs restart
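
The first step of the checklist above (is the server reachable at all?) is easy to script. This is only a sketch: check_nfs_server is a made-up name, and it assumes a ping that accepts -c 1, as on Linux and Mac OS X:

```shell
# check_nfs_server: first troubleshooting step, scripted.
# Pings the host once and says which kind of problem to look at next.
check_nfs_server() {
    host=${1:-lorien}
    if ping -c 1 "$host" >/dev/null 2>&1; then
        echo "$host is reachable: try unmounting and remounting, then check permissions"
    else
        echo "$host is unreachable: network problem, not NFS"
        return 1
    fi
}

# Usage:
#   check_nfs_server lorien
```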

Backing up the NFS contents to a tape drive

See how to do this here: TSMBackupSystem.

Setting Up the RAID as an NFS Server

We are going to set up the RAID as an NFS file server as follows.

Add the following to your /etc/hosts.deny on the RAID:

portmap: ALL
lockd: ALL
mountd: ALL
rquotad: ALL
statd: ALL

Add the following to your /etc/hosts.allow on the RAID:

portmap: list
lockd: list
mountd: list
rquotad: list
statd: list

where "list" is a comma-delimited list of IP addresses which are allowed to mount the RAID. NB: I would recommend against allowing your gateway/router/firewall/intrusion detection node access to the NFS... just in case. So don't put it on these lists.
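
Since the same list is repeated for all five daemons, the hosts.allow lines can be generated rather than typed. A minimal sketch, with a made-up function name and placeholder addresses from the 192.0.2.x documentation range:

```shell
# hosts_allow_entries: print one hosts.allow line per NFS-related
# daemon for a comma-delimited list of client IP addresses.
hosts_allow_entries() {
    for daemon in portmap lockd mountd rquotad statd; do
        echo "$daemon: $1"
    done
}

# Usage (192.0.2.x are documentation-only placeholder addresses):
#   hosts_allow_entries "192.0.2.10,192.0.2.11" >> /etc/hosts.allow
```

Generating the lines from one list keeps the five entries from drifting apart when a client is added or removed.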

Add the following to /etc/exports:
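
The export line that belonged here is missing from this page. Purely as an illustration of the /etc/exports format (directory, then each client followed by its options in parentheses), the sketch below builds such a line. The function name, the rw,sync options, and the 192.0.2.x addresses are all assumptions, not lorien's actual settings:

```shell
# exports_line: build one /etc/exports line from a directory, an
# option string, and one or more client addresses.
# Format: dir client1(opts) client2(opts) ...
exports_line() {
    dir=$1
    opts=$2
    shift 2
    line=$dir
    for client in "$@"; do
        line="$line $client($opts)"
    done
    echo "$line"
}

# Usage (hypothetical options and placeholder addresses):
#   exports_line /home rw,sync 192.0.2.10 192.0.2.11
```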


If portmap, rpc.nfsd, and rpc.mountd are not running, you should launch them as follows:

$ /sbin/service portmap start

$ /usr/sbin/rpc.nfsd

$ /usr/sbin/rpc.mountd

If they are already running (check with ps aux | grep -E 'portmap|nfsd|mountd'), you need to make them reload the changes you just made to /etc/exports, by doing:

$ exportfs -ra

which, by the way, will also inform you of any syntax problems with /etc/exports and is therefore a useful command in itself. You can also use:

$ exportfs -v

to find out what the currently loaded settings are, i.e. what the NFS service thinks it's supposed to do.

TODO: write up how to make the daemons launch at startup!

TODO: what's the deal with these:

lockd (if necessary)

listed as useful in this guide: [[1]]. Do I need these, or what?

Your NFS server is now ready and running! Try to mount the NFS drive on your NFS clients as follows (you must do this as root):

$ mount -t nfs lorien:/home/ /mnt/nfs/

which will mount the directory /home/ on the NFS server lorien to the local directory /mnt/nfs/ (if you don't like this, you can mount it to any local directory, but all the cluster nodes currently have the NFS mounted on /mnt/nfs/). You have to make sure the directory /mnt/nfs/ exists before you do this.

To mount the NFS automatically at startup, add the following line to /etc/fstab:


  lorien:/home/	 /mnt/nfs/	 nfs	 defaults	 0 0

You should actually replace lorien with a hardcoded IP address, so that mounting at boot does not depend on name resolution. Now, run:

$ mount -a

to mount all the filesystems in /etc/fstab, including the NFS you just added. From now on, the NFS should mount automatically at boot/init time.
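
Before editing /etc/fstab on a node, it helps to know whether the mount point is already listed, so the entry is not added twice. A minimal sketch, with a hypothetical function name:

```shell
# fstab_has_mount: succeed if the fstab-format file (default
# /etc/fstab) already has a non-comment entry for the mount point.
fstab_has_mount() {
    mnt=${1%/}    # ignore one trailing slash, so /mnt/nfs/ == /mnt/nfs
    awk -v m="$mnt" '
        /^[[:space:]]*#/ { next }     # skip comment lines
        NF < 2           { next }     # skip blank or malformed lines
        { p = $2; sub(/\/$/, "", p); if (p == m) found = 1 }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}

# Usage:
#   fstab_has_mount /mnt/nfs/ && echo "already in /etc/fstab"
```

The check compares only the second field (the mount point), so it works whether the entry names the server by host name or by IP address.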

TODO: soft versus hard mounting? (also copy this over to the Installing CentOS On Cluster page) - see [[2]]


-- Created by: Andrew Uzilov on 20 Jun 2006