See also: ClusterNFSBenchmarks
The Network File System (NFS) for the BabylonCluster
The NFS server is currently lorien, our ClusterRAID node.
Its /home/ directory should be mounted on /mnt/nfs/ on all the lab machines and the cluster SunGridEngine nodes, for consistency. To save typing, I have recently started symlinking /nfs/ to /mnt/nfs/; this is now in effect on all the cluster nodes.
IH notes (3/14/2007):
The directory structure on the NFS is a bit of a mess.
Some projects are scattered over multiple locations (e.g. try ls -l ~yam/fly).
At the top level, links to user directories are mixed up with project and source code directories (ls -l /nfs).
The filesystem is a spiderweb of symlinks; reorganizing things would probably break too much to be worth it at this point, but we need to isolate the damage.
For example, is there a reason why the user dirs are all linked from /nfs to /nfs/users, and will this need to happen for future users? (This seems particularly nasty if it continues.)
Also, I think that the references to giles (above) can now be deleted.
(as, presumably, can the symlinks to /mnt/clusternfs)
AU adds: giles reference removed
Regarding the rest... waay back in the day, all user home directories used to be in
lorien:/home/ (which is mounted on /nfs/). Then, we decided that
lorien:/home/ should house "data", "projects", and "src"... so we
moved all the user home directories to lorien:/home/users/, but you
symlinked them to preserve backwards compatibility. Likewise, notice
that things like "dart" and "db", which are now in "src" and "data"
subdirs (but used to be in /nfs/), are still symlinked - once again,
for backwards compatibility with the Old Way.
Yeah, I agree those symlinks are a mess, and I'd like to pull them.
The directory /nfs/ should contain ONLY the following:
/nfs/data/
/nfs/projects/
/nfs/src/
/nfs/tmp/
/nfs/users/
and that's it. If you give the word, I will make it so (and announce
to everyone in case anyone is still relying on this backwards
compatibility - I know I'm not).
And yeah, users that have been added since we instituted this "new"
dir structure (e.g. avinash, mitch) aren't using those symlinks - they
are just in /nfs/users/ and that's it. So to answer your question...
this symlinking will not happen for future users, and hasn't happened
for quite some time.
IH mumbles:
doh! my fault then. I don't mind software directories (like "dart") being symlinked,
or relatively stable data directories, but users & analysis projects shouldn't be there.
Let's talk about this... we should delete the extra symlinks soon.
(Followup: DeprecatedSymlinks)
This doesn't totally resolve all the above issues -- the data directories are still a mess -- but the clutter in /nfs is the most important thing.
Misc. notes
How to tell what NFS version(s) and protocol(s) are available on some NFS server:
/usr/sbin/rpcinfo -p <host name or IP address> | grep nfs # look in the "vers" and "proto" columns
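To pull just the version/protocol pairs out of that listing, a small filter can help. This is a minimal sketch: the function name nfs_versions is made up for illustration, and it assumes the usual five-column rpcinfo -p layout (program, vers, proto, port, service).

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and prints
# one "version protocol" pair per line for the nfs service.
# Assumes the five-column layout: program vers proto port service.
nfs_versions() {
    awk '$5 == "nfs" { print $2, $3 }' | sort -u
}

# Usage (needs network access to the server):
#   /usr/sbin/rpcinfo -p lorien | nfs_versions
```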
Your client NFS options are given by:
mount
If an option is not listed, the default applies. Default client settings on Debian are NFS version 2, UDP, rsize = wsize = 1024, timeo = 7; on Mac OS X Panther they are NFS version 3 and async (I don't know the others).
On the NFS server, you can get some useful information with the showmount command.
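To see only the NFS mounts and their options, the mount output can be filtered. This is a rough sketch for Linux clients only: the function name nfs_mount_options is hypothetical, it assumes the Linux line format "SRC on DIR type FSTYPE (OPTIONS)", and it will not cope with mount points that contain spaces.

```shell
# Hypothetical helper: reads Linux `mount` output on stdin and prints
# "mountpoint options" for every NFS mount (type nfs or nfs4).
# Assumes the Linux line format: SRC on DIR type FSTYPE (OPTIONS)
nfs_mount_options() {
    awk '$4 == "type" && ($5 == "nfs" || $5 == "nfs4") {
        opts = $6
        gsub(/[()]/, "", opts)   # strip the surrounding parentheses
        print $3, opts
    }'
}

# Usage:
#   mount | nfs_mount_options
```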
Mounting
OS X Automounting
I'm sure this has already been figured out, but here's the 'correct' way to do it under OS X (Leopard and higher; older versions use a NeXT legacy application):
- open /Applications/Utilities/Directory Utility
- >Show Advanced Settings
- Click on 'Mounts' in toolbar
- unlock to make changes (requires root password)
- Add the nfs mount (URL: nfs://lorien/nfs, Mount Directory: /nfs)
Now nfs will automount on boot.
-LEB
NFS v3
$ su -
$ mount -t nfs lorien:/home/ /mnt/nfs/ (Linux boxes)
$ mount_nfs lorien:/home/ /mnt/nfs/ (Mac boxes)
TODO: write how to add to /etc/fstab for automatically mounting on Linux boxes (via mount -a or at startup).
TODO: add NFS optimization options... whenever I find which ones actually work.
NFS v4
Make sure that the NFS server has the fsid=0 option in /etc/exports, e.g.:
/home/ 127.0.0.1(rw,sync,insecure,fsid=0)
If you have just added this option, make sure to re-export by running:
exportfs -ra
The clients get mounted by:
$ mount -t nfs lorien:/ /mnt/nfs/ (Linux boxes)
Note that we do not specify the /home/ directory on lorien explicitly, but just give it the root /.
The client works out what to mount because of the fsid=0 option we added to the server's /etc/exports line, which marks that export as the root of the NFSv4 filesystem.
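A quick way to confirm which export carries fsid=0 is to scan the exports file. The following is only a sketch: the function name fsid0_exports is made up, comment lines are skipped, and backslash line continuations in /etc/exports are not handled.

```shell
# Hypothetical helper: prints the exported directory of every line in an
# exports-style file (default /etc/exports) whose options include fsid=0.
# Comment lines are skipped. Line continuations are not handled.
fsid0_exports() {
    awk '!/^[ \t]*#/ && /[(,]fsid=0[,)]/ { print $1 }' "${1:-/etc/exports}"
}

# Usage (on the NFS server):
#   fsid0_exports
```

If this prints nothing, the NFS v4 mount of lorien:/ described above has no export to resolve to.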
Unmounting
$ su -
$ umount /mnt/nfs/
or, failing that, like this:
$ umount -l /mnt/nfs/ (lazy unmount for Linux boxes, circumvents the "busy filesystem" error)
$ umount -f /mnt/nfs/ (force unmount for Mac OS X because they have no lazy unmount, circumvents... well, pretty much everything)
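For scripts that run on both kinds of machine, the choice of fallback flag can be wrapped in a function. This is a minimal sketch; the function name umount_fallback_flag is hypothetical, and it only knows the two operating systems mentioned above.

```shell
# Hypothetical helper: prints the fallback umount flag for an OS name as
# reported by `uname -s`: -l (lazy) on Linux, -f (force) on Mac OS X.
# With no argument, the local OS is used.
umount_fallback_flag() {
    os=${1:-$(uname -s)}
    case "$os" in
        Linux)  printf '%s\n' "-l" ;;
        Darwin) printf '%s\n' "-f" ;;
        *)      printf 'unsupported OS: %s\n' "$os" >&2; return 1 ;;
    esac
}

# Usage (as root): try a plain unmount first, then fall back.
#   umount /mnt/nfs/ || umount "$(umount_fallback_flag)" /mnt/nfs/
```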
Troubleshooting
If you're having NFS troubles (the most common one is that your machine hangs any time you do an ls on an NFS directory):
- Verify you can actually reach the NFS server:
$ ping lorien
- If you can't, this is a network problem that has nothing to do with the NFS. If you can, try unmounting and remounting the NFS as described above.
- If that fails, verify you have permission to mount the NFS by checking to make sure your machine's or your network's IP is in /etc/hosts.allow (relevant rules/daemons/services are: portmap, lockd, mountd, rquotad, statd) and /etc/exports on the NFS server. Also see /var/log/messages to see what errors are logged when you try to mount (running tail -f /var/log/messages on the NFS server in one terminal as you try a mount in another terminal is useful for watching a "live" message log).
- Check to make sure permissions in /etc/exports are loaded correctly:
$ su -
$ exportfs
- If the permissions from the above command don't match what's in the /etc/exports file, or just for the fun of it, try reloading them from the file:
$ exportfs -ra
- If you verified that you have permissions, unmount the NFS, restart NFS services on the NFS server, then remount again. Restarting commands are:
$ su -
$ service portmap restart
$ service nfs restart
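The /etc/hosts.allow check in the list above can be partly automated. The sketch below (function name missing_allow_rules is made up) reports which of the five daemons have no rule at all. It assumes one daemon per rule line, as in the setup section of this page, and it does not check which hosts each rule lists.

```shell
# Hypothetical helper: prints each NFS-related daemon that has no rule in
# a hosts.allow-style file (default /etc/hosts.allow).
# Only checks that a "daemon:" rule exists, not which hosts it lists.
missing_allow_rules() {
    file=${1:-/etc/hosts.allow}
    for daemon in portmap lockd mountd rquotad statd; do
        grep -q "^[[:space:]]*${daemon}[[:space:]]*:" "$file" || echo "$daemon"
    done
}

# Usage (on the NFS server):
#   missing_allow_rules
```

No output means every daemon has at least one rule.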
Backing up the NFS contents to a tape drive
See how to do this here: TSMBackupSystem.
Setting Up the RAID as an NFS Server
We are going to set up the RAID as an NFS file server as follows.
Add the following to your /etc/hosts.deny on the RAID:
portmap: ALL
lockd: ALL
mountd: ALL
rquotad: ALL
statd: ALL
Add the following to your /etc/hosts.allow on the RAID:
portmap: list
lockd: list
mountd: list
rquotad: list
statd: list
where "list" is a comma-delimited list of IP addresses which are allowed to mount the RAID.
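Since the same list is repeated for all five daemons, the hosts.allow lines can be generated rather than typed. This is a sketch only: the function name gen_hosts_allow is hypothetical and the addresses in the usage line are placeholders, not real cluster nodes.

```shell
# Hypothetical helper: prints hosts.allow rules for the five daemons,
# given a comma-delimited list of client IP addresses.
gen_hosts_allow() {
    for daemon in portmap lockd mountd rquotad statd; do
        printf '%s: %s\n' "$daemon" "$1"
    done
}

# Usage (as root on the RAID; the addresses are placeholders):
#   gen_hosts_allow "192.168.0.2,192.168.0.3" >> /etc/hosts.allow
```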
NB: I would recommend against allowing your gateway/router/firewall/intrusion detection node access to the NFS... just in case. So don't put it on these lists.
Add the following to /etc/exports:
/home/ 192.168.0.0/255.255.255.0(rw,sync)
If portmap, rpc.nfsd, and rpc.mountd are not running, you should launch them as follows:
$ /sbin/service portmap start
$ /usr/sbin/rpc.nfsd
$ /usr/sbin/rpc.mountd
If they are already running (check with ps aux | grep -E 'portmap|nfsd|mountd'), you need to make them reload the changes you just made to /etc/exports, by doing:
$ exportfs -ra
which, by the way, will also inform you of any syntax problems with /etc/exports and is therefore a useful command in itself. You can also use:
$ exportfs -v
to find out what the currently loaded settings are, i.e. what the NFS service thinks it's supposed to do.
TODO: write up how to make the daemons launch at startup!
TODO: what's the deal with these:
statd
lockd (if necessary)
rquotad
listed as useful in this guide:
http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html. Do I need these, or what?
Your NFS server is now ready and running! Try to mount the NFS drive on your NFS clients as follows (you must do this as root):
$ mount -t nfs lorien:/home/ /mnt/nfs/
which will mount the directory /home/ on the NFS server lorien on the local directory /mnt/nfs/ (if you don't like this, you can mount it on any local directory, but all the cluster nodes currently have the NFS mounted on /mnt/nfs/). You have to make sure the directory /mnt/nfs/ exists before you do this.
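The "make sure the directory exists" step and the mount itself can be combined so that running it twice is harmless. This is a sketch for Linux clients: the function name ensure_nfs_mounted and the DRY_RUN variable are made up for illustration, and the already-mounted check relies on /proc/mounts.

```shell
# Hypothetical helper: creates the mount point if needed and mounts the
# export there unless an NFS filesystem is already mounted on it.
# Set DRY_RUN=1 to print the commands instead of running them.
ensure_nfs_mounted() {
    src=$1
    dir=${2%/}        # /proc/mounts lists paths without a trailing slash
    run=""
    [ -n "$DRY_RUN" ] && run=echo
    if grep -qs " $dir nfs" /proc/mounts; then
        echo "$dir is already mounted"
        return 0
    fi
    $run mkdir -p "$dir" && $run mount -t nfs "$src" "$dir"
}

# Usage (as root):
#   ensure_nfs_mounted lorien:/home/ /mnt/nfs/
```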
To mount the NFS automatically at startup, add the following to /etc/fstab:
TODO: THESE NEED TO BE TWEAKED TO SPEED UP THE NFS!
lorien:/home/ /mnt/nfs/ nfs defaults 0 0
You should actually replace lorien with a hardcoded IP address, just in case name resolution is not working when the mount happens. Now, run:
$ mount -a
to mount all the filesystems in /etc/fstab, including the NFS you just added. From now on, the NFS should mount automatically at boot/init time.
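When rolling this out to many nodes, it helps if adding the /etc/fstab entry is safe to repeat. The sketch below (function name add_nfs_fstab_entry is hypothetical) appends the entry only if the mount point has no entry yet. It uses the standard fstab options keyword defaults and no tuning options, since none have been settled on here.

```shell
# Hypothetical helper: appends an NFS entry to an fstab-style file
# (default /etc/fstab) unless the mount point already has an entry.
add_nfs_fstab_entry() {
    src=$1
    dir=$2
    fstab=${3:-/etc/fstab}
    # Look for a non-comment line whose second field is the mount point.
    if awk -v d="$dir" '!/^[ \t]*#/ && $2 == d { found = 1 } END { exit !found }' "$fstab"; then
        echo "$dir already has an entry in $fstab"
    else
        printf '%s %s nfs defaults 0 0\n' "$src" "$dir" >> "$fstab"
    fi
}

# Usage (as root), followed by mount -a:
#   add_nfs_fstab_entry lorien:/home/ /mnt/nfs/
```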
TODO: soft versus hard mounting? (also copy this over to the InstallingCentOSOnCluster page) - see http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html
-- Created by:
AndrewUzilov on 20 Jun 2006