

HolmesLab computing cluster (aka babylon)

See also

SerialConcentrator

HowToAddNewClusterUser

CentosAlternatives

ClusterMaintenanceLog

Software

See ClusterSoftware.

Databases

See ClusterDatabases.

Compiling stuff on the cluster

See ClusterInstallPolicy.

Cluster-related sub-projects

TODO: update this!

Specifications

  • 23 nodes for general-purpose computation (dual-CPU 64-bit AMD Opterons, 2.2GHz, 2GB RAM, 80GB HDD)
  • 1 node for high-memory computation (quad-CPU 64-bit AMD Opteron, 2.2GHz, 32GB RAM, 300GB HDD)
  • 1 node for RAID (dual-CPU 64-bit AMD Opterons, 2.0GHz, 2GB RAM, 2.4TB 8-disc RAID-5 - more details on ClusterRAID)
  • 2 lightweight nodes for routing, intrusion detection, job control, etc. (single-CPU 64-bit AMD Opteron, 2.2GHz, 512MB RAM, 80GB HDD)
  • 1 webserver for AJAX GBrowse demo and development
  • 64-bit CentOS 4.2/4.3/4.4 (kernel 2.6.x) operating system for all nodes
  • 2 gigabit switches
  • Aten KVM-over-IP

Facilities

GBrowse

SunGridEngine, BioPerl libraries, GD, MySQL, etc.

Genome pipeline

DART, Rfam, PANDIT, 12fly ...

BioPerl

  • v1.5.1rc3 on old compute nodes (ivanova, kosh, etc.)
  • none installed yet on new compute nodes (bester, etc.)
  • v1.5.2_102 on sheridan and lorien
  • v1.5.2_101 on sinclair (genome.biowiki.org)
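
Since the installed BioPerl version differs from node to node, it can help to ask each node what it actually reports. The following is a minimal Python sketch, not an existing cluster tool. It assumes passwordless ssh to the nodes and that the installed BioPerl exposes its version via the Bio::Root::Version module; the function names are hypothetical.

```python
import subprocess

# Perl one-liner that prints the BioPerl version string.
# Assumption: Bio::Root::Version is present in the installed BioPerl.
PERL_ONE_LINER = "perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION'"


def bioperl_version_command(node):
    """Build the ssh argument list that asks `node` for its BioPerl version."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", node, PERL_ONE_LINER]


def bioperl_version(node, timeout=15):
    """Return the version string reported by `node`, or None if unavailable."""
    try:
        result = subprocess.run(
            bioperl_version_command(node),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        # Non-zero exit: node unreachable, or BioPerl not installed there.
        return None
    return result.stdout.strip() or None
```

Calling bioperl_version("sheridan") would return whatever version string that node prints, or None if the node is unreachable or has no BioPerl. The exact format of the string depends on the BioPerl release.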

Node-specific notes

lorien

[AVU 2006-09-18]

I tried to install gcc 4.1.1 with Java AWT support... which failed because
there were problems with gtk+... so I tried to install that from source,
which failed because there were problems with glib, atk, cairo, and pango...
and although I was able to resolve the first 2 and install cairo without X
support (or at least I *think* that's what I did... ran it with --disable-xlib
option, it wouldn't work otherwise), pango killed me (said -lX11 could not be
found because X11 was incompatible - same problem as with cairo, except not as
easily avoidable... might be solvable by digging deeper into X11 dependencies,
but I give up at this point, graphics stuff is nasty stuff).

So in the end, as of today, the following are installed from source and up to
date: atk, cairo (without xlib), glib, libpng, and binutils.

But, no AWT for Java.  Sorry...

refa

[AVU 2007-06-29]

This node locked up... hard. Power light is on, but networking dead, black screen, pushing the power/reset buttons had no effect (even CD tray wouldn't open). Had to reboot it by literally pulling the plug, no other choice left.

It came back up fine and started running SGE jobs right away, and you can log into it - everything looks fine from the shell. But the keyboard worked only sporadically, and the mouse did not work at all. Fiddling with the KVM cable and rebooting both the node and the KVM switch brought everything back to normal.

This kind of hardware weirdness should be noted. This node might be headed for trouble and should be investigated more thoroughly by a real sys admin.

[AVU 2007-07-11]

Networking dead, KVM-over-IP shows black screen. Looks like this node is fubared again, will reboot it soon. However, this is the second offense.

(Back from the datacenter...) Yep, same problem as last time - had to pull the plug to restart. Need to call FineTec on this one.
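
Both lock-ups showed up first as dead networking, so a simple reachability probe might flag this condition before someone notices by hand. Here is a minimal Python sketch that tries a TCP connection to the ssh port. The port (22), the timeout and the function names are assumptions, not part of any existing monitoring on the cluster.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def unreachable_nodes(hosts, port=22, timeout=5.0):
    """Return the hosts that do not accept connections on `port`."""
    return [h for h in hosts if not ssh_port_open(h, port, timeout)]
```

For example, unreachable_nodes(["refa", "kosh"]) would list whichever of those nodes is not answering on the ssh port. A node can still be sick while answering, so this only catches the hard lock-up case described above.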

kosh

[AVU 2007-07-05]

Failed today while running the memory-hungry featurevole.pl. Networking, etc. dead. Logging in with KVM-over-IP showed "I/O failure to sector N of hda" (or some such) looping over and over.

Rebooting the node fixed the problem, but this looks worrying. Is the hard drive going bad?
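
One way to follow up on that question is to ask the drive for its SMART health verdict. This is a minimal Python sketch, assuming smartmontools (smartctl) is installed on the node, that it is run as root, and that the drive is /dev/hda as in the error message. The function names are hypothetical.

```python
import re
import subprocess

# `smartctl -H` prints a line such as
# "SMART overall-health self-assessment test result: PASSED" for ATA drives.
HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def parse_smart_health(output):
    """Extract the health verdict (e.g. 'PASSED') from `smartctl -H` output, or None."""
    match = HEALTH_RE.search(output)
    return match.group(1) if match else None


def smart_health(device="/dev/hda"):
    """Run `smartctl -H` on `device` and return the verdict, or None if unknown."""
    try:
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # smartctl's exit status is a bit mask, so parse stdout instead of
    # treating any non-zero return code as a failure.
    return parse_smart_health(result.stdout)
```

A PASSED verdict does not prove the drive is healthy, so recurring I/O errors in the kernel log should still be taken seriously.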

[JPL 2007-08-08]

kosh experienced a drive failure the previous night, with the symptoms noted by Andrew. On further investigation, the drive had been remounted read-only, with large parts of the filesystem already corrupted. Brought in binaries from other systems to do diagnostics; waiting to rebuild/reboot after 08-15. The system now refuses to initiate ssh identification, and remote access via KVM is not working.

Dead nodes

[Mitch 2011-03-18]

  • draal (drive failure)
  • byron (drive failure)
  • londo (won't start)
  • vir (won't start)

Trivia

The nodes in the cluster are all named after characters from the science fiction television show Babylon 5.

Topic revision: r78 - 2011-03-18 - MitchSkinner
 
