TSMBackup System

From Biowiki

---

IBM Tivoli Storage Manager (Version 5, Release 3, Level 4.0)

This is what we use to back up the contents of the RAID/NFS node on the Babylon Cluster. We only care about the client side (us), since the TSM server that actually stores our data is administered by UC Backup. Talk to our technical contact at UC Backup if you want to make any server-side changes.

TODO: there are still some things marked todo, but they are not urgent.

0. How does it work? (a general overview)

You run the TSM client on the machine containing filesystems you want to back up. This client establishes and keeps open a TCP/IP connection to the TSM server (maintained by UC Backup). The server initiates the backup at a designated daily time (currently 9:30PM PST) and controls it from there. So, you must make sure that the connection to the server is up and stable right before the scheduled backup time, otherwise the backup cannot be made.

The server gets to say when the backup takes place. The client gets to say what gets backed up and what doesn't.

Our type of backup is called an incremental backup. This means only files that have changed since the last backup are sent to the server.

A deleted file (aka inactive file) is maintained in backup for 90 days. For existing files (aka active files), the past 90 versions of the file are stored. (TODO: this is actually an oversimplification... need to update it with the full details of the retention policy.)
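Since the numbers above are a simplification, it helps to ask the server what it actually enforces. A minimal sketch, assuming you run it from a root login shell (su -): `dsmc query mgmtclass -detail` is a standard client command that lists each management class (such as UNIX_QUARTER) with its retention settings, while the wrapper function name is hypothetical.

```shell
# Hypothetical helper: show the retention rules the TSM server enforces.
# "query mgmtclass -detail" lists the management classes available to this
# node (e.g. UNIX_QUARTER) with their copy-group settings, such as how many
# versions are kept and for how long.
show_retention_policy() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "run this from a root login shell (su -)" >&2
        return 1
    fi
    dsmc query mgmtclass -detail
}
```

Source it into a root shell and run show_retention_policy; the output is the server's authoritative answer to the TODO above.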

The backups are shipped off-site daily, so if the datacenter collapses, another copy of the data exists somewhere. Of course, the primary copy is kept on-site, so you can speedily restore a backup from a local copy.

There are many ways this can be set up, but this one is ours. Some say that you can use the client to control when the backups start. Some also say that you can configure multiple users with various privileges to use the client (whether via the command line or GUI) to back up and restore files. We don't do either (for the latter, only root has the right to administer the client).

But what is this "client" you speak of?

The TSM client can be accessed/configured/administered/whatever via either the command line interface (CLI), or a Java-based GUI on the backed-up machine, or a Web-based client GUI through a browser on some other machine that can reach your backed-up machine via a network. Personally, I hate the GUI, and this time I have good reasons for it.

The TSM client CLI can be started like this:

$ su -

(Make sure you put that minus there: the root environment is necessary for this to work properly.)

$ dsmc

You then enter a CLI session where you can execute commands; e.g., the following will print out some basic info about your session:

tsm> query session

To exit, type:

tsm> quit

If you just want to run a single command without entering the session, you can do it straight from your shell's command line, so the above could be accomplished like this:

$ su -

$ dsmc query session

which would cause the TSM client to launch, run query session, and exit.

So how does this client actually do the automatic, daily backup?

Good question. dsmc is just an interface to configure settings, check settings, etc. In order for the TSM client to actually contact the TSM server and exchange information about when the backup should begin, we must launch the TSM client scheduler, like this:

$ dsmc schedule

The client scheduler establishes a TCP/IP connection with the TSM server specified by your config file (which is /opt/tivoli/tsm/client/ba/bin/dsm.sys by default). As long as that connection is open at the time the server is supposed to start the backup, the backup can occur successfully (because the backup is initiated by the server). So if you background and nohup the client scheduler and let it hang around forever, it will run things for you.

The problem is that it is a memory hog. So, a more efficient way is to use the CAD, which is a "driver/wrapper process" so to speak, which can be started using:

$ cd /opt/tivoli/tsm/client/ba/bin/

$ dsmcad

You do not need to nohup it. It just hangs out, running indefinitely. Making it start at boot time is a good idea.

Yes, we must cd into that directory first, because that's where dsm.sys is. See the weirdnesses section.

The CAD does two things: it enables you to use a Web client GUI (see the "Installation/Configuration" sections) and it automatically starts the client scheduler process at the right time. It basically works like this:

  • You launch the dsmcad executable.
  • dsmcad launches the client scheduler.
  • The client scheduler contacts the TSM server and finds out when the backup is supposed to start, then passes that information back to dsmcad. dsmcad now knows when the server wants to do the backup.
  • The client scheduler exits, so it is no longer a memory hog, but the lightweight dsmcad is still running.
  • dsmcad launches the client scheduler right before the backup is supposed to begin. After the backup is complete, dsmcad terminates the client scheduler. This step is rinsed and repeated. If the designated backup time is changed on the server, it will get updated in this step.
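The whole scheme above fails silently if dsmcad is not running, or the server cannot be reached, when the backup window opens. A minimal pre-flight check sketch, assuming pgrep is available and that dsmc exits nonzero when it cannot open a session; the function name is hypothetical, and you could run it from root's crontab a little before 9:30PM.

```shell
# Hypothetical pre-backup check (run as root shortly before the backup window).
tsm_precheck() {
    # The server can only start the backup if dsmcad is alive to launch the scheduler.
    if ! pgrep -x dsmcad >/dev/null; then
        echo "dsmcad is not running; no backup will happen tonight" >&2
        return 1
    fi
    # Assumption: dsmc exits nonzero if it cannot open a session with the server.
    if ! dsmc query session >/dev/null 2>&1; then
        echo "cannot open a session with the TSM server" >&2
        return 1
    fi
    echo "dsmcad running and TSM server reachable"
}
```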

For CAD to work, your dsm.sys file must contain the line:

  managedservices	 webclient schedule

The first parameter to managedservices specifies that CAD manages the Web client, and the second specifies that CAD manages the client scheduler.

1. Resources, reference, and other notes

Our TSM RPMs and documentation are in: /mnt/nfs/src/tsm/ (see esp. manual.pdf).

IBM has a TSM support website here.

The IBM TSM client manual can be found in either /mnt/nfs/src/tsm/manual.pdf or (at least the section that applies to this) here.

Useful parts of the IBM TSM client manual

Page numbers below are electronic page numbers of the PDF (i.e. 1 through 543), not the page numbers printed on the page itself.

  • pg 15 - understanding dsm.sys syntax and reading those crazy syntax diagrams
  • pg 80 - configuring the Web client (here be monsters)
  • pg 81 - using "CAD managing the start and shutdown of the client scheduler" approach versus "keep the client scheduler running all the time and hog memory" approach
  • pg 87 - include-exclude list information and syntax
  • pg 187 - start of the command guide that you can use to decipher dsm.sys statements
  • pg 295 - managed services syntax (i.e. does CAD run the Web client, the client scheduler, or both)
  • pg 463 - restore command syntax for restoring your backups (see esp. examples on pg 495) - very important! (you want to know how to restore your backups too, don't you?)

Super-fun acronym(s)

CAD - the "Tivoli Storage Manager Client Acceptor" (see pg 495 of the manual). In ps aux output, it looks something like this:

root	  29001  0.0  0.2 116868 5284 ?		 Sl	13:55	0:00 dsmcad

Why I hate the TSM client GUI (and GUIs in general)

There are two GUIs you can use instead of the CLI to configure the client, and both are Java-based. One you can run directly on the backed-up machine, the other you can use through a JavaScript- and Java-enabled Web browser from some remote machine. You use your Web browser to connect to the backed-up machine via TCP/IP and HTTP, which runs a Java applet that gives you a nice point-and-click, intuitive GUI you can use to easily configure all components of the client. Right? WRONG.

The Web client GUI suffers from problems. I haven't used the native GUI, but I presume at least some of these problems persist on that, too. Maybe it's just because I'm running Firefox on a Mac, but... so what? Among the problems are:

  • Asks you for a password, even if you specified passwordaccess generate in dsm.sys.
  • Has all kinds of GUI problems (items in lists can't be selected with the mouse, dialogue boxes go wild and jump around, and other unruly, unstable behavior that you do not want from an interface).
  • On the backed-up machine side (the HTTP server in this case), it leaks ports all over the place. lsof -i shows a lot of unnecessarily open connections, and after you terminate your Web client session, they just stick around!
  • Requires you to either be on the same subnet as your backed-up machine, or else to open holes in the firewall (such as for the port listening for HTTP connections - personally, I do not want any open ports like that unless I'm running a webserver, and only a webserver, on that box).
  • Requires JavaScript and Java.

Does the CLI have any of these problems? No. It's nice and terse and clean, and you can scroll your terminal window up to see what you did a few seconds ago. It's text-only, stable, doesn't jump around when you click something or hang or do anything weird. You can do everything with the keyboard instead of moving from keyboard-to-mouse and back constantly. You don't have to punch holes in firewalls to use it, or put your machine on the VPN, or use X sessions just to do something that would take a fraction of the time with SSH and CLI. Why have additional complexity when you can just use the CLI? Why do people keep doing this, and even worse, doing it badly? What is more ubiquitous on all networked (non-Windows) machines across the world than a command line shell and an SSH client?

2. Installing, configuring, and launching the TSM client

Installation

Download from here.

Untar TSM534C_LINUX86.tar (it contains all the documentation you will need, including the installation/etc. manual, README_enu.htm) and install the RPMs:

$ tar xvf TSM534C_LINUX86.tar

$ rpm -i TIVsm-API.i386.rpm

$ rpm -i TIVsm-API64.i386.rpm

$ rpm -i TIVsm-BA.i386.rpm

The second RPM may give you trouble, e.g.:

[root@lorien tsm]# rpm -i TIVsm-API64.i386.rpm 
error: Failed dependencies:
		  libstdc++.so.5()(64bit) is needed by TIVsm-API64-5.3.4-0.x86_64
		  libstdc++.so.5(CXXABI_1.2)(64bit) is needed by TIVsm-API64-5.3.4-0.x86_64
		  libstdc++.so.5(GLIBCPP_3.2)(64bit) is needed by TIVsm-API64-5.3.4-0.x86_64
	 Suggested resolutions:
		  /home/buildcentos/CENTOS/en/4.0/x86_64/CentOS/RPMS/compat-libstdc++-33-3.2.3-47.3.x86_64.rpm

If that happens, download a copy of the missing RPM (compat-libstdc++-33-3.2.3-47.3.x86_64.rpm) from here or a more official site if you are more paranoid and install it:

$ rpm -i compat-libstdc++-33-3.2.3-47.3.x86_64.rpm

Stuff will by default get put into: /opt/tivoli/tsm/client/
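The installation steps above can be wrapped in a small helper. This is a sketch, assuming it runs as root in the directory where TSM534C_LINUX86.tar was unpacked and where the compat RPM was downloaded; the function name is hypothetical. Note that `rpm -q compat-libstdc++-33` does not distinguish architectures, so on a machine with only the 32-bit compat package the check passes and TIVsm-API64 will still complain.

```shell
# Hypothetical installer for the 5.3.4 client RPMs (run as root in the
# directory where TSM534C_LINUX86.tar was unpacked).
install_tsm_client() {
    local compat=compat-libstdc++-33-3.2.3-47.3.x86_64.rpm pkg
    # TIVsm-API64 needs the 64-bit libstdc++.so.5 from compat-libstdc++-33.
    if ! rpm -q compat-libstdc++-33 >/dev/null 2>&1; then
        if [ -f "$compat" ]; then
            rpm -i "$compat" || return 1
        else
            echo "$compat not found; download it first" >&2
            return 1
        fi
    fi
    # Same order as above: API, API64, then the backup-archive client.
    for pkg in TIVsm-API.i386.rpm TIVsm-API64.i386.rpm TIVsm-BA.i386.rpm; do
        rpm -i "$pkg" || return 1
    done
}
```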

Configuration

The essential TSM client configuration file, the heart of the system, is (by default) /opt/tivoli/tsm/client/ba/bin/dsm.sys. In that same directory, the file dsm.sys.smp provides a template on which you can base a bare-bones config file.

This is our dsm.sys file, so create one like it:

defaultserver	ucbackup

servername  ucbackup
  nodename			  babylon.biowiki.org
  passwordaccess	  generate

  commmethod			TCPip
  tcpport				<CENSORED>
  tcpserveraddress	<CENSORED>

  managedservices	 webclient schedule
  httpport			  <CENSORED>

  dirmc				  UNIX_QUARTER
  inclexcl			  /etc/tsm/backup-include-exclude.list

TODO: write an explanation of every line of the config file. Until then, see pg 187 through the end of the manual for reference.
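In the meantime, a quick sanity check can catch a missing line before the scheduler silently fails. A sketch, assuming the option names used in our file above (TSM option names are case-insensitive, hence grep -i); the function name is hypothetical, and it only checks that each option appears, not that its value is right.

```shell
# Hypothetical sanity check: report options our setup relies on that are
# missing from dsm.sys. Returns nonzero if anything is missing.
check_dsm_sys() {
    local file="${1:-/opt/tivoli/tsm/client/ba/bin/dsm.sys}"
    local opt missing=0
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    for opt in servername nodename passwordaccess commmethod tcpport \
               tcpserveraddress managedservices inclexcl; do
        if ! grep -Eiq "^[[:space:]]*${opt}[[:space:]]" "$file"; then
            echo "missing option: $opt"
            missing=1
        fi
    done
    # The scheduler cannot type a password, so it must be "generate".
    if ! grep -Eiq '^[[:space:]]*passwordaccess[[:space:]]+generate' "$file"; then
        echo "passwordaccess is not set to generate"
        missing=1
    fi
    return "$missing"
}
```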

Note the inclexcl statement. It points TSM to a list of directories/files we want to include/exclude in our backup. The list is currently this:

include.backup * UNIX_QUARTER
exclude.dir /dev
exclude.dir /media
exclude.dir /mnt
exclude.dir /opt/tivoli
exclude.dir /proc
exclude.dir /tmp
exclude.dir /var/log/tsm

Note there are no frontslashes at the ends of directory names (see the weirdnesses section). For more info on how to construct include/exclude lists, see "Include/Exclude" section below or read the manual.

This include/exclude list basically says: "back up everything on the machine using the server-specified UNIX_QUARTER data retention policy, except for the directories (and their contents) named in the exclude.dir statements". The excluded directories obviously include virtual filesystems, the loopback mount, and the TSM client and log directories (which change during the backup process anyway), etc.
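Because a trailing slash on an exclude.dir entry silently does nothing (see the Weirdnesses section), it is worth linting the list after each edit. A minimal sketch, assuming one statement per line; the function name is hypothetical. It prints each offending line with its line number and fails if any are found.

```shell
# Hypothetical lint: flag exclude.dir entries that end in "/" (optionally
# inside double quotes). Succeeds only if no such line exists.
lint_inclexcl() {
    local file="${1:-/etc/tsm/backup-include-exclude.list}"
    ! grep -Ein '^[[:space:]]*exclude\.dir[[:space:]]+"?[^[:space:]]*/"?[[:space:]]*$' "$file"
}
```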

Finally, create a blank dsm.opt file:

$ touch /opt/tivoli/tsm/client/ba/bin/dsm.opt

This is just to prevent a warning about dsm.opt not existing that otherwise prints every time you run dsmc. Ordinarily, dsm.opt would contain options if you have more than one backup server. We don't, so our dsm.opt is a blank placeholder.

Testing the client setup

Let's see if we can connect to the TSM server. Try:

$ dsmc query session

If everything is working as it should, you should just get some info about your server, your node, etc. That means you can connect to the TSM server.

You will also most likely be asked for a name and a password. Ask me what those are. Because of the passwordaccess generate option in dsm.sys, you will only need to enter the name and password once, after which an encrypted copy of the password gets saved on the backed-up machine and you will never have to enter it again... ever (unless you wipe that saved copy). I am not entirely sure how this works, but it will prevent you from having to enter the password every time the client scheduler gets launched.

If you had a default password given to you by UC Backup, now is a good time to change it, using:

$ dsmc set password

Before embarking on a full backup, you should do some small trial backups and restores, just to test the waters a bit. You can back up a single file using:

$ dsmc incremental <file path>   (relative or absolute path)

To restore the file to original location:

$ dsmc restore <file path>   (relative or absolute path; a relative path is converted to an absolute one for looking up the file in the backup set)

or restore to a new file:

$ dsmc restore <original file path> <new file path>   (relative or absolute paths; the first one is converted to an absolute path for looking up the file in the backup set)

You can also back up a whole directory by ending its path with the * wildcard (add -subdir=yes so that its subdirectories are included too, just as with the restore command below):

$ dsmc incremental "<directory path>/*" -subdir=yes

N.B.: enclose any path containing a wildcard in double quotes.

Then you can restore that directory using:

$ dsmc restore <directory path>/ -subdir=yes

N.B.: make sure to put a frontslash at the end of the directory name in this case (oddly enough, you do not want to do this in include/exclude lists - see the weirdness section); also, make sure to use the -subdir option, otherwise the restore will not be recursive and you will only get the top level of the filesystem tree.
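The single-file test above can be scripted so that it also checks the restored bytes. A sketch, assuming root, a path to a regular file, and that any nonzero dsmc return code means failure (dsmc may also return nonzero for mere warnings, so a warning will make this report failure too); the function name and the .tsm-restore-test suffix are hypothetical.

```shell
# Hypothetical round-trip test: back up one file, restore it under a new
# name, and compare the two byte for byte. Run as root (su -).
tsm_roundtrip() {
    local src="$1" copy="$1.tsm-restore-test"
    [ -f "$src" ] || { echo "$src is not a regular file" >&2; return 1; }
    # dsmc prompts before overwriting an existing file, so require a fresh name.
    if [ -e "$copy" ]; then
        echo "$copy already exists" >&2
        return 1
    fi
    # Any nonzero dsmc return code is treated as failure here.
    dsmc incremental "$src" || return 1
    dsmc restore "$src" "$copy" || return 1
    if cmp -s "$src" "$copy"; then
        echo "round trip OK: $src"
    else
        echo "restored copy differs from $src" >&2
        return 1
    fi
}
```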

If all these tests pass, you can perform your first, full backup. But before that, make sure your include/exclude list is being parsed correctly:

$ dsmc query inclexcl

If it is, do the full backup of everything on your machine:

$ dsmc incremental > backup.log 2>backup.err

which will take a while, as all of your files will be copied to the TSM server for the very first time. Above, backup.log will contain a very thorough record of what got backed up.

Launching the automatic daily backup

You can either launch the memory-hogging scheduler directly and keep it always running (I have not tried this and do not recommend it), by using:

$ nohup dsmc schedule > out.log 2>err.log &

Note that launching the client scheduler will not work if you have the managedservices schedule option set in dsm.sys.

The much better way (see this for an explanation of why) is to just run CAD to manage the client scheduler startup and exit. To do this, run:

$ cd /opt/tivoli/tsm/client/ba/bin/

$ dsmcad

Note that this time, we need the managedservices schedule option set in dsm.sys for CAD to work.
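Since the two launch methods are mutually exclusive, a launcher can pick one based on dsm.sys. A sketch, assuming dsmcad and dsmc are on root's PATH (as the commands above imply) and that dsm.sys lives in TSM_BIN_DIR; the function names are hypothetical.

```shell
# Hypothetical launcher: start whichever process dsm.sys expects.
TSM_BIN_DIR="${TSM_BIN_DIR:-/opt/tivoli/tsm/client/ba/bin}"

# Succeeds if a managedservices line (not a "*" comment) mentions "schedule".
cad_manages_schedule() {
    grep -Ei '^[[:space:]]*managedservices[[:space:]]' "$1" | grep -qiw schedule
}

launch_tsm_scheduler() {
    # Subshell: cd into the dsm.sys directory (see Weirdnesses) without
    # changing the caller's working directory; log files land there too.
    (
        cd "$TSM_BIN_DIR" || exit 1
        if cad_manages_schedule dsm.sys; then
            dsmcad
        else
            nohup dsmc schedule > out.log 2> err.log &
        fi
    )
}
```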

Lastly, whatever you did, you want to make sure it gets launched at boot time. So, create a file in /etc/rc.d/init.d/, e.g. dsmcadstart (in our case), that looks something like this:

TODO: write the file CORRECTLY, with start/stop/restart params like all the other services!

TODO: figure out how to get that to launch/terminate correctly at runlevels 3 through 5!
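Until those TODOs are done properly, here is a minimal sketch of what dsmcadstart could look like. It assumes a Red Hat-style system with chkconfig, that dsmcad detaches on its own (as described above, it does not need nohup), and that pgrep/pkill are installed. The `# chkconfig: 345 95 05` header asks for runlevels 3 through 5 with start priority 95 and stop priority 05; those priorities are arbitrary choices, not something IBM prescribes.

```shell
#!/bin/bash
#
# dsmcadstart   Start/stop the TSM Client Acceptor (dsmcad).
#
# chkconfig: 345 95 05
# description: IBM Tivoli Storage Manager client acceptor daemon (CAD).

# Directory holding dsm.sys; dsmcad is started from here (see Weirdnesses),
# so its log files end up here as well.
TSM_BIN_DIR="${TSM_BIN_DIR:-/opt/tivoli/tsm/client/ba/bin}"

dsmcad_start() {
    if pgrep -x dsmcad >/dev/null 2>&1; then
        echo "dsmcad already running"
        return 0
    fi
    # Subshell so the cd does not leak; assumes dsmcad backgrounds itself.
    ( cd "$TSM_BIN_DIR" && dsmcad )
}

dsmcad_stop() {
    # Signals dsmcad only; a dsmc scheduler it launched is not targeted.
    pkill -x dsmcad
}

dsmcad_status() {
    if pgrep -x dsmcad >/dev/null 2>&1; then
        echo "dsmcad is running"
        return 0
    fi
    echo "dsmcad is stopped"
    return 3   # LSB "program is not running"
}

dsmcad_ctl() {
    case "$1" in
        start)   dsmcad_start ;;
        stop)    dsmcad_stop ;;
        restart) dsmcad_stop; sleep 2; dsmcad_start ;;
        status)  dsmcad_status ;;
        *)       echo "Usage: $0 {start|stop|restart|status}" >&2; return 2 ;;
    esac
}

dsmcad_ctl "${1:-}"
```

Install it as /etc/rc.d/init.d/dsmcadstart, make it executable, and register it with chkconfig --add dsmcadstart.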

3. Using the TSM client

Administering the TSM client from the command line

Read this overview first.

Here are some general commands (read the manual for the rest). Remember, you can do all this stuff below by starting a dsmc CLI session and executing the commands in there, minus the "dsmc".

Make sure you invoke dsmc under su - and not su by itself! (yes, loading the environment of root is important). Yes, you can set up a system where various non-root users have various rights to administer the backup system. I have not done so since it seems excessive in a small lab such as ours.

Changing the password via the command line

$ dsmc set password

The TSM server doesn't accept non-alphanumeric password characters. No special chars for you! It's a feature.

Get some info about your TSM client's connectivity to the TSM server

$ dsmc query session

Get your include/exclude list, as it was parsed by the TSM client

$ dsmc query inclexcl

Perform a manual incremental backup

$ dsmc incremental

You can perform a backup at any time using this, on top of your automatic daily backup if you want. You can do this a thousand times a day, there's no limit, except that for any active file, only the past 90 copies will be stored.

Configuring the TSM client via the Web client GUI

First, read my rant on why this is a bad idea, but feel free to do it anyway.

If you're on the lab subnet/VPN, you can configure the TSM client via the Web using a flaky Java GUI by pointing your browser (which must have JavaScript enabled and Java installed) to:

http://lorien:<PORT SPECIFIED BY THE httpport SETTING IN dsm.sys>

See the manual for how to use it.

Of course, if lorien is no longer your backed-up machine, specify the name (or IP address) of the one that is.

N.B.: In the past I've had problems with the Web server not listening on the port specified in dsm.sys but on some other port, which you can find by doing:

$ su -

$ lsof -i

and looking for the dsmcad process. Lately, the port specified by dsm.sys seems to be sticking, but if your Web browser can't connect, the first thing you need to do is make sure that dsmcad is running and listening on whatever port you're trying to connect to by using lsof -i.
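That lsof check can be narrowed to just what matters. A sketch, assuming an lsof new enough to support the -sTCP:LISTEN state filter and a single httpport line in dsm.sys; function names are hypothetical. It compares the port dsm.sys asks for against the TCP ports dsmcad is actually listening on.

```shell
# Hypothetical check: is dsmcad listening on the httpport named in dsm.sys?

# First httpport value in the file (option names are case-insensitive).
dsm_httpport() {
    awk 'tolower($1) == "httpport" { print $2; exit }' "$1"
}

# TCP ports dsmcad listens on. -a ANDs the selections, -c picks the command,
# -nP keeps hosts and ports numeric, -Fn prints name fields like "n*:1581"
# (lsof also emits p/f lines, which the sed filter drops).
dsmcad_listen_ports() {
    lsof -nP -a -c dsmcad -iTCP -sTCP:LISTEN -Fn 2>/dev/null |
        sed -n 's/^n.*:\([0-9][0-9]*\)$/\1/p' | sort -un
}

check_web_port() {
    local want
    want=$(dsm_httpport "${1:-/opt/tivoli/tsm/client/ba/bin/dsm.sys}")
    [ -n "$want" ] || { echo "no httpport found" >&2; return 1; }
    if dsmcad_listen_ports | grep -qx "$want"; then
        echo "dsmcad is listening on $want"
    else
        echo "dsmcad is not listening on $want (it listens on: $(dsmcad_listen_ports | tr '\n' ' '))" >&2
        return 1
    fi
}
```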

Restoring files from backup (via the command line, of course)

ONE VERY IMPORTANT NOTE: you have to be logged in as the user whose files you are restoring! Otherwise TSM will fail to locate the backup. So make sure to su - USERNAME before running dsmc restore ... on USERNAME's files.

In theory, you can restore to any machine, not just the backup host. I am not sure of the details (for the time being, see the "client access" paragraph in the manual, pg 97).

To restore the file to original location:

$ dsmc restore <file path>   (relative or absolute path; a relative path is converted to an absolute one for looking up the file in the backup set)

or restore to a new file:

$ dsmc restore <original file path> <new file path>   (relative or absolute paths; the first one is converted to an absolute path for looking up the file in the backup set)

You can restore whole directories back to their original locations in the filesystem tree using:

$ dsmc restore <directory path>/ -subdir=yes

N.B.: make sure to put a frontslash at the end of the directory name in this case (oddly enough, you do not want to do this in include/exclude lists - see the weirdness section); also, make sure to use the -subdir option, otherwise the restore will not be recursive and you will only get the top level of the filesystem tree.

Or, you can restore them to some other directory (watch the frontslashes again, you will get an error if you fail to use them!):

$ dsmc restore <old directory path>/ <new directory path>/ -subdir=yes

You can restore stuff from an older backup (useful for getting deleted files or files from a few versions ago). For example, let's say we want to restore a directory and its contents from the most recent backup that was in place as of 9:00AM, July 25, 2006. We use:

$ dsmc restore -pitd=07/25/2006 -pitt=9:00 -subdir=yes <directory path>/
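If you would rather type the timestamp once in a less error-prone form, a small wrapper can build the -pitd/-pitt values. A sketch, assuming GNU date and the client's default date and time formats (mm/dd/yyyy and hh:mm); the function name is hypothetical. Run it as the owner of the files, per the note above.

```shell
# Hypothetical wrapper for a point-in-time directory restore.
# Usage: restore_as_of <directory> "<timestamp GNU date understands>"
restore_as_of() {
    local dir="$1" when="$2" pitd pitt
    # Assumes the client's default date/time formats (mm/dd/yyyy, hh:mm).
    pitd=$(date -d "$when" +%m/%d/%Y) || return 1
    pitt=$(date -d "$when" +%H:%M) || return 1
    # Directory restores need the trailing slash.
    case "$dir" in
        */) ;;
        *) dir="$dir/" ;;
    esac
    dsmc restore -pitd="$pitd" -pitt="$pitt" -subdir=yes "$dir"
}
```

For example, restore_as_of <directory path> "2006-07-25 09:00" issues essentially the same restore as the command above.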

Include/exclude lists

These lists determine what objects (i.e. files or directories) get backed up. To understand how an include/exclude list is parsed, imagine three sets: (1) the processing set, (2) the set of objects included in the backup, and (3) the set of objects excluded from the backup. The parsing algorithm then works like this:

  • Put all objects on all reachable filesystems into the processing set.
  • Process all exclude.fs and exclude.dir statements first, regardless of their position in the list. I.e., for each file system marked with exclude.fs and each directory (and its contents) marked with exclude.dir, move the objects they describe from the processing set to the exclude set.
  • Process the rest of the list from bottom to top. For each statement (either include or exclude), move the objects described by that statement that are still in the processing set to either the include or the exclude set.
  • Move all remaining objects in the processing set to the include set (i.e., implicitly include everything that wasn't explicitly excluded).
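To make the order above concrete, the following sketch prints the statements of a list in the order they are applied: exclude.dir/exclude.fs first, then everything else from the bottom up, so the first statement that matches an object decides its fate. It does not reimplement TSM's wildcard matching, and it assumes blank lines and lines starting with * or # are comments; the function name is hypothetical. For the real thing, use dsmc query inclexcl.

```shell
# Hypothetical illustration of the evaluation order described above.
inclexcl_evaluation_order() {
    local file="$1"
    # 1. exclude.dir / exclude.fs statements, wherever they appear.
    grep -Ei '^[[:space:]]*exclude\.(dir|fs)[[:space:]]' "$file"
    # 2. All other statements, bottom to top (comments and blank lines dropped).
    grep -Eiv '^[[:space:]]*(exclude\.(dir|fs)[[:space:]]|[*#]|$)' "$file" | tac
}
```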

The syntax of include/exclude statements is like this:

include <file or directory name> <data retention policy name>

Exclude statements don't need a policy name.

You can use wildcards. Note that:

  • include.backup is the same as include and exclude.backup is the same as exclude.

See our include/exclude list for a sample. See pg 87 in the manual for some examples and more explanation.

You can verify how the TSM client parses your include/exclude list by running:

$ dsmc query inclexcl

Logging of automatic daily backups

Whenever dsmcad (or, I think, even the client scheduler) is started, it writes the following log files to whatever directory it was started from:

  • dsmwebcl.log (Web client log, useless unless you're using the Web client to administer TSM)
  • dsmsched.log (this is a massive one - it contains the complete record of every single object that was backed up, not backed up, expired, updated, etc... look in this one to get a trace of what happened during your automatic backup)
  • dsmerror.log (the error log; in a perfect world, this log would be empty)

Of course you can change where stuff gets logged by specifying that in dsm.sys. Otherwise, whichever directory you are in whenever you invoke dsmcad, the logs will go there.
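A quick morning-after look at these logs can be scripted. A sketch, assuming the logs are in the dsm.sys directory (where we start dsmcad) and that the end-of-session statistics in dsmsched.log are lines containing "Total number of objects", starting with the "inspected" line; the function name is hypothetical. Since dsmerror.log accumulates across sessions, a non-empty file is a prompt to look, not proof that last night's backup failed.

```shell
# Hypothetical check of the automatic backup's logs.
check_backup_logs() {
    local dir="${1:-/opt/tivoli/tsm/client/ba/bin}"
    local sched="$dir/dsmsched.log" err="$dir/dsmerror.log"
    [ -r "$sched" ] || { echo "cannot read $sched" >&2; return 1; }
    # Print only the last statistics block: each "inspected" line starts a new one.
    awk '/Total number of objects inspected/ { n = 0 }
         /Total number of objects/           { last[++n] = $0 }
         END { for (i = 1; i <= n; i++) print last[i] }' "$sched"
    # A non-empty error log deserves a look.
    if [ -s "$err" ]; then
        echo "--- last lines of $err ---"
        tail -n 20 "$err"
        return 1
    fi
}
```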

4. Weirdnesses

There is some general strangeness to the TSM system that you must heed:

  • Always enclose any path containing a wildcard in double quotes, whether on the command line as parameters to dsmc or in an include/exclude list.
  • You should not terminate your directory names in an include/exclude list with frontslashes! This actually does cause a statement to not do what you think it will! So don't use exclude.dir /back/up/dir/ (this will not exclude /back/up/dir/ or its contents), but use exclude.dir /back/up/dir. Oddly enough, when restoring directories using dsmc, you do want to put frontslashes at the end.
  • You should be in the same directory as the dsm.sys file when launching dsmcad... I think. There have been some strange things surrounding this. That's why everywhere here I say to cd into the directory with dsm.sys first.
  • After you change anything in dsm.sys, restart everything. Make sure (with ps aux and lsof -i) that no processes whose names start with "dsm" are running anywhere or listening on anything. Sometimes they stick around and fail to pick up changes to dsm.sys.

---

-- Created by: Andrew Uzilov on 19 May 2006