Saps Modules

Sequence Analysis Pipeline System

Perl modules by Shengqiang Shu.

Hi, Ian.

My SAPS modules were intended to be open, but they are housed in BDGP CVS.
You can get them if you want: public-CVS/saps

I am in the middle of major improvements to meet new requirements at my current workplace.
In a few days, I will have a tested version. Documentation so far is in the module itself.
Overall documentation/howto is on the todo list.

SAPS is a compute job tracking and management system. In a nutshell, this is what SAPS is trying to do:
	 It is safe for multiple processes: no job will interfere with other jobs.
	 It will work with any source database and any result database.
	 It requires a puller, which pulls data from an external database as input. I have a couple of pullers:
		  FASTA database,
		  sequences fetched using an SQL statement (tested with MySQL and Postgres),
		  ids as input, fetched using an SQL statement.
	 These pullers also let you specify how many of these items make up one job input.
	 It requires a pusher, which puts results into the result database.
	 Lots of knobs and switches are in place; the user has to specify which ones are going to be used, with a controlled
	 vocabulary, at configuration time.
	 Computes can be done locally or on a farm (right now only a PBS farm is supported for job submission).
	 Compute results can be stored in the SAPS database (only Postgres is supported) or on the filesystem (the whole
	 spectrum is supported, from all in the database to all on the filesystem).
	 Single-parent and multiple-parent dependencies are supported, so computes can be chained together
	 (multiple-parent dependency is in the testing stage).
	 In addition to that, each job has its own dependency (states): run->parse->store (parse can be skipped).
	 BLAST database dependencies are handled too (the database is built from the external db and indexed if necessary).
hope this is clear.
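
To make the puller batching and the job states above a bit more concrete, here is a minimal sketch. SAPS itself is written in Perl; the sketch uses Python purely for illustration, and every name in it (pull_batches, Job, skip_parse, the "new" starting state) is hypothetical and not taken from the SAPS modules. It assumes a parent job counts as finished once it has reached the store state, which the description above does not spell out.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterable, Iterator, List


def pull_batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Puller sketch: group input records so each batch becomes one job input."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


@dataclass
class Job:
    """Hypothetical job record with states new -> run -> parse -> store."""
    name: str
    inputs: List[str]
    parents: List["Job"] = field(default_factory=list)
    skip_parse: bool = False
    state: str = "new"

    def ready(self) -> bool:
        # Assumption: a job may start only when every parent has reached "store".
        return all(p.state == "store" for p in self.parents)

    def advance(self) -> str:
        """Move to the next state; "parse" is skipped when skip_parse is set."""
        if self.state == "new":
            if not self.ready():
                raise RuntimeError(f"{self.name}: parent jobs not finished")
            self.state = "run"
        elif self.state == "run":
            self.state = "store" if self.skip_parse else "parse"
        elif self.state == "parse":
            self.state = "store"
        # A job already in "store" stays there.
        return self.state
```

With this sketch, a job built from one batch of ids has to reach store before a child job that lists it as a parent can leave its starting state, which is one simple way the chaining of computes could be modeled.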

saps.tar.gz: Shengqiang Shu's SAPS modules (CVS snapshot, 3/21/2007)