Window Eater

windoweater.pl

windoweater.pl is a layer between windowlicker.pl and xrate in our pipeline; it discards windows whose conservation falls below a given threshold.

Located in: /nfs/projects/pipeline/perl/windoweater.pl

windoweater.pl is part of the pipeline CVS project; see http://biowiki.org/HowToRunPipeline#Check_out_the_project

the point

See also: How To Run Pipeline

We want to filter windows based on conservation. After some discussion, we decided that a suitably generic way to implement this would be to:

  • annotate each alignment segment with a conservation annotation (e.g. #=GC CONS)
  • have windowlicker.pl invoke a "fake xrate" process instead of the real xrate
  • the "fake xrate" process (i.e. windoweater.pl) receives the windows made by windowlicker.pl, decides whether each one is conserved enough, and passes only the conserved ones to xrate, which reduces runtime
  • windows that aren't passed to xrate get rapidly annotated as "all intergenic" by windoweater.pl
  • as a result, the windowlicker.pl GFF output will not include an ncRNA annotation for non-conserved windows, because those windows contain no non-intergenic annotation
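
The decision step in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual windoweater.pl code. It assumes that the #=GC CONS line holds one digit (0-9) per alignment column, that a column counts as conserved when its digit is at least cutoff, and that a window is passed to xrate when the fraction of conserved columns is at least min_frac. The function names, the parameter names, and the scoring rule itself are all hypothetical, and are not claimed to match the semantics of the real --eater_ options.

```python
def conserved_fraction(cons, cutoff):
    """Fraction of columns whose CONS digit is >= cutoff.

    Assumption: `cons` is the window's slice of a '#=GC CONS' line with one
    character per column; non-digit characters count as unconserved.
    """
    if not cons:
        return 0.0
    hits = sum(1 for ch in cons if ch in "0123456789" and int(ch) >= cutoff)
    return hits / len(cons)


def route_window(cons, cutoff, min_frac):
    """Return 'xrate' if the window is conserved enough, else 'intergenic'.

    'xrate' means the window would be handed to the real xrate binary;
    'intergenic' means it would be annotated as all intergenic and skipped.
    """
    if conserved_fraction(cons, cutoff) >= min_frac:
        return "xrate"
    return "intergenic"
```

Under these assumptions, route_window("9987655012", 5, 0.65) returns "xrate" (7 of 10 columns reach the cutoff), while route_window("9987650012", 5, 0.65) returns "intergenic" (only 6 of 10 do).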

how to use

You can get a usage message (describing all the options) via:

./windoweater.pl --help

Alternatively, use perldoc (looks prettier):

perldoc windoweater.pl

Here is an example of how to use the program in the Xrate Pipeline:

/nfs/src/dart/perl/windowlicker.pl \
 -w 100 \
 -d 50 \
 -gff segment.gff \
 -x /nfs/projects/pipeline/perl/windoweater.pl \
 segment.annot.stock \
 -- \
 --eater_verbose \
 --eater_cutoff 5 \
 --eater_mincols 0.65 \
 --eater_xrate /nfs/src/dart/bin/xrate \
 -e /nfs/src/dart/grammars/jukescantor.eg \
 -g /nfs/projects/caf1screen/grammars/ncRnaDualStrand_v15.eg

Important points about the above example:

  • we invoke windoweater.pl instead of xrate with the -x option to windowlicker.pl
  • the options after -- normally all go to xrate; however, in this case, options starting with --eater_ will go to windoweater.pl, and the remaining options after -- will go to xrate
  • the path to the xrate binary is now specified using the --eater_xrate arg
  • the numbers (i.e. window/step size, conservation cutoff) are all made up and probably aren't what you want in the real pipeline

The above is based on the segment.xrate rule in the Avu Tests CVS project.
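
The option routing described in the bullet points above (options starting with --eater_ go to windoweater.pl, the rest go to xrate) can be sketched as follows. This is a hedged Python illustration, not how windoweater.pl actually parses its arguments. It assumes that --eater_verbose is the only --eater_ option that takes no value and that every other --eater_ option is followed by exactly one value; EATER_FLAGS and split_args are hypothetical names.

```python
# Hypothetical: --eater_ options that take no value. Every other --eater_
# option is assumed to be followed by exactly one value.
EATER_FLAGS = {"--eater_verbose"}


def split_args(args):
    """Split the arguments after '--' into (eater_args, xrate_args).

    Order is preserved within each list.
    """
    eater, xrate = [], []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--eater_"):
            eater.append(arg)
            # Consume the option's value unless it is a bare flag.
            if arg not in EATER_FLAGS and i + 1 < len(args):
                i += 1
                eater.append(args[i])
        else:
            xrate.append(arg)
        i += 1
    return eater, xrate
```

Applied to the arguments after -- in the example command, this sketch would put --eater_verbose, --eater_cutoff 5, --eater_mincols 0.65 and --eater_xrate /nfs/src/dart/bin/xrate in the first list, and leave the -e and -g grammar options in the second list for xrate.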

---

-- Created by: Andrew Uzilov on 18 Dec 2007