Fly Mavid Windows

From Biowiki

---

Window generation procedure

This page explains how the windows in /nfs/data/genome/fly/windows/ were created for input to xrate and RNAz (see Twelve Fly Scan for how the windows were used).

See the makefile (/nfs/data/genome/fly/Makefile.windows) for exact details.

Windows of 120 columns are created by stepping 40 columns at a time through Mavid alignment segments (/nfs/data/genome/fly/align/12fly/mavid/).
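The window layout can be sketched in Python as below. This is a minimal sketch, not make-windows.pl itself. The function name `window_bounds`, the 0-based half-open column convention, and the tail handling (only full windows are emitted, except that a segment shorter than 120 columns gets one window) are all assumptions, because this page does not say how make-windows.pl treats the last partial window of a segment.

```python
def window_bounds(n_cols, width=120, step=40):
    """Return (start, end) column ranges for sliding windows over one segment.

    Hypothetical helper: columns are 0-based and half-open [start, end).
    Assumption: only full-width windows are produced, except that a segment
    shorter than `width` yields a single window covering the whole segment.
    """
    if n_cols <= 0:
        return []
    last_start = max(n_cols - width, 0)
    return [(start, min(start + width, n_cols))
            for start in range(0, last_start + 1, step)]


# A 200-column segment gives three windows.
print(window_bounds(200))  # [(0, 120), (40, 160), (80, 200)]
```

With a 40-column step, consecutive 120-column windows overlap by 80 columns, so every interior alignment column falls in three windows.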

WARNING!

The make-windows.pl script computes incorrect minus-strand Mercator coordinates in the Stockholm files it writes. Do not trust any minus-strand coordinates in the Stockholm files! Dmel coordinates are correct because Dmel is the Mavid/Mercator reference sequence and is therefore always on the plus strand. However, there is no guarantee that coordinates for the other species are correct.

In general, it is better to work in the alignment coordinate system anyway: use Mavid/Mercator segments and alignment columns as the coordinates for windows and hits. Then, once you have filtered these down to good candidate hits, convert those to Mercator, CAF1, or GenBank coordinates using mercator-perl/featurevole.pl (see Mercator Perl). Makefile.windows generates a file called mm-align-coords.gff that gives the alignment coordinates for each window.
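To make the minus-strand pitfall concrete, here is a minimal Python sketch of mapping an alignment column to a sequence coordinate. It is not featurevole.pl and does not describe how Mercator actually records segments. The function name, the 1-based inclusive plus-strand numbering of `seg_start..seg_end`, and the gap characters are assumptions; use featurevole.pl for real conversions.

```python
GAPS = "-."


def column_to_seq_coord(row, column, seg_start, seg_end, strand):
    """Map an alignment column to a 1-based sequence coordinate.

    Hypothetical helper. Assumes `row` is one gapped sequence from a
    segment covering seg_start..seg_end (1-based, inclusive, plus-strand
    numbering) and that gaps are '-' or '.'.
    Returns None when the row has a gap at `column`.
    """
    if row[column] in GAPS:
        return None
    # Number of residues (non-gap characters) before this column.
    offset = sum(1 for c in row[:column] if c not in GAPS)
    if strand == "+":
        return seg_start + offset
    if strand == "-":
        # On the minus strand, the first residue in the alignment is at
        # seg_end, and coordinates decrease moving right along the row.
        return seg_end - offset
    raise ValueError(f"unknown strand: {strand!r}")
```

For the row `AC-GT` over positions 101..104, column 3 (`G`) maps to 103 on the plus strand but to 102 on the minus strand; forgetting to count down from `seg_end` is exactly the kind of error that corrupts minus-strand coordinates.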

Making windows for xrate

It is faster to run these steps directly on the RAID (this avoids NFS overhead).

ssh lorien

cd /home/data/genome/fly/

Make the Stockholm-format windows (warning: this step takes a while, maybe even DAYS).

make -f Makefile.windows all-windows

The above will also produce the files:

  • windows/DroMel_CAF1.gff
    • contains Drosophila melanogaster Mercator/CAF1/GenBank coordinates
      • Dmel coordinates are the same in all 3 systems, which is not true for most other species because their assemblies are not finished

  • mm-align-coords.gff
    • gives you Mavid/Mercator alignment coordinates for each window
      • handy for piping to mercator-perl/alignmonkey.pl to extract the alignments (see Mercator Perl)
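Both files are GFF. Assuming they use the standard nine tab-separated GFF columns (the exact source, feature, and attribute values written by Makefile.windows are not documented here), a minimal Python reader could look like this. The names `GffRecord`, `parse_gff_line`, and `read_gff` are hypothetical.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqname: str     # e.g. a segment or chromosome name
    source: str
    feature: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    score: str       # kept as text; '.' means no score
    strand: str      # '+', '-', or '.'
    frame: str
    attributes: str  # free-form last column, left unparsed


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for blank and '#' comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
    attributes = "\t".join(fields[8:])  # empty string if the column is absent
    return GffRecord(fields[0], fields[1], fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], fields[7], attributes)


def read_gff(path: str) -> list:
    """Read all feature lines of a GFF file into GffRecord tuples."""
    with open(path) as handle:
        return [rec for rec in map(parse_gff_line, handle) if rec is not None]
```

Keeping the score and attribute columns as raw strings avoids guessing at their contents; filter on `seqname`, `start`, and `end` to pick out windows before handing them to alignmonkey.pl.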

Drop columns with 100% gaps (warning: this will also take a long time).

make -f Makefile.windows drop-gappy
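What this step does to each window can be sketched in Python as below. This is a minimal illustration, not the code behind the drop-gappy rule. The function names are hypothetical, gaps are assumed to be '-' or '.', and the reader discards all '#' lines, so column annotations such as #=GC lines (which would also need trimming in a real tool) are not handled.

```python
GAP_CHARS = set("-.")


def read_stockholm(text):
    """Read the sequence rows of a single Stockholm alignment.

    Minimal sketch: '#' lines (header and #=GF/#=GC/#=GR/#=GS annotation)
    are skipped, '//' ends the alignment, and interleaved blocks are
    concatenated per sequence name.
    """
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "//":
            break
        name, seq = line.split(None, 1)
        rows[name] = rows.get(name, "") + seq.strip()
    return rows


def drop_all_gap_columns(rows):
    """Remove columns in which every sequence has a gap character."""
    if not rows:
        return {}
    names = list(rows)
    seqs = [rows[n] for n in names]
    keep = [i for i in range(len(seqs[0]))
            if not all(s[i] in GAP_CHARS for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
```

Only columns that are gaps in *every* row are removed; a column with a single residue is kept, so no sequence data is lost.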

Done! Submit them to xrate.

Making windows for RNAz

You must have made all the xrate windows prior to this, as described above.

Convert the windows to ClustalW format for input to RNAz.

make -f Makefile.windows convert-to-aln
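The conversion can be sketched in Python as below, under stated assumptions: this is not the convert-to-aln rule itself, the function name is hypothetical, and it assumes RNAz accepts a plain `CLUSTAL W` header with no conservation lines. Stockholm '.' gaps are rewritten as '-'.

```python
def to_clustal(rows, width=60):
    """Format aligned rows (name -> gapped sequence) as ClustalW text.

    Minimal sketch: a 'CLUSTAL W' header, then blocks of `width` columns,
    each block preceded by a blank line, with names left-justified to a
    common width. No conservation line is written.
    """
    if not rows:
        raise ValueError("empty alignment")
    names = list(rows)
    seqs = [rows[n].replace(".", "-") for n in names]
    pad = max(len(n) for n in names) + 4
    lines = ["CLUSTAL W multiple sequence alignment", ""]
    for start in range(0, len(seqs[0]), width):
        lines.append("")
        for name, seq in zip(names, seqs):
            lines.append(name.ljust(pad) + seq[start:start + width])
    return "\n".join(lines) + "\n"


print(to_clustal({"seq1": "AC.GT", "seq2": "A--GT"}, width=3))
```

Each block carries the same columns for every sequence, so a reader can rebuild each row by concatenating that row's chunks across blocks.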

Remove the reverse-complement (r-suffix) files, which are not needed because RNAz can score both the forward and reverse strands, and optimize the alignments for RNAz using the Perl script distributed with RNAz (rnazSelectSeqs.pl).

make -f Makefile.windows prepare-for-rnaz

In theory, the step above could once again have created alignments with columns that are 100% gaps, which would then need to be dropped again. I wrote a program (/nfs/src/perl/drop-gappy-cols-aln.pl) for this, but applying it showed that no columns get dropped, so I'm not bothering to add a makefile rule for it.
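The check described above (whether any all-gap columns remain after rnazSelectSeqs.pl) can be sketched in Python as follows. This is not drop-gappy-cols-aln.pl; the function names are hypothetical, and the ClustalW reader is a minimal one that skips the `CLUSTAL` header, blank lines, and conservation lines (which start with whitespace).

```python
def read_clustal(text):
    """Read a ClustalW alignment into name -> concatenated gapped sequence."""
    rows = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()  # name, sequence chunk, optional residue count
        rows[fields[0]] = rows.get(fields[0], "") + fields[1]
    return rows


def count_all_gap_columns(rows, gaps="-."):
    """Count columns in which every sequence has a gap character."""
    seqs = list(rows.values())
    if not seqs:
        return 0
    return sum(1 for column in zip(*seqs) if all(c in gaps for c in column))
```

Running `count_all_gap_columns(read_clustal(text))` over every .aln file and finding zero everywhere would reproduce the observation that no columns needed dropping.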

Other notes

From /nfs/data/genome/fly/windows/NOTES.TXT (which I am removing, because everything is here now):

[AVU 10/12/2006]

Each segment subdir has 7 files for each window, with these extensions:

  • .stk.gappy (x 2): the original Stockholm window file created using 'make-windows.pl' (i.e., gappy columns NOT dropped), and its rev comp
  • .stk (x 2): the above file, but with columns of 100% gaps dropped, and its rev comp (you should run the xrate screen on THESE)
  • .aln.allseqs: the ClustalW file generated verbatim from the .stk file above (the original, NOT the rev comp)
  • .aln.gappy: the above file, optimized for RNAz using 'rnazSelectSeqs.pl' with default options (this may result in gappy columns!)
  • .aln: the above file, except with columns of 100% gaps dropped (you should run the RNAz screen on THIS)

The files above were generated IN THE ORDER LISTED, and each file is generated from the one before it in the list.

Reverse complements have "r" in the filename. They are reverse complements of the sequences observed in the Mavid/Mercator alignments.
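Reverse-complementing a window means reversing the column order and complementing every base, with gaps carried along. A minimal Python sketch (the function name is hypothetical and this is not the code make-windows.pl uses), treating '-' and '.' as gaps and using the standard IUPAC nucleotide complements:

```python
# Standard IUPAC complements; U (RNA) complements to A.
_FORWARD = "ACGTURYKMSWBDHVN"
_REVERSE = "TGCAAYRMKSWVHDBN"
_COMPLEMENT = str.maketrans(_FORWARD + _FORWARD.lower(),
                            _REVERSE + _REVERSE.lower())


def reverse_complement_rows(rows):
    """Reverse-complement every row of an alignment (name -> gapped sequence).

    Reversing every row reverses the column order consistently across rows,
    so the result is still a valid alignment. Gap characters are not in the
    translation table and pass through unchanged.
    """
    return {name: seq.translate(_COMPLEMENT)[::-1] for name, seq in rows.items()}
```

For example, the row `AC-GTN` becomes `NAC-GT`: the gap stays between the same pair of residues, now read from the other strand.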

Note that because RNAz can score both the original window and its reverse complement (the -b flag), there are no RNAz (.aln) window files for reverse complements. Just run RNAz with the -b flag and it will score the reverse complement for you.

Note that the .stk.gappy, .aln.allseqs, and .aln.gappy files are pretty useless and will probably be deleted, although I'm holding on to them for now... just in case.

The 'make-windows.pl' run generated a MASSIVE (2.4 GB) log file that records in detail what happened; it is ../windows.etc/make-windows.log.

[AVU 01/08/2007]

The intermediate files (those ending with .stk.gappy, .aln.allseqs, and .aln.gappy) have been moved to a separate dir (../windows.int/) for zipping, because they are not necessary for the screen and are taking up a lot of space.

NOTE: The intermediate files have since been deleted, and the make-windows.pl log file has been compressed.

---

-- Created by: Andrew Uzilov on 20 Mar 2007