-- MarkDewitt - 10 Dec 2009
GFF processing from GenBank files
Other members of team OMGBrowse are writing scripts that build genome browsers directly from GenBank accession numbers and manage them through a front end that is more user-friendly than JBrowse itself. My contributions to this project were mostly learning how to use JBrowse and helping write Perl scripts that convert GenBank files into GFF files that the JBrowse installation scripts can read easily.
The Perl scripts attached below help automate this process.
How to use them:
0. Install JBrowse, if you haven't already.
1. Download a GenBank file from NCBI GenBank. Several records concatenated into one large file are acceptable. For example, I searched for "dengue" in NCBI's Genome database and, at the top of the results window, chose to save the output to a file in GenBank format. This produces a file with four GenBank entries, one for each serotype.
2. Copy the Perl scripts attached below into the bin directory of your JBrowse installation.
3. Run the gffproc.pl program:
gffproc.pl -o [outputdirectory] [inputfile]
This reads the GenBank file at [inputfile], creates the directory [outputdirectory], and writes the GFF files into it. The script takes the GFF output of the BioPerl conversion script and splits it by feature type (the third GFF column, which the BioPerl script fills in using SOBA). So if the GenBank file contains 10 kinds of annotation (e.g. mRNA, CDS, and mature peptide annotations), each kind is written to its own GFF file.
4. The resulting GFF files are ready to be fed into bin/flatfile-to-json.pl.
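The split-by-feature-type step in step 3 can be sketched in Python as follows. This is not the attached gffproc.pl. It is an illustration that assumes GFF3 input with nine tab-separated columns, copies any `#` directive lines that appear before the first feature (such as `##gff-version 3`) into every output file, and stops at a `##FASTA` section. The function name `split_gff_by_type` and the `<type>.gff` file naming are hypothetical.

```python
import os
from collections import defaultdict


def split_gff_by_type(gff_path, out_dir):
    """Hypothetical helper: write one GFF file per feature type (column 3).

    Returns a dict mapping feature type -> path of the file written.
    """
    header = []                  # directive/comment lines seen before any feature
    by_type = defaultdict(list)  # feature type -> list of feature lines
    seen_feature = False

    with open(gff_path) as fh:
        for line in fh:
            if line.startswith("##FASTA"):
                break  # embedded sequence section, no more feature lines
            if line.startswith("#"):
                if not seen_feature:
                    header.append(line.rstrip("\n") + "\n")
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # skip blank or malformed lines
            seen_feature = True
            # Column 3 holds the feature type (e.g. CDS, mRNA).
            by_type[cols[2]].append("\t".join(cols) + "\n")

    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for ftype, lines in by_type.items():
        path = os.path.join(out_dir, f"{ftype}.gff")
        with open(path, "w") as out:
            out.writelines(header)  # repeat the header in every split file
            out.writelines(lines)
        written[ftype] = path
    return written
```

For example, `split_gff_by_type("dengue.gff", "dengue_gff")` would produce one file per feature type (such as `CDS.gff`) inside `dengue_gff`. Each file could then be loaded as its own track.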