Gff Format
From Biowiki
Revision as of 23:42, 1 January 2017 by Move page script (Move page script moved page GffFormat to Gff Format: Rename from TWiki to MediaWiki style)
GFF
GFF is a format for locating and describing genes and other localized features associated with DNA, RNA and protein sequences. The specifications for the GFF format can be found here:
- GFF version 3, the latest version announced Nov 2005 (link is to the Sequence Ontology SourceForge site)
- GFF version 2 at the Sanger Institute's GFF site.
GFF fields
GFF is relatively simple, containing just 9 fields per "feature" (record). Fields are tab-delimited and features are newline-delimited. The 9 fields are:
NAME SOURCE TYPE START END SCORE STRAND FRAME GROUP
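As a rough illustration of the record layout, a single feature line could be split into its 9 fields as sketched below. The names `GffRecord` and `parse_gff_line` are hypothetical and not part of any GFF library. The sketch assumes SCORE, STRAND and FRAME are kept as strings, since a GFF file may use "." for an undefined value.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    """One GFF feature; field names follow the 9 columns listed above."""
    name: str      # reference sequence identifier
    source: str
    type: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    score: str     # kept as text: may be "." when undefined
    strand: str
    frame: str
    group: str


def parse_gff_line(line: str) -> GffRecord:
    """Split one tab-delimited feature line into a GffRecord (illustrative helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    name, source, ftype, start, end, score, strand, frame, group = fields
    return GffRecord(name, source, ftype, int(start), int(end),
                     score, strand, frame, group)


# Made-up example record, for illustration only.
rec = parse_gff_line("chr1\tgenefinder\texon\t100\t200\t0.9\t+\t0\tID=exon1\n")
print(rec.name, rec.start, rec.end)
```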
These can be classified as:
- the bare minimum needed to represent precise feature co-ordinates:
  - NAME - the reference sequence: chromosome, contig, supercontig/scaffold, or other sequence identifier
    - this is usually not the name of our feature, but rather the sequence to which our feature is relative (only large "genomic ruler" features like chromosomes, scaffolds, etc. are their own reference sequence)
  - START, END - 1-based indices of start and end of our feature relative to the reference sequence (START <= END must be true regardless of feature orientation)
- fields summarizing the output of programs that predict annotation features:
  - TYPE - feature type (GFF3 uses the Sequence Ontology to restrict this field)
  - SOURCE - name of originating sensor program
  - SCORE - the score assigned to the feature by the sensor program
- genefinder-centric fields:
  - STRAND - orientation of feature relative to the reference sequence
  - FRAME - translational reading frame; also called PHASE
- a final, catch-all field:
  - GROUP - as of GFF3, this is a semicolon-separated "tag=value" attribute list, with various well-defined tags and values such as "ID" or "Parent"
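Two of the points above can be sketched in code: reading a GFF3-style GROUP field, and turning the 1-based, inclusive START and END into a substring of the reference sequence. The helper names `parse_group` and `extract_feature` are hypothetical. The sketch assumes the GFF3 conventions that multiple values for one tag are comma-separated and that reserved characters are percent-encoded. It ignores STRAND, so a minus-strand feature is returned as it appears on the reference.

```python
from urllib.parse import unquote


def parse_group(group: str) -> dict[str, list[str]]:
    """Parse a semicolon-separated "tag=value" list into {tag: [values]}."""
    attributes: dict[str, list[str]] = {}
    for pair in group.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        # Assumption: commas separate multiple values; decode %XX escapes after splitting.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


def extract_feature(reference: str, start: int, end: int) -> str:
    """Return the feature's bases given 1-based, inclusive START and END."""
    if not 1 <= start <= end:
        raise ValueError("GFF requires 1 <= START <= END")
    # 1-based inclusive [start, end] maps to the Python slice [start - 1:end].
    return reference[start - 1:end]


print(parse_group("ID=exon1;Parent=mRNA1,mRNA2"))
print(extract_feature("ACGTACGT", 2, 4))  # -> CGT
```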
For more info, see the GFF3 spec or the Gff Tools page.
Example applications that use GFF format
An example of a server that generates GFF is Expasy (e.g. P0A7B8.gff).
Examples of clients that use this format are Jalview, the multiple sequence alignment editor and viewer, and STRAP, the structural alignment tool (example output).