Gff Format

GFF

GFF is a format for locating and describing genes and other localized features associated with DNA, RNA and protein sequences. The specifications for GFF format can be found here:

Note that GFF3 is backward-compatible with version 2, but introduces more structure for (among other things) hierarchical grouping of features.

GFF fields

GFF is relatively simple, containing just 9 fields per "feature" (record). Fields are tab-delimited and features are newline-delimited, so each line of a file holds one feature. The 9 fields, in order, are:

NAME SOURCE TYPE START END SCORE STRAND FRAME GROUP
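
As a minimal sketch of the layout just described, the Python snippet below splits one feature line on tabs and labels the pieces with the 9 field names. The function name split_gff_line, the lowercase dictionary keys and the sample line are all invented for illustration; they do not come from the GFF specification or from any particular tool, and comment or directive lines are not handled.

```python
# Column names as listed in the text; the lowercase keys are a choice made here.
FIELD_NAMES = ("name", "source", "type", "start", "end",
               "score", "strand", "frame", "group")


def split_gff_line(line):
    """Split one tab-delimited GFF feature line into a dict of its 9 fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    record = dict(zip(FIELD_NAMES, fields))
    # START and END are integer coordinates; the other fields stay as strings.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Invented example line, not taken from any real annotation file.
example_line = "chr1\texample_source\tgene\t1000\t2000\t.\t+\t.\tID=gene1\n"
record = split_gff_line(example_line)
print(record["name"], record["type"], record["start"], record["end"])
```

Keeping SCORE, STRAND and FRAME as strings is deliberate in this sketch: the sample line uses "." as a placeholder in those columns, so converting them to numbers would need extra handling.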

These fields can be classified as:

  • the bare minimum needed to represent precise feature co-ordinates:
    • NAME - the reference sequence: chromosome, contig, supercontig/scaffold, or other sequence identifier
      • this is usually not the name of our feature, but rather the sequence to which our feature is relative (only large "genomic ruler" features like chromosomes, scaffolds, etc. are their own reference sequence)
    • START, END - 1-based indices of start and end of our feature relative to the reference sequence (START <= END must be true regardless of feature orientation)
  • fields summarizing the output of programs that predict annotation features:
    • TYPE - feature type (GFF3 uses the Sequence Ontology to restrict this field)
    • SOURCE - name of originating sensor program
    • SCORE - the score assigned to the feature by the sensor program
  • genefinder-centric fields:
    • STRAND - orientation of feature relative to the reference sequence
    • FRAME - translational reading frame; also called PHASE
  • a final, catch-all field:
    • GROUP - as of GFF3, this is a semicolon-separated "tag=value" attribute list, with various well-defined tags and values such as "ID" or "Parent"
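
Two of the conventions in the list above lend themselves to a short illustration: the GFF3-style "tag=value" list in GROUP, and the 1-based START/END coordinates. The following Python sketch is only one possible reading of them. The helper names parse_group and feature_slice and the sample values are hypothetical, END is assumed to be inclusive, and escaped characters or multi-valued attributes are not handled.

```python
def parse_group(group):
    """Parse a semicolon-separated "tag=value" list into a dict."""
    attributes = {}
    for pair in group.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        attributes[tag] = value
    return attributes


def feature_slice(start, end):
    """Convert 1-based START/END (END assumed inclusive) to a Python slice."""
    if start < 1 or start > end:
        raise ValueError("need 1 <= START <= END")
    # Python slices are 0-based and exclude the stop index.
    return slice(start - 1, end)


# Invented values, for illustration only.
attrs = parse_group("ID=exon1;Parent=gene1")
reference = "ACGTACGT"
subseq = reference[feature_slice(2, 4)]
print(attrs["Parent"], subseq)  # prints: gene1 CGT
```

The slice always returns bases as they appear on the reference sequence, which is consistent with START <= END holding regardless of orientation; interpreting a feature on the reverse strand would still require consulting STRAND separately.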

For more info, see the GFF3 spec or the Gff Tools page.

Example applications that use GFF format

An example of a server that generates GFF is Expasy (e.g. P0A7B8.gff).

Examples of clients that use this format are Jalview, the multiple sequence alignment editor and viewer, and STRAP, the structural alignment tool (example output).