GFF is a format for locating and describing genes and other
localized features associated with DNA, RNA and protein sequences.
The specifications for GFF format can be found here:
Note that GFF3 is largely backward-compatible with version 2, but introduces more structure
for (among other things) hierarchical grouping of features.
GFF is relatively simple, containing just 9 fields per "feature" (record).
Fields are tab-delimited and features are newline-delimited.
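As a minimal sketch of this layout in Python (the function name `split_gff` and the sample records are illustrative assumptions, not part of any GFF library), the text can be split on newlines and then on tabs. The sketch assumes every record carries all nine fields, and it skips blank lines and lines starting with `#`, which GFF uses for comments and directives:

```python
def split_gff(text):
    """Split GFF text into a list of 9-field records (lists of strings)."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and comment/directive lines (which start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
        records.append(fields)
    return records


# Made-up sample data, for illustration only.
sample = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1\n"
    "chr1\texample\texon\t100\t150\t.\t+\t.\tParent=gene1\n"
)
records = split_gff(sample)
```

Here `records` holds one list of nine strings per feature; the directive line is ignored.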
The 9 fields are:
NAME SOURCE TYPE START END SCORE STRAND FRAME GROUP
These can be classified as:
- the bare minimum needed to represent precise feature co-ordinates:
NAME - the reference sequence: chromosome, contig, supercontig/scaffold, or other sequence identifier
- this is usually not the name of our feature, but rather the sequence to which our feature is relative (only large "genomic ruler" features like chromosomes, scaffolds, etc. are their own reference sequence)
START, END - 1-based indices of start and end of our feature relative to the reference sequence (START <= END must be true regardless of feature orientation)
- fields summarizing the output of programs that predict annotation features:
TYPE - feature type (GFF3 uses the sequence ontology to restrict this field)
SOURCE - name of originating sensor program
SCORE - the score assigned to the feature by the sensor program
- genefinder-centric fields:
STRAND - orientation of feature relative to the reference sequence
FRAME - translational reading frame; also called "phase"
- a final, catch-all field:
GROUP - as of GFF3, this is a semicolon-separated "tag=value" attribute list, with various well-defined tags and values such as "ID" or "Parent"
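To make these field groups concrete, here is a hedged Python sketch that turns one record into a named structure, checks that START does not exceed END, and splits a GFF3-style GROUP column into a tag-to-value dictionary. The names `GffFeature`, `parse_attributes` and `parse_feature` are hypothetical, the sample line is made up, and the sketch ignores details such as URL-escaping and comma-separated multi-valued attributes:

```python
from collections import namedtuple

# Hypothetical record type; field names mirror the 9 GFF columns.
GffFeature = namedtuple(
    "GffFeature",
    "name source type start end score strand frame group",
)


def parse_attributes(group):
    """Split a GFF3-style 'tag=value;tag=value' string into a dict."""
    attributes = {}
    for pair in group.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attributes[tag] = value
    return attributes


def parse_feature(line):
    """Parse one tab-delimited GFF line into a GffFeature."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    name, source, ftype, start, end, score, strand, frame, group = fields
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("START must not exceed END, even on the '-' strand")
    # '.' is the conventional placeholder for "no value".
    score = None if score == "." else float(score)
    return GffFeature(name, source, ftype, start, end, score,
                      strand, frame, parse_attributes(group))


# Made-up sample record, for illustration only.
line = "chr1\texample\texon\t100\t150\t0.9\t-\t.\tID=exon1;Parent=gene1"
feature = parse_feature(line)

# Convert 1-based inclusive coordinates to a 0-based half-open Python slice.
py_slice = slice(feature.start - 1, feature.end)
```

Because GFF coordinates are 1-based and inclusive, the feature length is `feature.end - feature.start + 1`, and `py_slice` can be applied directly to a Python string holding the reference sequence.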
For more info, see the GFF3 spec or the GffTools page.
Example applications that use GFF format
An example of a server that generates GFF is Expasy (e.g. P0A7B8.gff).
Examples of clients that use this format are
, the multiple sequence alignment editor & viewer,
, the structural alignment tool (example output