GFF
GFF is a format for locating and describing genes and other
localized features associated with DNA, RNA and protein sequences.
The specifications for GFF format can be found here:
Note that GFF3 is backward-compatible with version 2, but introduces more structure
for (among other things) hierarchical grouping of features.
GFF fields
GFF is relatively simple, containing just 9 fields per "feature" (record).
Fields are tab-delimited and features are newline-delimited.
The 9 fields, in order, are:
NAME SOURCE TYPE START END SCORE STRAND FRAME GROUP
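As a minimal sketch of the layout described above, the following Python splits one feature line on tabs into the nine fields. The `GffRecord` class and `parse_gff_line` function are hypothetical names made up for this illustration, not part of any existing library, and the sample line is invented. The sketch also assumes the GFF convention that a lone "." stands for an undefined value, which this page does not itself state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GffRecord:
    """One GFF feature, using the field names listed above (hypothetical helper)."""
    name: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    group: str


def parse_gff_line(line: str) -> GffRecord:
    """Split a single tab-delimited feature line into its 9 fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    name, source, ftype, start, end, score, strand, frame, group = fields
    return GffRecord(
        name=name,
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        # Assumption: "." marks an undefined score.
        score=None if score == "." else float(score),
        strand=strand,
        frame=frame,
        group=group,
    )


# Invented example line, for illustration only.
record = parse_gff_line(
    "ctg123\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN\n"
)
print(record.name, record.type, record.start, record.end)
```

Real GFF files can also contain comment and directive lines beginning with "#", which this sketch does not handle; a caller would need to skip those before parsing.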
The fields can be classified as follows:
- the bare minimum needed to represent precise feature coordinates:
  - NAME - the reference sequence: chromosome, contig, supercontig/scaffold, or other sequence identifier
    - this is usually not the name of our feature, but rather the sequence to which our feature is relative (only large "genomic ruler" features like chromosomes, scaffolds, etc. are their own reference sequence)
  - START, END - 1-based indices of start and end of our feature relative to the reference sequence (START <= END must be true regardless of feature orientation)
- fields summarizing the output of programs that predict annotation features:
  - TYPE - feature type (GFF3 uses the Sequence Ontology to restrict this field)
  - SOURCE - name of originating sensor program
  - SCORE - the score assigned to the feature by the sensor program
- genefinder-centric fields:
  - STRAND - orientation of feature relative to the reference sequence
  - FRAME - translational reading frame; also called PHASE
- a final, catch-all field:
  - GROUP - as of GFF3, this is a semicolon-separated "tag=value" attribute list, with various well-defined tags and values such as "ID" or "Parent"
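Two of the fields above need a little interpretation before use: the GFF3 GROUP attribute list and the 1-based START/END pair. The sketch below shows one way to handle both in Python. The function names `parse_gff3_attributes` and `feature_slice` are hypothetical, and the sample strings are invented. It relies on two GFF3 conventions that this page does not spell out and that should be checked against the spec: multiple values for one tag are comma-separated, and reserved characters are percent-encoded.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_gff3_attributes(group: str) -> Dict[str, List[str]]:
    """Turn a GFF3 'tag=value;tag=value' string into a dict of value lists."""
    attributes: Dict[str, List[str]] = {}
    for pair in group.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        # Split on commas first, then percent-decode each value, so that an
        # encoded comma (%2C) inside a value is not mistaken for a separator.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


def feature_slice(start: int, end: int) -> slice:
    """Convert 1-based, inclusive START/END into a 0-based Python slice."""
    if start < 1 or start > end:
        raise ValueError("expected 1 <= START <= END")
    return slice(start - 1, end)


# Invented examples, for illustration only.
attrs = parse_gff3_attributes("ID=mRNA00001;Parent=gene00001,gene00002")
print(attrs["Parent"])

reference = "ACGTACGT"
print(reference[feature_slice(3, 5)])  # bases 3 to 5 of the reference
```

Because START <= END holds regardless of orientation, the slice always yields the forward-strand bases; a feature on the reverse strand would still need to be reverse-complemented by the caller.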
For more info, see the GFF3 spec or the GffTools page.
Example applications that use GFF format
An example of a server that generates GFF is Expasy (e.g. P0A7B8.gff).
Examples of clients that use this format are Jalview, the multiple sequence alignment editor and viewer, and STRAP, the structural alignment tool (example output).