Stockholm Format

Stockholm Format is a flatfile format for databases of annotated multiple sequence alignments.

It is used by, for example, the Pfam and Rfam databases, which contain alignments of protein families and RNA families, respectively.

Erik Sonnhammer's group's page has the format spec (Alex Coventry - 28 Feb 2005).

Here is our mirror of that page (Ian Holmes).

Stockholm shows per-column alignment annotations, such as RNA secondary structure, in a compact and (if the columns are consistently indented) human-readable way. For example, here is a pairwise alignment of purine riboswitches:

# STOCKHOLM 1.0
#=GC SS_cons		 .................<<<<<<<<...<<<<<<<........>>>>>>>..
AP001509.1			UUAAUCGAGCUCAACACUCUUCGUAUAUCCUC-UCAAUAUGG-GAUGAGGGU
#=GR AP001509.1 SS -----------------<<<<<<<<---..<<-<<-------->>->>..--
AE007476.1			AAAAUUGAAUAUCGUUUUACUUGUUUAU-GUCGUGAAU-UGG-CACGA-CGU
#=GR AE007476.1 SS -----------------<<<<<<<<-----<<.<<-------->>.>>----

#=GC SS_cons		 ......<<<<<<<.......>>>>>>>..>>>>>>>>...............
AP001509.1			CUCUAC-AGGUA-CCGUAAA-UACCUAGCUACGAAAAGAAUGCAGUUAAUGU
#=GR AP001509.1 SS -------<<<<<--------->>>>>--->>>>>>>>---------------
AE007476.1			UUCUACAAGGUG-CCGG-AA-CACCUAACAAUAAGUAAGUCAGCAGUGAGAU
#=GR AE007476.1 SS ------.<<<<<--------->>>>>.-->>>>>>>>---------------
//

Stockholm allows each sequence to be split over multiple lines, in interleaved blocks as in the above example, though this is "discouraged" in the spec.
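
A reader of the format therefore has to join the pieces of each row that are spread over several blocks. The following is a minimal sketch in Python, not a reference implementation: the function name parse_stockholm and its return structure are hypothetical, it handles a single alignment, it ignores #=GF and #=GS lines, and it assumes that names and annotation strings contain no whitespace.

```python
def parse_stockholm(text):
    """Parse one Stockholm alignment, joining rows split across blocks.

    Returns three dicts:
      seqs: sequence name -> aligned sequence
      gc:   per-column feature name -> annotation string
      gr:   (sequence name, feature name) -> annotation string
    """
    seqs, gc, gr = {}, {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("# STOCKHOLM"):
            continue  # blank separator between blocks, or the header line
        if line == "//":
            break  # end of this alignment
        fields = line.split()  # splits on runs of spaces or tabs
        if fields[0] == "#=GC" and len(fields) >= 3:
            # Per-column annotation: append this block's piece.
            gc[fields[1]] = gc.get(fields[1], "") + fields[2]
        elif fields[0] == "#=GR" and len(fields) >= 4:
            # Per-residue annotation for one sequence.
            key = (fields[1], fields[2])
            gr[key] = gr.get(key, "") + fields[3]
        elif line.startswith("#"):
            continue  # #=GF, #=GS and comments are ignored in this sketch
        elif len(fields) == 2:
            # Sequence row: name followed by aligned text.
            seqs[fields[0]] = seqs.get(fields[0], "") + fields[1]
    return seqs, gc, gr


if __name__ == "__main__":
    # Tiny made-up alignment, split into two blocks.
    demo = "\n".join([
        "# STOCKHOLM 1.0",
        "#=GC SS_cons  <<..",
        "seq1          ACGU",
        "#=GR seq1 SS  <<--",
        "",
        "#=GC SS_cons  >>",
        "seq1          GU",
        "#=GR seq1 SS  >>",
        "//",
    ])
    seqs, gc, gr = parse_stockholm(demo)
    print(seqs, gc, gr)
```

Because each piece is appended in the order it is read, the joined rows come out in column order as long as the blocks appear in order in the file.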

Stockholm Tools

See Stockholm Tools for a list of tools for working with Stockholm Format.

See also the BioPerl Stockholm class.