StockholmFormat is a flatfile format for databases of annotated multiple sequence alignments.
It is the format used, for example, by the Pfam and Rfam databases, which contain alignments of protein and RNA families, respectively.
Erik Sonnhammer's group's page has the format spec (AlexCoventry - 28 Feb 2005).
Here is our mirror of that page (IanHolmes).
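Stockholm shows per-column alignment annotations, such as RNA secondary structure, in a compact and (if appropriately indented) human-readable way.
For example, in the following pairwise alignment of purine riboswitches, the #=GC SS_cons lines annotate the consensus secondary structure of the alignment columns, and the #=GR ... SS lines annotate the secondary structure of each individual sequence: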
# STOCKHOLM 1.0
#=GC SS_cons       .................<<<<<<<<...<<<<<<<........>>>>>>>..
AP001509.1         UUAAUCGAGCUCAACACUCUUCGUAUAUCCUC-UCAAUAUGG-GAUGAGGGU
#=GR AP001509.1 SS -----------------<<<<<<<<---..<<-<<-------->>->>..--
AE007476.1         AAAAUUGAAUAUCGUUUUACUUGUUUAU-GUCGUGAAU-UGG-CACGA-CGU
#=GR AE007476.1 SS -----------------<<<<<<<<-----<<.<<-------->>.>>----
#=GC SS_cons       ......<<<<<<<.......>>>>>>>..>>>>>>>>...............
AP001509.1         CUCUAC-AGGUA-CCGUAAA-UACCUAGCUACGAAAAGAAUGCAGUUAAUGU
#=GR AP001509.1 SS -------<<<<<--------->>>>>--->>>>>>>>---------------
AE007476.1         UUCUACAAGGUG-CCGG-AA-CACCUAACAAUAAGUAAGUCAGCAGUGAGAU
#=GR AE007476.1 SS ------.<<<<<--------->>>>>.-->>>>>>>>---------------
//
Stockholm allows sequences to be split over multiple lines (as in the above example),
though this is "discouraged" in the spec.
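
Because rows may be split across blocks, a reader has to concatenate the pieces that share the same name before the alignment can be used. The following Python sketch shows one way to do that for the example above. It is a minimal illustration, not a reference parser: the function name parse_stockholm and the shape of its return value are assumptions made here, it reads only the first alignment in the text, it skips #=GF and #=GS lines, and it assumes that sequence and feature names contain no whitespace.

```python
EXAMPLE = """\
# STOCKHOLM 1.0
#=GC SS_cons .................<<<<<<<<...<<<<<<<........>>>>>>>..
AP001509.1 UUAAUCGAGCUCAACACUCUUCGUAUAUCCUC-UCAAUAUGG-GAUGAGGGU
#=GR AP001509.1 SS -----------------<<<<<<<<---..<<-<<-------->>->>..--
AE007476.1 AAAAUUGAAUAUCGUUUUACUUGUUUAU-GUCGUGAAU-UGG-CACGA-CGU
#=GR AE007476.1 SS -----------------<<<<<<<<-----<<.<<-------->>.>>----
#=GC SS_cons ......<<<<<<<.......>>>>>>>..>>>>>>>>...............
AP001509.1 CUCUAC-AGGUA-CCGUAAA-UACCUAGCUACGAAAAGAAUGCAGUUAAUGU
#=GR AP001509.1 SS -------<<<<<--------->>>>>--->>>>>>>>---------------
AE007476.1 UUCUACAAGGUG-CCGG-AA-CACCUAACAAUAAGUAAGUCAGCAGUGAGAU
#=GR AE007476.1 SS ------.<<<<<--------->>>>>.-->>>>>>>>---------------
//
"""


def parse_stockholm(text):
    """Merge the blocks of the first alignment found in `text`.

    Hypothetical helper for illustration only. Returns three dicts:
    sequences by name, #=GC annotations by feature, and #=GR
    annotations by (sequence name, feature).
    """
    seqs, gc, gr = {}, {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # blank lines carry no data
        if line == "//":
            break                         # end of the alignment
        if line.startswith("#=GC "):
            _, feature, data = line.split(None, 2)
            gc[feature] = gc.get(feature, "") + data
        elif line.startswith("#=GR "):
            _, name, feature, data = line.split(None, 3)
            key = (name, feature)
            gr[key] = gr.get(key, "") + data
        elif line.startswith("#"):
            continue                      # header, #=GF, #=GS: skipped here
        else:
            name, data = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + data
    return seqs, gc, gr


seqs, gc, gr = parse_stockholm(EXAMPLE)
for name, row in seqs.items():
    print(name, len(row))
```

Appending in file order works because each block lists the same columns for every row, so after merging, every sequence and annotation string should have the same length. Checking that is a cheap sanity test for a file.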
See StockholmTools for a list of tools for working with StockholmFormat.
Also see the Bioperl Stockholm class.