Tree Transducers, finite-state machines which can mutate parse trees,
provide a framework for modeling the evolution of structured RNAs along a phylogenetic tree.
Our software Indiegram implements a tree-transducer-based model of the evolution
of three structured RNAs. The TKF Structure Tree model is provided with the software,
but users can create their own models using the provided Perl code-generation
scripts. Given a model, the program Indiegram, distributed as part of the DART
software package, perform Maximum-Likelihood inference.
The Perl scripts found in the src/indiegram/perl directory implement our
model-construction algorithm for evolutionary models of structured RNAs:
Given a singlet transducer and a branch transducer, they generate the
corresponding evolutionary model representing three sequences related
to an unknown common ancestor
by a star phylogeny.
For examples of Indiegram run on real RNA, see here: Indiegram Examples
Model construction
The Perl script fourWay.pl is a driver script for the packages implementing
the transducer-composition algorithms on a three-branch tree.
It can be run from the dart/src/indiegram directory as
./perl/fourWay.pl --graph tt/simpleTest.singlet.tt tt/simpleTest.branch.tt
where the --graph option tells it to generate the transition/bifurcation graph of the composite model.
A full list of options is available via the --help option.
The program Indiegram uses code generated using the related driver script tripletSCFG.pl
./perl/tripletSCFG.pl --write tt/tkfst.singlet.tt tt/tkfst.branch.tt
where the --write command tells it to generate C++ code for constructing the model using the dart API.
(See files DartSrc:indiegram/tkfst.h & DartSrc:indiegram/tkfst.cc for example usage in indiegram.)
The following section describes the .tt file format used to specify singlet and branch transducers.
Transducer file formats
Here is a detailed description of file formats understood by the Perl code-generation packages
run by tripletSCFG.pl.
Several example files are provided:
- simpleTest.singlet.tt and simpleTest.branch.tt: Left-emitting HMM-like structure; bifurcations initiated from the root.
- bifurcTest.singlet.tt and bifurcTest.branch.tt: Left-emitting HMM-like structure; bifurcations initiated from either the root or leaves.
- tkfst.singlet.tt and tkfst.branch.tt: The full TKF Structure Tree model.
We give excerpts from the simpleTest model below.
State typing
States are typed as follows:
| State type | symbol |
| Start | s |
| Insert | i |
| Insert (Bifurcation) | bi |
| Match | m |
| Match (Bifurcation) | bm |
| End | e |
simpleTest.singlet.tt:
>singlet-stateTyping
s = s
il = i
Bi[s s] = bi
e = e
simpleTest.branch.tt:
>branch-stateTyping
s = s
il = i
ml = m
dl = m
w = w
Bm[s s] = bm
e = e
State absorptions
Match states (only allowed for branch transducers)
can absorb as well as emit symbols.
State absorptions can be:
Bifurcations correspond to emission of nonterminals in our model-construction algorithm.
In the above example the state Bm[s s] absorbs a nonterminal named Bi[s s]
(which is emitted by the singlet transducer).
simpleTest.branch.tt:
>branch-absorbProfiling
ml = l
dl = l
Bm[s s] = Bi[s s]
State emissions
States can have emission profiles:
| Emission profile | symbol |
| Left terminal | l |
| Right terminal | r |
| Left and right terminals | lr |
| Nonterminal | nonterminal name |
Bifurcations correspond to emission of nonterminals in our model-construction algorithm.
In the above example the state Bi[s s] emits a nonterminal named Bi[s s].
simpleTest.singlet.tt:
>singlet-emitProfiling
il = l
Bi[s s] = Bi[s s]
simpleTest.branch.tt:
>branch-emitProfiling
il = l
ml = l
dl =
Bm[s s] = Bm[s s]
Note that the "Delete" state dl absorbs symbols but has an empty emission.
The probability distributions of emissions can be specified as well.
simpleTest.singlet.tt:
>singlet-emitDist
il = p
State il emits with probability p. It is assumed that the
emission probability is a function p of the emitted character(s).
simpleTest.branch.tt:
>branch-emitDist
il = p
ml = m_t
The distribution m_t is interpreted to mean that the function m
depends on a (time) parameter t, ie m(t).
State transitions
The transition matrix and bifurcations are specified
separately.
simpleTest.singlet.tt:
>singlet-tm
s -> il = 1-ep(), e = ep()
il -> il = ilp(), Bi[s s] = blp(), e = 1-ilp()-blp()
e ->
simpleTest.branch.tt:
>branch-tm
s -> il = ilp(t), w = 1-ilp(t)
il -> il = ilp(t), w = 1-ilp(t)
ml -> il = ilp(t), w = 1-ilp(t)
dl -> il = ilp(t), w = 1-ilp(t)
w -> ml = mlp(t), dl = 1-mlp(t), Bm[s s] = 1, e = 1
e ->
The children of bifurcations are specified as (left, center, right):
simpleTest.singlet.tt:
>singlet-bifurc
Bi[s s] -> (s, s, e)
simpleTest.branch.tt:
>branch-bifurc
Bm[s s] -> (s, s, e)
Pretty display
The user can specify the order in which to sort states.
This is very useful for looking at the output of the composition algorithm.
simpleTest.singlet.tt:
>singlet-stateSorting
s = 0
il = 1
Bi[s s] = 5
e = 10
simpleTest.branch.tt:
>branch-stateSorting
s = 0
il = 1
ml = 2
dl = 2.5
w = 3
Bm[s s] = 5
e = 10
-- RobertBradley - 07 Aug 2008 |