TreeTransducers, finite-state machines which can mutate parse trees, provide a framework for modeling the evolution of structured RNAs along a phylogenetic tree.
Our software Indiegram implements a tree-transducer-based model of the evolution of three structured RNAs. The TKF Structure Tree model is provided with the software, but users can create their own models using the provided Perl code-generation scripts. Given a model, the program Indiegram, distributed as part of the DART software package, perform Maximum-Likelihood inference.
The Perl scripts found in the src/indiegram/perl directory implement our model-construction algorithm for evolutionary models of structured RNAs: Given a singlet transducer and a branch transducer, they generate the corresponding evolutionary model representing three sequences related to an unknown common ancestor by a star phylogeny.
For examples of Indiegram run on real RNA, see here: Indiegram Examples
The Perl script fourWay.pl is a driver script for the packages implementing the transducer-composition algorithms on a three-branch tree. It can be run from the dart/src/indiegram directory as
./perl/fourWay.pl --graph tt/simpleTest.singlet.tt tt/simpleTest.branch.tt
where the --graph option tells it to generate the transition/bifurcation graph of the composite model. A full list of options is available via the --help option.
The program Indiegram uses code generated using the related driver script tripletSCFG.pl
./perl/tripletSCFG.pl --write tt/tkfst.singlet.tt tt/tkfst.branch.tt
where the --write command tells it to generate C++ code for constructing the model using the dart API. (See files DartSrc:indiegram/tkfst.h & DartSrc:indiegram/tkfst.cc for example usage in indiegram.)
The following section describes the .tt file format used to specify singlet and branch transducers.
Transducer file formats
Here is a detailed description of file formats understood by the Perl code-generation packages run by tripletSCFG.pl.
Several example files are provided:
- simpleTest.singlet.tt and simpleTest.branch.tt: Left-emitting HMM-like structure; bifurcations initiated from the root.
- bifurcTest.singlet.tt and bifurcTest.branch.tt: Left-emitting HMM-like structure; bifurcations initiated from either the root or leaves.
- tkfst.singlet.tt and tkfst.branch.tt: The full TKF Structure Tree model.
We give excerpts from the simpleTest model below.
States are typed as follows:
>singlet-stateTyping s = s il = i Bi[s s] = bi e = e
>branch-stateTyping s = s il = i ml = m dl = m w = w Bm[s s] = bm e = e
Match states (only allowed for branch transducers) can absorb as well as emit symbols.
State absorptions can be:
|Left and right terminals||lr|
Bifurcations correspond to emission of nonterminals in our model-construction algorithm. In the above example the state Bm[s s] absorbs a nonterminal named Bi[s s] (which is emitted by the singlet transducer).
>branch-absorbProfiling ml = l dl = l Bm[s s] = Bi[s s]
States can have emission profiles:
|Left and right terminals||lr|
Bifurcations correspond to emission of nonterminals in our model-construction algorithm. In the above example the state Bi[s s] emits a nonterminal named Bi[s s].
>singlet-emitProfiling il = l Bi[s s] = Bi[s s]
>branch-emitProfiling il = l ml = l dl = Bm[s s] = Bm[s s]
Note that the "Delete" state dl absorbs symbols but has an empty emission.
The probability distributions of emissions can be specified as well.
>singlet-emitDist il = p
State il emits with probability p. It is assumed that the emission probability is a function p of the emitted character(s).
>branch-emitDist il = p ml = m_t
The distribution m_t is interpreted to mean that the function m depends on a (time) parameter t, ie m(t).
The transition matrix and bifurcations are specified separately.
>singlet-tm s -> il = 1-ep(), e = ep() il -> il = ilp(), Bi[s s] = blp(), e = 1-ilp()-blp() e ->
>branch-tm s -> il = ilp(t), w = 1-ilp(t) il -> il = ilp(t), w = 1-ilp(t) ml -> il = ilp(t), w = 1-ilp(t) dl -> il = ilp(t), w = 1-ilp(t) w -> ml = mlp(t), dl = 1-mlp(t), Bm[s s] = 1, e = 1 e ->
The children of bifurcations are specified as (left, center, right):
>singlet-bifurc Bi[s s] -> (s, s, e)
>branch-bifurc Bm[s s] -> (s, s, e)
The user can specify the order in which to sort states. This is very useful for looking at the output of the composition algorithm.
>singlet-stateSorting s = 0 il = 1 Bi[s s] = 5 e = 10
>branch-stateSorting s = 0 il = 1 ml = 2 dl = 2.5 w = 3 Bm[s s] = 5 e = 10
-- Robert Bradley - 07 Aug 2008