ThorneKishinoFelsenstein 1991 "links model" (also called "TKF" or "TKF91"), as described in their article in the Journal of Molecular Evolution. The substitution model is Kimura's two-parameter model. I have used RASMOL's Carbon, Nitrogen, Oxygen & Fluorine atoms to represent A, C, G, T residues, because I have no shame. I realise VRML would be better for this, but my coding (like DNA evolution) is a random walk on a stochastic landscape, so sue me. (See Kenny Duong's VrmlMod for a prototype VRML version.)

The point of all this is PhyloAlignment (aka StatisticalAlignment), the systematic derivation of sequence analysis algorithms from molecular evolutionary hypotheses. These hypotheses are formulated as continuous-time Markov chains over sequence space (and possibly over structure space, e.g. Hein & Pedersen's models of gene structure evolution, or various recent models of RNA secondary structure evolution). Phylo-alignment, and in particular the theory of StringTransducers, provides a framework that allows us to think coherently about indels on trees, in the same way that the early work of Jukes-Cantor, Kimura, Felsenstein et al provided a systematic framework for thinking about substitutions on trees.

Joe Felsenstein's 2004 book, "Inferring Phylogenies", has a review of statistical alignment. There's a comprehensive bibliography here on this website.

One reason statistical alignment is cool is that you can measure, analyse and visualise the underlying evolutionary process. That's what these movies are about.
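For the curious, here is a minimal sketch of what a simulation of the links model along a single branch might look like. It is not the code behind the movies. It assumes a standard Gillespie-style event loop, and the function name `simulate_tkf91` and the parameter names `lam` (per-link insertion rate), `mu` (per-link deletion rate), `alpha` (transition rate) and `beta` (rate of each transversion) are made up for illustration. Inserted residues are drawn uniformly, which is the equilibrium distribution under Kimura's two-parameter model.

```python
import random

BASES = "ACGT"
# Kimura two-parameter model: transitions (A<->G, C<->T) happen at rate alpha,
# and each of the two possible transversions happens at rate beta.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def simulate_tkf91(seq, t, lam, mu, alpha, beta, rng=None):
    """Evolve seq for time t under TKF91-style links with K2P substitutions.

    Hypothetical helper: all names here are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    seq = list(seq)
    now = 0.0
    while True:
        n = len(seq)
        ins_rate = lam * (n + 1)            # n mortal links plus the immortal link
        del_rate = mu * n                   # only mortal links can die
        sub_rate = (alpha + 2.0 * beta) * n
        total = ins_rate + del_rate + sub_rate
        if total <= 0.0:
            break                           # nothing can ever happen again
        now += rng.expovariate(total)       # waiting time to the next event
        if now >= t:
            break                           # next event falls beyond the branch
        u = rng.random() * total
        if n == 0 or u < ins_rate:
            # Pick a parent link uniformly (index 0 is the immortal link);
            # the new residue appears immediately to its right.
            seq.insert(rng.randrange(n + 1), rng.choice(BASES))
        elif u < ins_rate + del_rate:
            del seq[rng.randrange(n)]
        else:
            i = rng.randrange(n)
            if rng.random() * (alpha + 2.0 * beta) < alpha:
                seq[i] = TRANSITION[seq[i]]
            else:
                seq[i] = rng.choice(TRANSVERSIONS[seq[i]])
    return "".join(seq)
```

Calling `simulate_tkf91("GATTACA", 1.0, 0.05, 0.1, 1.0, 0.5, random.Random(42))` returns one possible descendant sequence. Keeping `lam` smaller than `mu` stops the sequence length from running away over long times.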
- PhyloFilm: The Phylogenetic Film Show
- TKF movies
- Animations of phylo-grammars
- Animation codes
The TKF model with a "splitting" event, generating a tree. Illustrates how multiple alignment & phylogeny are aspects of the same graphical model inference problem, the dynamic programming solution to which was presented by Hein (PSB, 2001). TKF tree (Quicktime), TKF tree (AVI)

Here's a pretty grainy YouTube of the TKF tree:

StringTransducers (a sort of Pair HMM) on branches of a tree (Holmes, ISMB, 2003; see also Hein, 2001; Paten et al; Redelings & Suchard; Lunter; et al). Here's a cartoon of an EHMM in action: Evolutionary HMM (Quicktime), Evolutionary HMM (AVI), Evolutionary HMM (MPG), and here is a legend for the EHMM movies (PDF).

Here's a YouTube. It's harder to follow what's going on here, as the YouTube upload seems to have messed it up a bit. Maybe I should try Google Video or Ifilm or something.

The big complicated-looking machines in these cartoons are phylogenetic arrays of string transducers. Essentially, a StringTransducer is a finite state machine with an input tape and an output tape. The output tape from one transducer can be fed into the input of the next, offering systematic ways of chaining transducers together. This gives you a systematic way of designing scoring schemes for multi-sequence HMMs, or (equivalently) for keeping track of (possibly overlapping) indel events on trees.

Here's that legend of the state types for phylogenetically composed transducer/EHMM as an inline PNG image:

Ising sequence (Quicktime), Ising sequence (AVI)

A more realistic long indel version of the TKF model, allowing multiple-residue indels (and hence affine gap penalties in the dynamic programming recursions), has been independently developed by Knudsen and Miyamoto (JMB 2003) and by Miklos, Lunter & Holmes (MBE 2004). See PhylogeneticAlignmentReader for a long list of refs. Some other pragmatic/approximate models, like Mitchison & Durbin's Tree HMMs and Thorne et al's fragment models, can model affine gaps to some extent.
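To make the "input tape, output tape" idea above a bit more concrete, here is a toy sketch of a stochastic transducer with Insert, Match and Delete behaviour, plus a helper that chains several of them. Everything in it is an assumption made for illustration: the names `make_transducer` and `chain`, the three probabilities `p_ins`, `p_del` and `p_sub`, and the choice to sample tapes one at a time. Real phylogenetic composition builds a joint state machine for dynamic programming, which this does not attempt.

```python
import random

BASES = "ACGT"


def make_transducer(p_ins, p_del, p_sub):
    """Return a function that maps an input tape (str) to an output tape (str)."""
    if not 0.0 <= p_ins < 1.0:
        raise ValueError("p_ins must be in [0, 1) so that insertions terminate")

    def run(tape, rng):
        out = []
        pos = 0
        while True:
            if rng.random() < p_ins:
                # Insert state: write a symbol, read nothing.
                out.append(rng.choice(BASES))
            elif pos == len(tape):
                # Input tape exhausted: End state.
                return "".join(out)
            else:
                sym = tape[pos]
                pos += 1                      # consume one input symbol
                if rng.random() < p_del:
                    continue                  # Delete state: read, write nothing
                if rng.random() < p_sub:
                    # Match state with a substitution.
                    sym = rng.choice([b for b in BASES if b != sym])
                out.append(sym)               # Match state: read one, write one

    return run


def chain(transducers, tape, rng):
    """Feed each machine's output tape into the next machine's input tape."""
    for run in transducers:
        tape = run(tape, rng)
    return tape
```

On a tree you would run one such machine per branch, feeding the parent's tape to each child, e.g. `left = branch(root, rng)` and `right = branch(root, rng)` with `branch = make_transducer(0.05, 0.05, 0.1)`.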
Simulation movies of the long indel model will eventually be forthcoming on this page...

more). These sampling steps, which are no more complex than Pair HMM alignment, are sufficient to eventually explore all alignment space (for a given tree), and so may be helpful in using more sophisticated models for statistical alignment. (Indeed, Handel can be used to sample multiple alignments from any probability distribution, using importance sampling.)

DART package (checkout anonymous CVS). A reference for xrate is Holmes & Rubin (JMB 2002). Of course, generic MCMC and gradient-ascent methods can be used to parameterise statistical alignment, since it's based on likelihood models.

Wouldn't it be fun to have species- and even gene family-specific evolutionary simulation movies, parameterised and modelled as accurately as possible using available published genome sequence.... hmmmm

phylo grammars as modelled by xrate software. A phylo-grammar is a model incorporating correlations between sequences (via substitution models on phylogenetic trees) and within sequences (via grammatically structured features representing protein-coding genes, ncRNAs, etc).
phylodirector. The phylo-grammar animations were made with another Perl script, treealign.pl, and the BerkeleyMpegEncoder.