Phylogenetic Alignment



Phylogenetic Alignment, aka phylo-alignment or Statistical Alignment (a term coined by Jotun Hein), is an approach to sequence alignment with several characteristics:

  • Evolutionary models are defined rigorously as continuous-time Markov chains on sequence space
  • Stochastic grammars and dynamic programming algorithms are then derived systematically
  • Phylogenetic trees, sequence alignments and/or model parameters are imputed or, in a Bayesian framework, co-sampled by MCMC
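To make the first bullet concrete, here is a minimal sketch of a continuous-time Markov chain for the simplest possible case: substitutions at a single nucleotide site under the Jukes-Cantor (JC69) model. This is a deliberate simplification (a chain on one site, not on the whole sequence space with indels), and the function names and the rate parameter `alpha` are illustrative choices, not taken from any of the tools mentioned on this page. The transition probabilities over time t are P(t) = exp(Qt), which the sketch computes both by a truncated Taylor series and by the known JC69 closed form.

```python
import math

ALPHABET = "ACGT"

def jc69_rate_matrix(alpha):
    """Rate matrix Q: each off-diagonal rate is alpha, and every row sums to zero."""
    n = len(ALPHABET)
    return [[-(n - 1) * alpha if i == j else alpha for j in range(n)]
            for i in range(n)]

def mat_mul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=60):
    """exp(Q t) by truncated Taylor series; adequate when the entries of Q*t are small."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term (Q t)^k / k!
    qt = [[x * t for x in row] for row in q]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def jc69_closed_form(alpha, t):
    """Closed-form JC69 probabilities: (no net change, change to one specific other base)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

if __name__ == "__main__":
    P = mat_exp(jc69_rate_matrix(1.0), 0.5)
    print("P(A->A), P(A->C) by series:", P[0][0], P[0][1])
    print("closed form:               ", jc69_closed_form(1.0, 0.5))
```

Real phylo-alignment models extend the state space from a single residue to whole sequences, so that insertions and deletions are also events of the chain; the matrix exponential above is only the substitution part of that picture.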

Phylo-alignment can seem mathematically daunting, because deriving a stochastic grammar systematically from instantaneous mutation rates is harder than (say) observing empirically that "setting the gap opening penalty to -11 and the gap extension penalty to -1 gives pretty good results". However, if the math can be worked out, we get an integrated framework for doing alignments, building phylogenetic trees, predicting exons and other features, measuring rates of mutation events (indels, substitutions...) and reconstructing ancestral sequences, using technology that's not too much more complicated than the Baum-Welch algorithm for Pair HMMs.
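As a rough illustration of the Pair HMM machinery mentioned above, the following sketch implements the forward algorithm for a textbook three-state Pair HMM (match, insert, delete), which sums the probability of two sequences over all possible alignments. The forward pass is one half of the forward-backward computation that Baum-Welch relies on. Everything specific here is an assumption made for illustration: the function names, the transition parameters `delta` (gap open), `epsilon` (gap extend) and `tau` (termination), and the use of Jukes-Cantor emissions with uniform base frequencies. In particular, the gap probabilities are simply picked by hand rather than derived from indel rates, which is exactly the step a proper phylo-alignment model would do systematically.

```python
import math

def joint_emission(a, b, t, alpha=1.0):
    """Joint probability of an aligned pair (a, b) under JC69 at time t (assumed model)."""
    decay = math.exp(-4.0 * alpha * t)
    conditional = 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay
    return 0.25 * conditional  # uniform equilibrium frequency times P(b | a, t)

def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, tau=0.05, t=0.2):
    """Probability of sequences x and y, summed over all alignments.

    States: M emits an aligned pair, X emits a residue of x only,
    Y emits a residue of y only. The start state behaves like M.
    """
    q = 0.25  # single-residue emission probability in the gap states
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                fM[i][j] = joint_emission(x[i - 1], y[j - 1], t) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - epsilon - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])
    # Every complete path finishes with a transition into the end state.
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])

if __name__ == "__main__":
    print(pair_hmm_forward("ACGT", "ACGT"))
    print(pair_hmm_forward("ACGT", "AGT"))
```

Compared with hand-tuned gap penalties, the appeal of this formulation is that `delta`, `epsilon` and the emission probabilities are all probabilities, so in principle they can be tied to evolutionary time and to underlying mutation rates instead of being set by trial and error.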

Software tools

We (Holmes lab) have several pieces of software in active development for doing statistical alignment, some more experimental than others.

Other software tools for phylo-alignment include


Links to key papers can be found on the Phylogenetic Alignment Reader page. You really want to read the paper that started it all, by Thorne, Kishino and Felsenstein (1991), often nicknamed "TKF91".

(OK, there were several seminal papers before this, e.g. by Bishop and Thompson, Felsenstein, Lawrence Liu et al... but the TKF91 paper is probably the most influential. See the Phylogenetic Alignment Reader page for more!)