Phylogenetic Alignment
Introduction
Phylogenetic alignment, also known as phylo-alignment or statistical alignment (a term coined by Jotun Hein), is an approach to sequence alignment with the following characteristics:
- Evolutionary models are defined rigorously as continuous-time Markov chains on sequence space
- Stochastic grammars and dynamic programming algorithms are then derived systematically
- Phylogenetic trees, sequence alignments and/or model parameters are imputed or, in a Bayesian framework, co-sampled by MCMC
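To make the first point concrete, here is a minimal sketch of the simplest continuous-time Markov chain used for substitutions, the Jukes-Cantor model, where every nucleotide changes to every other at the same rate. It covers only point substitutions, not the indel processes that phylo-alignment models on whole sequences. The function names and the rate parameter `alpha` are illustrative assumptions, not part of any tool listed on this page.

```python
import math

def jukes_cantor_rate_matrix(alpha):
    """4x4 rate matrix Q: each off-diagonal rate is alpha, rows sum to zero."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]

def mat_mul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, terms=30):
    """P(t) = exp(Q t) via a truncated Taylor series (adequate for small |Q t|)."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current series term (Qt)^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Probability that a site is unchanged / changed to one specific other base
# after time t, compared against the known closed form for this model.
alpha, t = 1.0, 0.1
p = transition_matrix(jukes_cantor_rate_matrix(alpha), t)
closed_same = 0.25 + 0.75 * math.exp(-4.0 * alpha * t)
closed_diff = 0.25 - 0.25 * math.exp(-4.0 * alpha * t)
print(p[0][0], closed_same)
print(p[0][1], closed_diff)
```

The instantaneous rates in Q are the model; the finite-time probabilities P(t) that a dynamic programming algorithm consumes are derived from them rather than tuned by hand.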
Phylo-alignment can seem mathematically daunting: deriving a stochastic grammar systematically from instantaneous mutation rates is harder than (say) observing empirically that "setting the gap opening penalty to -11 and the gap extension penalty to -1 gives pretty good results". However, once the math is worked out, we get an integrated framework for doing alignments, building phylogenetic trees, predicting exons and other features, measuring rates of mutation events (indels, substitutions...) and reconstructing ancestral sequences, using machinery that's not much more complicated than the Baum-Welch algorithm for Pair HMMs.
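For readers who have not met Pair HMMs, the following is a rough sketch of the forward algorithm for a textbook-style three-state Pair HMM (match, insert-in-x, insert-in-y). It computes the likelihood of two sequences summed over all alignments, which is the quantity Baum-Welch builds on. It is not the TKF91 model: the parameters `delta` (gap open), `epsilon` (gap extend), `tau` (end) and `p_match`, and their default values, are hypothetical and chosen only for illustration, and a 4-letter DNA alphabet is assumed.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, tau=0.05, p_match=0.22):
    """Return P(x, y), summed over all alignments, under a three-state Pair HMM."""
    # Assumed emissions: identical pairs get p_match, the 12 mismatched
    # pairs share the rest, so the 16 pair probabilities sum to 1.
    p_mismatch = (1.0 - 4 * p_match) / 12.0
    q = 0.25  # uniform emission for a residue aligned to a gap

    n, m = len(x), len(y)
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends in Match
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with x[i-1] against a gap
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with y[j-1] against a gap
    f_m[0][0] = 1.0  # the Begin state is treated like Match

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = p_match if x[i - 1] == y[j - 1] else p_mismatch
                f_m[i][j] = emit * (
                    (1 - 2 * delta - tau) * f_m[i - 1][j - 1]
                    + (1 - epsilon - tau) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:
                f_x[i][j] = q * (delta * f_m[i - 1][j] + epsilon * f_x[i - 1][j])
            if j > 0:
                f_y[i][j] = q * (delta * f_m[i][j - 1] + epsilon * f_y[i][j - 1])

    # Every state moves to End with probability tau.
    return tau * (f_m[n][m] + f_x[n][m] + f_y[n][m])

print(pair_hmm_forward("ACGT", "ACT"))
```

In this sketch `delta` and `epsilon` play roughly the role of gap opening and gap extension penalties, but as probabilities. In phylo-alignment such probabilities would be derived from the indel rates and the evolutionary time separating the sequences rather than set by hand.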
Software tools
We (Holmes lab) have several pieces of software in active development for doing statistical alignment, some more experimental than others.
- Phylo Composer and Phylo Director -- tools for working with String Transducers
- Handel Package -- general-purpose multiple alignment tool
  - MCMC alignment sampling using string transducers
- EM codes for measuring rates: the xrate program (distributed as part of DART; see Downloading Dart)
- Evol Doer
- (experimental) Statistical Alignment of RNA
Other software tools for phylo-alignment include:
- Phylogeny Cafe from Istvan Miklos, Gerton Lunter et al.
- BEAST by Alexei Drummond and Andrew Rambaut
- BAli-Phy by Benjamin Redelings and Marc Suchard
- StatAlign by Adam Novak, Istvan Miklos, Rune Lyngsoe and Jotun Hein
- MCMCSALUT and MCMCALGN by Dirk Metzler
- PRANK by Ari Loytynoja and Nick Goldman
- STATALIGN by Jeff Thorne
References
Links to key papers can be found on the Phylogenetic Alignment Reader page. You really want to read the paper that started it all (often nicknamed "TKF91"):
- Thorne, Kishino and Felsenstein: An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 1991;33:114-24.
(OK, there were several seminal papers before this, e.g. by Bishop and Thompson, Felsenstein, Lawrence Liu et al... but the TKF91 paper is probably the most influential. See the Phylogenetic Alignment Reader page for more!)