Phylogenetic Alignment Reader
From Biowiki
Selected highlights of the Phylogenetic Alignment literature
Features of phylogenetic alignment
- Phylogenetic Alignment, aka Statistical Alignment
- aka String Transducers, Evolutionary HMMs, Phylo-HMMs, Tree HMMs, Evolutionary SCFGs
- Explicit probabilistic representations for evolutionary models
- Stochastic grammars, sampled sequence trajectories on phylogenetic trees
- Combined Bayesian inference of phylogeny and alignment
Roots
- Likelihood phylogeny
- Felsenstein: Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 1981;17:368-76.
- Maximum likelihood alignment
- Bishop & Thompson: Maximum likelihood alignment of DNA sequences. J. Mol. Biol. 1986;190:159-65.
- Minimum message length alignment & finite state machines
- Allison & Yee: Minimum message length encoding and the comparison of macromolecules. Bull. Math. Biol. 1990;52:431-53.
- Allison et al.: Finite-state models in the alignment of macromolecules. J. Mol. Evol. 1992;35:77-89.
- MCMC multiple alignment by Gibbs sampling
- Lawrence et al.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 1993;262:208-14.
- Paired-sequence hidden Markov models and stochastic grammars
- The book Biological Sequence Analysis by Durbin, Eddy, Krogh & Mitchison (1998)
Deriving phyloalignment algorithms from indel models
- Thorne et al.: An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 1991;33:114-24. (PDF)
- Arguably the central paper of the statistical phylo-alignment literature. Develops the most basic "TKF91 model" for pairwise alignment
- Hein: An algorithm for statistical alignment of sequences related by a binary tree. Pac. Symp. Biocomput. 2001:179-90.
- applies TKF91 to a tree
- also Steel & Hein: Applying the Thorne–Kishino–Felsenstein model to sequence evolution on a star-shaped tree. Appl. Math. Lett. 2001;14:679.
- Holmes & Bruno: Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 2001;17:803-20. (PDF)
- develops MCMC for TKF91 using HMM theory
- Holmes: Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics 2003;19 Suppl 1:i147-57. (PDF) (errata)
- constructs multi-sequence HMMs systematically by transducer composition
- Holmes: Phylocomposer and phylodirector: analysis and visualization of transducer indel models. Bioinformatics 2007;23:3263-4. (PDF)
- first implementation of a general-purpose phylogenetic transducer algorithm (more info here: Phylo Composer)
- Knudsen & Miyamoto: Sequence alignments and pair hidden Markov models using evolutionary history. J. Mol. Biol. 2003;333:453-60.
- Miklós et al.: A "Long Indel" model for evolutionary sequence alignment. Mol. Biol. Evol. 2004;21:529-40. (PDF)
- Rivas: Evolutionary models for insertions and deletions in a probabilistic modeling framework. BMC Bioinformatics 2005;6:63.
- time-dependent affine gap penalties for transducers
- Lunter et al.: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 2005;6:83.
- Redelings & Suchard: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 2005;54:401-18.
- Metzler et al.: Assessing variability by joint sampling of alignments and mutation rates. J. Mol. Evol. 2001;53:660-9.
- MCMC samplers
Beyond point substitution models
- Robinson et al.: Protein evolution with dependence among codons due to tertiary structure. Mol. Biol. Evol. 2003;20:1692-704.
- Lunter & Hein: A nucleotide substitution model with nearest-neighbour interactions. Bioinformatics 2004;20 Suppl 1:i216-23.
- Siepel & Haussler: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol. 2004;21:468-88.
RNA models
- Knudsen & Hein: Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res. 2003;31:3423-8.
- Holmes: A probabilistic model for the evolution of RNA structure. BMC Bioinformatics 2004;5:166. (PDF)
- Nested TKF91 models for stems and loops; tree transducers
- Matsui et al.: Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Bioinformatics 2005;21:2611-7.
Protein models
- Goldman et al.: Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses. J. Mol. Biol. 1996;263:196-208.
Other papers following from TKF91
- Thorne et al.: Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 1992;34:3-16.
- the "TKF92 model"
- Thorne & Churchill: Estimation and reliability of molecular sequence alignments. Biometrics 1995;51:100-13.
- Miklós: An improved algorithm for statistical alignment of sequences related by a star tree. Bull. Math. Biol. 2002;64:771-9.
- Hein et al.: Statistical alignment: computational properties, homology testing and goodness-of-fit. J. Mol. Biol. 2000;302:265-79.
- Metzler: Statistical alignment based on fragment insertion and deletion models. Bioinformatics 2003;19:490-9.
- Holmes: Using evolutionary Expectation Maximization to estimate indel rates. Bioinformatics 2005;21:2294-300. (PDF)
- Lunter et al.: An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J. Comput. Biol. 2003;10:869-89.
- Jojic et al.: Efficient approximations for learning phylogenetic HMM models from data. Bioinformatics 2004;20 Suppl 1:i161-8.
- Siepel & Haussler: Combining phylogenetic and hidden Markov models in biosequence analysis. J. Comput. Biol. 2004;11:413-28.
Reading the indel-rate signal
- Lunter et al.: Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol. 2006;2:e5.
Reconstruction of indel history
Motivated by interest in ancestral genome reconstruction, several authors have approached indel modeling from a slightly different direction: reconstructing the indel history itself. These approaches typically take the alignment as a fixed input, which leads to a somewhat different formalism, although the methods can still be reformulated within the framework of string transducers.
- Blanchette et al.
- Diallo et al.: Exact and heuristic algorithms for the Indel Maximum Likelihood Problem. J. Comput. Biol. 2007;14:446-61.
- Chindelevitch et al.: On the inference of parsimonious indel evolutionary scenarios. J. Bioinform. Comput. Biol. 2006;4:721-44.
- Sinha et al.
- Kim & Sinha: Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment. Bioinformatics 2007;23:289-97.
- Ortheus: Paten, Birney et al.
- Paten et al.: Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 2008;18:1829-43.
- Michael Jordan and Alexandre Bouchard-Côté
- Alexandre Bouchard-Côté, Michael I. Jordan and Dan Klein. (2009) Efficient Inference in Phylogenetic InDel Trees. Advances in Neural Information Processing Systems 21 (NIPS). Vancouver, Canada. (paper)
- Poster & slides here: http://www.stat.ubc.ca/~bouchard/