Latest revision as of 22:42, 1 January 2017
Felsenstein wildcards
When using phylogenetic models to reconstruct ancient sequences, it is often useful to separate the task into two steps:
 Imputation of the alignment and indel structure, i.e. which residues of the ancestral sequence are aligned to which present-day residues;
 Imputation of the residues themselves, i.e. the actual sequence (conditioned on the alignment imputed in step 1).
This is always possible if the underlying indel model is independent of the substitution model (as e.g. in the TKF model or the Long Indel model).
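The separation can be written out as a factorization. The notation here is an assumption introduced for illustration, not taken from the cited paper: A stands for the alignment and indel history, X for the ancestral residues, and Y for the observed present-day residues, with the tree and model parameters suppressed. If the indel model does not depend on the residues, then, roughly:

```latex
% A: alignment / indel history, X: ancestral residues, Y: observed residues
% (symbols are illustrative assumptions; tree and parameters are suppressed)
\begin{align}
P(A, X, Y) &= P(A)\, P(X, Y \mid A) \\
P(A \mid Y) &\propto P(A) \sum_{X} P(X, Y \mid A) && \text{(step 1)} \\
P(X \mid A, Y) &= \frac{P(X, Y \mid A)}{\sum_{X'} P(X', Y \mid A)} && \text{(step 2)}
\end{align}
```

The first factor comes from the indel model alone and the second from the substitution model alone, which is why inference of X can be deferred until A has been imputed.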
Step 1 is often the more computationally challenging of the two. During this step, one is effectively considering all possible ancestral residues, so the ancestral genotypes can be thought of as sequences of "wildcards" at this stage. When calculating the likelihood of a particular alignment, one sums over the actual values of these residues using Felsenstein's pruning algorithm. In step 2, posterior probability distributions over the residues themselves can be found using Elston-Stewart peeling (a.k.a. the sum-product algorithm).
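As a rough illustration of the summation described above, here is a minimal sketch of pruning on a single alignment column. Everything specific in it is an assumption made for the example: the Jukes-Cantor substitution model, the uniform root prior, the nested-list tree encoding, and the function names. Ancestral nodes carry no residue and are summed out, which is what treating them as wildcards amounts to; a leaf marked `*` is handled the same way.

```python
import math

ALPHABET = "ACGT"
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def jc_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t
    (expected substitutions per site). Assumed model, for illustration."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def leaf_vector(residue):
    """Indicator vector for an observed residue; the wildcard '*' is all ones."""
    if residue == "*":
        return [1.0] * 4
    return [1.0 if c == residue else 0.0 for c in ALPHABET]


def prune(node):
    """Return L[x] = P(observed leaves below node | node has residue x).

    A node is either a leaf residue (str) or a list of
    (child, branch_length) pairs (hypothetical encoding).
    """
    if isinstance(node, str):
        return leaf_vector(node)
    like = [1.0] * 4
    for child, t in node:
        P = jc_matrix(t)
        child_like = prune(child)
        for x in range(4):
            # Sum over the child's residue y: the child is a wildcard here.
            like[x] *= sum(P[x][y] * child_like[y] for y in range(4))
    return like


def column_likelihood(tree, prior=UNIFORM):
    """Likelihood of one column with all ancestral residues summed out."""
    return sum(p * l for p, l in zip(prior, prune(tree)))


def root_posterior(tree, prior=UNIFORM):
    """Posterior distribution over the root residue given the leaves."""
    joint = [p * l for p, l in zip(prior, prune(tree))]
    z = sum(joint)
    return {c: j / z for c, j in zip(ALPHABET, joint)}


if __name__ == "__main__":
    # ((A:0.1, A:0.1):0.05, C:0.2)
    tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]
    print(column_likelihood(tree))
    print(root_posterior(tree))
```

Only the root posterior is computed here, since it needs just the upward pass. Posteriors at other ancestral nodes would additionally require a downward pass (or re-rooting the tree at each node of interest), as in full sum-product message passing.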
By convention, such summed-out residues are represented as asterisks. The term "Felsenstein wildcard" was introduced in the following paper:
 Holmes & Bruno: Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 2001;17:803-20.
A Google search for the phrase turns up several programs for, and papers on, paleogenomics and statistical alignment.
Note: this approach can only be used if the indel model is independent of the actual sequence, so that inference of the sequence itself can be postponed. For example, a lexicalized transducer that modeled microsatellite expansion and contraction would not allow for the use of Felsenstein wildcards during alignment, since the indel rates in such a model depend on neighboring sequence.
 Ian Holmes  01 Aug 2007