|
| Why Metropolis/MCMC sampling is like networking for probabilistic modelers
The following description assumes familiarity with Metropolis-Hastings sampling (propose-accept/reject)
and starts with a concrete example: the Handel
program, developed by Ian Holmes and Bill Bruno.
Handel's original purpose was to use
the Thorne-Kishino-Felsenstein model to sample multiple alignments from an
Evolutionary HMM. However, using importance sampling, Handel can now
sample alignments from any likelihood function implemented by a
third-party program.
Examples of appropriate programs would be the Pedersen-Hein evolutionary
genefinder (EvoGene) or the Knudsen-Hein evolutionary RNA structure
predictor (PFOLD). The user passes the name of this program to Handel (as
a command-line argument). Starting from a seed alignment, Handel then
proposes MCMC alignment-sampling "moves", passing the new candidate
alignments down a Unix pipe to the third-party program (EvoGene, PFOLD/xfold or
whatever), which spits out log-likelihood scores (or lod scores) in bits.
Handel reads back these log-likelihoods and uses them to evaluate a Hastings
ratio, which is then used to accept or reject the move stochastically.
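As a minimal sketch of that accept/reject step (not Handel's actual code): if the scorer reports log-likelihoods in bits, the likelihood ratio between the candidate and the current alignment is 2 raised to the difference in bits. The function name `accept_move` and the `log2_proposal_ratio` argument are hypothetical, introduced here for illustration. The latter stands for the log2 ratio of reverse to forward proposal probabilities, and is zero when the proposal is symmetric.

```python
import random


def accept_move(old_bits, new_bits, log2_proposal_ratio=0.0, rng=random):
    """Metropolis-Hastings accept/reject decision from scores in bits.

    old_bits, new_bits: log2-likelihoods of the current and candidate
        alignments, as reported by the external scorer.
    log2_proposal_ratio: log2 of q(old | new) / q(new | old). This is an
        assumption of the sketch; it is 0 for a symmetric proposal.
    rng: any object with a random() method returning a float in [0, 1).
    """
    log2_hastings = (new_bits - old_bits) + log2_proposal_ratio
    if log2_hastings >= 0:
        # The ratio is at least 1, so the move is always accepted.
        return True
    # Otherwise accept with probability 2**log2_hastings (underflows to 0.0
    # for very negative values, in which case the move is rejected).
    return rng.random() < 2.0 ** log2_hastings
```

Working in bits throughout avoids exponentiating the raw likelihoods, which would underflow for alignments of any realistic size.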
The net effect is that alignments are sampled according to the likelihood
function implemented by the third-party program, but with the MCMC mixing
properties of Handel (or whatever alignment sampler is used). So this can
be viewed as an alignment method, or as a way of making annotation tools
more robust to bad alignments. Or, indeed, as a "more Bayesian" way of
analysing sequence alignments. Natural extensions involve sampling trees
and/or various annotation features.
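The pipe-based exchange described above might be sketched as follows. This is an illustration only: the wire format between Handel and a real scorer is not specified here, so the sketch assumes a hypothetical line-oriented protocol (one candidate alignment per line in, one score in bits per line out), and uses a toy Python child process in place of EvoGene or PFOLD. `PipedScorer` and `TOY_SCORER` are made-up names.

```python
import subprocess
import sys

# Hypothetical stand-in for a third-party scorer: reads one candidate per
# line on stdin and replies with one score in bits per line on stdout.
# The "score" is simply minus the number of gap characters.
TOY_SCORER = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(-float(line.count('-')), flush=True)\n"
)


class PipedScorer:
    """Keeps an external scoring program running and queries it over pipes."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def score(self, alignment_line):
        """Send one candidate (assumed one-line encoding), read back its score."""
        self.proc.stdin.write(alignment_line + "\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline()
        if not reply:
            raise RuntimeError("scorer exited without replying")
        return float(reply)

    def close(self):
        """Signal end of input and wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    scorer = PipedScorer([sys.executable, "-c", TOY_SCORER])
    print(scorer.score("AC-GT/ACGGT"))
    print(scorer.score("A--GT/ACGGT"))
    scorer.close()
```

Keeping the scorer as a long-lived child process, rather than launching it once per move, matters because a sampler may propose many thousands of moves. Both sides need to flush after each line, otherwise the two programs can deadlock waiting on each other's buffers.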
This sort of sampling approach is timely: several other groups
have thought about, talked about or worked in this area (e.g. Sean Eddy, Lior Pachter, Bjarne Knudsen, Mike Eisen, Rasmus Nielsen).
There's a steadily
increasing number of MCMC alignment samplers that could be swapped in for
Handel in the above description (e.g. one recently developed by students
of Steven Brenner and Mike Eisen, the MrBayes tree sampler by John Huelsenbeck et al.,
or the tree-and-alignment sampler by Marc Suchard et al.).
And, clearly, there are many candidate
likelihood functions it'd be interesting to explore, such as the
evolutionary models that Haussler and Siepel, or Lunter and Hein, or Jakob Skou Pedersen have
worked with.
The idea of connecting programs together in this way is quite
appealing, I think: probabilistic approaches playing well together.
More to follow here.
-- Ian Holmes, 14 March 2005 |