String transducers---a.k.a. sequence transducers---are finite-state machines (that is to say, automata) with an input tape and an output tape (in the case of a two-tape transducer), or an input tape and multiple output tapes (in the general, 1-input N-output case).

A two-tape transducer, with one input and one output, is similar to a Pair HMM; the major difference is that the probabilistic weights are normalized conditional on the input sequence.

That is, the transition & emission parameters are such that the Forward algorithm, given and , computes the conditional probability . In contrast, the Forward algorithm for a Pair HMM computes the joint probability .

Since is the probability that the transducer converts input to output , thus for any . A typical way to guarantee that a transducer meets this criterion for all possible is to normalize every emission (and transition) conditional on the input symbol (and input sequence length).

As with Pair HMMs, string transducers are decent models for substitutions, indels and similar "local" mutations. However, they can also be systematically combined to yield probabilistic models for multiple, phylogenetically-related sequences. This algorithm, TransducerComposition, makes them great tools for statistical alignment, for analyzing sequence evolution on trees and for phylogenomic modeling & reconstruction.

Alternate names for string transducers include:

- Wikipedia:Finite_state_transducer;
- Input/Output Automata, Alignment Automata, Stochastic Automata;
- Moore Machines, Mealy Machines;
- "Conditional HMMs", meaning conditionally-normalized Pair HMMs, Multiple HMMs, Evolutionary HMMs or Phylo-HMMs.

We use these terms more-or-less interchangeably, although there are precise and subtly different definitions in common usage for all of them
(with well-defined relationships between the various definitions;
e.g. we can assert that
** "the general Phylo-HMM implemented by xrate is just a special-case Evolutionary HMM without indels: a singleton transducer composed with a phylogenetic composition of branch transducers, none of which have insert or delete states"**
and while this may sound cryptic, it is a well-formed and easily provable statement in transducer theory).

- Diallo AB, Makarenkov V, Blanchette M. Exact and heuristic algorithms for the Indel Maximum Likelihood Problem. J Comput Biol. 2007 May;14(4):446-61.
- Kim J, Sinha S. Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment. Bioinformatics. 2007 Feb 1;23(3):289-97. Epub 2006 Nov 15.

- Thorne JL, Kishino H, Felsenstein J. Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol. 1992 Jan;34(1):3-16.

The transducers we use here are finite in their state space, unlike for example the LongIndelModel which introduces realistic gap length distributions to the TKF model, at the cost of increasing the alignment complexity from that of a finite-state pair HMM (or transducer) to that of a generalized pair HMM with arbitrary distributions over emission lengths.

The LexicalizedTransducer is useful for modeling context-sensitive substitutions, indels and even local micro-duplications and micro-satellite dynamics.

The ParseTreeTransducer can be used for modeling the evolution of "parse trees" (e.g. noncoding RNA gene structures), as well as strings.

It is also possible to generalize the idea to a transducer with multiple inputs as well as outputs , to model recombination events and other exchanges of genetic material.

There are several pages on this wiki about transducers applied to bioinformatics:

** **

- PhyloFilm: animations of transducers, PhyloGrammars and other evolutionary models

- Software packages

- PhylogeneticAlignmentReader (long-ish bibliography)

Wikipedia has some good background on finite-state transducers as used in computer science:

- Computational linguistics

- Bioinformatics
- David Searls & Kevin Murphy
- Searls DB, Murphy KP. Automata-theoretic models of mutation and alignment. Proc Int Conf Intell Syst Mol Biol. 1995;3:341-9.
- http://www.cs.ubc.ca/~murphyk/Papers/ismb95.pdf
- This is an amazingly far-sighted paper with examples including transducers that model frame-sensitive deletions (c.f. Reading Frame Conservation)

- Holmes
*et al*(see also paper archive)- An alignment-free generalization to indels of Felsenstein's phylogenetic pruning algorithm. Oscar Westesson, Gerton Lunter, Benedict Paten, Ian Holmes
*(Submitted on 22 Mar 2011)* - Holmes I. Phylocomposer and phylodirector: analysis and visualization of transducer indel models. Bioinformatics. 2007 Dec 1;23(23):3263-4. Epub 2007 Sep 5.
- Bradley RK, Holmes I. Transducers: an emerging probabilistic framework for modeling indels on trees. Bioinformatics. 2007 Dec 1;23(23):3258-62. Epub 2007 Sep 5.
- Holmes I. Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics. 2003;19 Suppl 1:i147-57. (PDF) (errata)
- Miklos I, Lunter GA, Holmes I. A "Long Indel" model for evolutionary sequence alignment. Mol Biol Evol. 2004 Mar;21(3):529-40. Epub 2003 Dec 23. (PDF)
- Holmes I, Bruno WJ. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics. 2001 Sep;17(9):803-20. (PDF)

- An alignment-free generalization to indels of Felsenstein's phylogenetic pruning algorithm. Oscar Westesson, Gerton Lunter, Benedict Paten, Ian Holmes
- Statistical alignment literature

- David Searls & Kevin Murphy

- People interested in and/or developing this stuff (please add yourself)...
- OscarWestesson
- RobertBradley
- GertonLunter
- IstvanMiklos
- AaronMackey
- JotunHein
- AlexeiDrummond
- AdamNovak
- BenedictPaten
- DanKlein
- KrishnaRoskin
- RahulSatija
- LiorPachter
- DavidHaussler
- RobinDowell
- MarcSuchard
- David Ardell
- Abdoulaye Baniré Diallo

Topic revision: r79 - 2011-05-07 - IanHolmes

Copyright © 2008-2014 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback

TWiki Appliance - Powered by TurnKey Linux