Research  Teaching 



String transducers
IntroductionString transducersa.k.a. sequence transducersare finitestate machines (that is to say, automata) with an input tape and an output tape (in the case of a twotape transducer), or an input tape and multiple output tapes (in the general, 1input Noutput case). A twotape transducer, with one input and one output, is similar to a Pair HMM; the major difference is that the probabilistic weights are normalized conditional on the input sequence. That is, the transition & emission parameters are such that the Forward algorithm, given and , computes the conditional probability . In contrast, the Forward algorithm for a Pair HMM computes the joint probability . Since is the probability that the transducer converts input to output , thus for any . A typical way to guarantee that a transducer meets this criterion for all possible is to normalize every emission (and transition) conditional on the input symbol (and input sequence length). As with Pair HMMs, string transducers are decent models for substitutions, indels and similar "local" mutations. However, they can also be systematically combined to yield probabilistic models for multiple, phylogeneticallyrelated sequences. This algorithm, TransducerComposition, makes them great tools for statistical alignment, for analyzing sequence evolution on trees and for phylogenomic modeling & reconstruction.
Other names for transducersAlternate names for string transducers include:
We use these terms moreorless interchangeably, although there are precise and subtly different definitions in common usage for all of them (with welldefined relationships between the various definitions; e.g. we can assert that "the general PhyloHMM implemented by xrate is just a specialcase Evolutionary HMM without indels: a singleton transducer composed with a phylogenetic composition of branch transducers, none of which have insert or delete states" and while this may sound cryptic, it is a wellformed and easily provable statement in transducer theory).
Spritually similar modelsA number of models and inference algorithms in the molecular evolution algorithms literature are representable as transducers. Some are almost exactly the same thing, e.g.
The transducers we use here are finite in their state space, unlike for example the LongIndelModel which introduces realistic gap length distributions to the TKF model, at the cost of increasing the alignment complexity from that of a finitestate pair HMM (or transducer) to that of a generalized pair HMM with arbitrary distributions over emission lengths.
Generalizations of transducersThe LexicalizedTransducer is useful for modeling contextsensitive substitutions, indels and even local microduplications and microsatellite dynamics. The ParseTreeTransducer can be used for modeling the evolution of "parse trees" (e.g. noncoding RNA gene structures), as well as strings. It is also possible to generalize the idea to a transducer with multiple inputs as well as outputs , to model recombination events and other exchanges of genetic material.
Pages on biowikiThere are several pages on this wiki about transducers applied to bioinformatics:
Pages on WikipediaWikipedia has some good background on finitestate transducers as used in computer science:
References


Main.StringTransducers r79  20110507  04:50:00  IanHolmes  Biowiki content is in the public domain. Comments on this site? StringTransducers">Send feedback 