String Transducers

From Biowiki
(Redirected from StringTransducers)
Jump to: navigation, search

String transducers


String transducers---a.k.a. sequence transducers---are finite-state machines (that is to say, automata) with an input tape X and an output tape Y (in the case of a two-tape transducer), or an input tape X and multiple output tapes Y_1 \ldots Y_N (in the general, 1-input N-output case).

A two-tape transducer, with one input and one output, is similar to a Pair HMM; the major difference is that the probabilistic weights are normalized conditional on the input sequence.

That is, the transition & emission parameters are such that the Forward algorithm, given X and Y, computes the conditional probability P(Y|X). In contrast, the Forward algorithm for a Pair HMM computes the joint probability P(X,Y).

Since P(Y|X) is the probability that the transducer converts input X to output Y , thus \sum_Y P(Y|X) = 1 for any X. A typical way to guarantee that a transducer meets this criterion for all possible X is to normalize every emission (and transition) conditional on the input symbol (and input sequence length).

As with Pair HMMs, string transducers are decent models for substitutions, indels and similar "local" mutations. However, they can also be systematically combined to yield probabilistic models for multiple, phylogenetically-related sequences. This algorithm, Transducer Composition, makes them great tools for statistical alignment, for analyzing sequence evolution on trees and for phylogenomic modeling & reconstruction.

Other names for transducers

Alternate names for string transducers include:

  • Wikipedia:Finite_state_transducer;
  • Input/Output Automata, Alignment Automata, Stochastic Automata;
  • Moore Machines, Mealy Machines;
  • "Conditional HMMs", meaning conditionally-normalized Pair HMMs, Multiple HMMs, Evolutionary HMMs or Phylo-HMMs.

We use these terms more-or-less interchangeably, although there are precise and subtly different definitions in common usage for all of them (with well-defined relationships between the various definitions;

e.g. we can assert that

"the general Phylo-HMM implemented by xrate is just a special-case Evolutionary HMM without indels: a singleton transducer composed with a phylogenetic composition of branch transducers, none of which have insert or delete states"

and while this may sound cryptic, it is a well-formed and easily provable statement in transducer theory).

Spritually similar models

A number of models and inference algorithms in the molecular evolution algorithms literature are representable as transducers. Some are almost exactly the same thing, e.g.

Others are slightly more restrictive; like the TKF91 and TKF92 models, which do not allow indels to overlap

Substitution-only models are also (trivially) representable as transducers, of course; as, in the same way, are the various models that use a finite-state continuous time Markov chain over N alphabet symbols and an extra "gap" symbol, sometimes with corrections that approximate transducer models.

The transducers we use here are finite in their state space, unlike for example the Long Indel Model which introduces realistic gap length distributions to the TKF model, at the cost of increasing the alignment complexity from that of a finite-state pair HMM (or transducer) to that of a generalized pair HMM with arbitrary distributions over emission lengths.

Generalizations of transducers

The Lexicalized Transducer is useful for modeling context-sensitive substitutions, indels and even local micro-duplications and micro-satellite dynamics.

The Parse Tree Transducer can be used for modeling the evolution of "parse trees" (e.g. noncoding RNA gene structures), as well as strings.

It is also possible to generalize the idea to a transducer with multiple inputs X_1 \ldots X_M as well as outputs Y_1 \ldots Y_N, to model recombination events and other exchanges of genetic material.

Pages on biowiki

There are several pages on this wiki about transducers applied to bioinformatics:

Pages on Wikipedia

Wikipedia has some good background on finite-state transducers as used in computer science: