Tags:
create new tag
, view all tags

String transducers

Introduction

String transducers---a.k.a. sequence transducers---are finite-state machines (that is to say, automata) with an input tape and an output tape (in the case of a two-tape transducer), or an input tape and multiple output tapes (in the general, 1-input N-output case).

A two-tape transducer, with one input and one output, is similar to a Pair HMM; the major difference is that the probabilistic weights are normalized conditional on the input sequence.

That is, the transition & emission parameters are such that the Forward algorithm, given and , computes the conditional probability . In contrast, the Forward algorithm for a Pair HMM computes the joint probability .

Since is the probability that the transducer converts input to output , thus for any . A typical way to guarantee that a transducer meets this criterion for all possible is to normalize every emission (and transition) conditional on the input symbol (and input sequence length).

As with Pair HMMs, string transducers are decent models for substitutions, indels and similar "local" mutations. However, they can also be systematically combined to yield probabilistic models for multiple, phylogenetically-related sequences. This algorithm, TransducerComposition, makes them great tools for statistical alignment, for analyzing sequence evolution on trees and for phylogenomic modeling & reconstruction.

Other names for transducers

Alternate names for string transducers include:

  • Wikipedia:Finite_state_transducer;
  • Input/Output Automata, Alignment Automata, Stochastic Automata;
  • Moore Machines, Mealy Machines;
  • "Conditional HMMs", meaning conditionally-normalized Pair HMMs, Multiple HMMs, Evolutionary HMMs or Phylo-HMMs.

We use these terms more-or-less interchangeably, although there are precise and subtly different definitions in common usage for all of them (with well-defined relationships between the various definitions; e.g. we can assert that "the general Phylo-HMM implemented by xrate is just a special-case Evolutionary HMM without indels: a singleton transducer composed with a phylogenetic composition of branch transducers, none of which have insert or delete states" and while this may sound cryptic, it is a well-formed and easily provable statement in transducer theory).

Spritually similar models

A number of models and inference algorithms in the molecular evolution algorithms literature are representable as transducers. Some are almost exactly the same thing, e.g. Others are slightly more restrictive; like the TKF91 and TKF92 models, which do not allow indels to overlap Substitution-only models are also (trivially) representable as transducers, of course; as, in the same way, are the various models that use a finite-state continuous time Markov chain over N alphabet symbols and an extra "gap" symbol, sometimes with corrections that approximate transducer models.

The transducers we use here are finite in their state space, unlike for example the LongIndelModel which introduces realistic gap length distributions to the TKF model, at the cost of increasing the alignment complexity from that of a finite-state pair HMM (or transducer) to that of a generalized pair HMM with arbitrary distributions over emission lengths.

Generalizations of transducers

The LexicalizedTransducer is useful for modeling context-sensitive substitutions, indels and even local micro-duplications and micro-satellite dynamics.

The ParseTreeTransducer can be used for modeling the evolution of "parse trees" (e.g. noncoding RNA gene structures), as well as strings.

It is also possible to generalize the idea to a transducer with multiple inputs as well as outputs , to model recombination events and other exchanges of genetic material.

Pages on biowiki

There are several pages on this wiki about transducers applied to bioinformatics:

Pages on Wikipedia

Wikipedia has some good background on finite-state transducers as used in computer science:

References

Topic revision: r79 - 2011-05-07 - IanHolmes
 

This site is powered by the TWiki collaboration platformCopyright © 2008-2014 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback
TWiki Appliance - Powered by TurnKey Linux