# HandAlign Transducer

This page describes the String Transducer used by the handalign program for Statistical Alignment.

See Transducer Legend for explanations of symbols in this diagram.
The transition weights (*a*, *b*, *r*, etc.) are defined below.

Here , , is the deletion rate, is the insertion rate, is the indel extension probability, and is the evolutionary time (branch length) separating input and output sequences.

## Mean gap length

Note that the mean length of an indel is . In practice, handalign requires the user to specify this length and the parameter is then recovered as .
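The conversion between mean gap length and extension probability can be sketched as follows. This is a minimal illustration, assuming gap lengths are geometrically distributed with at least one residue, so that an extension probability `x` gives a mean length of `1 / (1 - x)`. The symbol `x` and both function names are assumptions made for this sketch, not handalign identifiers, and handalign's exact parameterization may differ.

```python
def extension_prob_from_mean_length(mean_length: float) -> float:
    """Recover the extension probability from a user-supplied mean gap length.

    Assumes P(length = k) = (1 - x) * x**(k - 1) for k = 1, 2, ...,
    whose mean is 1 / (1 - x).
    """
    if mean_length < 1.0:
        raise ValueError("a gap has at least one residue, so the mean must be >= 1")
    return 1.0 - 1.0 / mean_length


def mean_length_from_extension_prob(x: float) -> float:
    """Mean gap length under the same geometric assumption."""
    if not 0.0 <= x < 1.0:
        raise ValueError("extension probability must lie in [0, 1)")
    return 1.0 / (1.0 - x)
```

Under this assumption a mean gap length of 4 corresponds to an extension probability of 0.75, and the two functions are inverses of each other.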

## Emissions

The transducer is a Moore machine, so emissions may be thought of as occurring within states (M, D, I). The absorption/emission probabilities (not shown in the diagram) are related to an underlying substitution model. Specifically, let denote an instantaneous point substitution rate matrix with equilibrium probability vector . Denoting the input symbol by x and the output symbol by y, the I-state emits symbols (y) with probability , the M-state absorbs/emits symbols (x,y) with probability (conditioned on x) of the matrix exponential , and the D-state absorbs any symbol (x) with probability 1.
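The three emission rules above can be sketched in a few lines of Python. This is an illustration only: the DNA alphabet, the Jukes-Cantor rate matrix, the function names, and the hand-rolled matrix exponential (scaling and squaring of a truncated Taylor series) are all assumptions chosen to keep the example self-contained. They are not taken from handalign.

```python
import math

ALPHABET = "ACGT"  # hypothetical alphabet for this sketch


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this step, term holds a**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def insert_emission(y, pi):
    """I-state: emit y with its equilibrium probability."""
    return pi[ALPHABET.index(y)]


def match_emission(x, y, q, t):
    """M-state: probability of y given x, read from entry (x, y) of exp(q * t)."""
    return mat_exp(q, t)[ALPHABET.index(x)][ALPHABET.index(y)]


def delete_absorption(x):
    """D-state: absorb any input symbol with probability 1."""
    return 1.0


# Jukes-Cantor model (an assumed example): every substitution has rate 1/3.
JC_Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
JC_PI = [0.25] * 4


def jukes_cantor_same(t):
    """Closed-form probability of no net change under JC_Q, for checking."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
```

For the Jukes-Cantor matrix, `match_emission("A", "A", JC_Q, t)` should agree with `jukes_cantor_same(t)` up to numerical error, and each row of the matrix exponential should sum to 1. In real code a library routine such as `scipy.linalg.expm` would be a more robust choice than the hand-rolled exponential.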

## Root model

The transducer shown above models the evolution along a branch, i.e. the probability of a child node given its parent. What about the original sequence - the uber-parent at the root?

The ur-ancestral (root) sequence is modeled as an IID sequence with geometrically distributed length, each character distributed according to . (This may be seen as a simple state machine - e.g. see Singlet Transducer.) The parameter of the length distribution is , so the mean length is . In practice, handalign requires the user to specify instead of . The insertion rate is then recovered as .
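A root model of this kind can be sketched as a generator and a matching likelihood. The example assumes one common parameterization of the geometric length distribution, `P(length = n) = (1 - p) * p**n`, with a hypothetical continuation probability `p` and mean length `p / (1 - p)`. Handalign's own parameterization and symbols may differ, and all names below are illustrative.

```python
import math
import random


def expected_root_length(p):
    """Mean of the assumed length distribution P(n) = (1 - p) * p**n."""
    if not 0.0 <= p < 1.0:
        raise ValueError("continuation probability must lie in [0, 1)")
    return p / (1.0 - p)


def sample_root_sequence(p, pi, alphabet="ACGT", rng=None):
    """Draw an IID sequence: keep emitting with probability p, then stop."""
    if not 0.0 <= p < 1.0:
        raise ValueError("continuation probability must lie in [0, 1)")
    rng = rng or random.Random()
    symbols = []
    while rng.random() < p:
        # Each character is drawn independently from the distribution pi.
        symbols.append(rng.choices(alphabet, weights=pi, k=1)[0])
    return "".join(symbols)


def root_log_likelihood(seq, p, pi, alphabet="ACGT"):
    """Log-probability of seq: geometric length term plus IID character terms."""
    length_term = len(seq) * math.log(p) + math.log(1.0 - p) if seq else math.log(1.0 - p)
    char_term = sum(math.log(pi[alphabet.index(c)]) for c in seq)
    return length_term + char_term
```

This is the kind of simple state machine the text refers to: a single emitting state that loops with probability `p` and exits with probability `1 - p`.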

-- Ian Holmes - 14 Dec 2011