Long Indel Model

From Biowiki

The "Long Indel Model" introduces realistic indel length distributions to the TKF model.

The cost is that the finite-state transducer is expanded into a conditional generalized Pair HMM with arbitrary indel length distributions, whose effective number of states is bounded by the product of the sequence lengths.

Finite-state transducer approximations to this model exist (Gotoh Transducer, Knudsen Miyamoto Transducer). A related approximation is the TKF92 model, which breaks a sequence into indivisible fragments. (The transducer approximations assume that the branch length is short enough to neglect overlapping indels, whereas the TKF92 model assumes that overlapping indels never occur.)

In fact, it is known that any "concave" gap penalty (i.e., a monotonically decreasing gap length distribution) can be well-approximated by a finite-state machine, at least for Viterbi alignment:

-- Ian Holmes - 23 Apr 2008

  • cited by at least one person as their "favourite paper ever"
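
As a rough illustration of the finite-state approximation claim above (a sketch under stated assumptions, not taken from any cited work): a gap whose length is drawn from a mixture of geometric distributions can be emitted by a finite-state machine with one self-looping gap state per mixture component. Summing over components gives a monotonically decreasing length distribution whose negative log is a concave penalty; keeping only the best component, as Viterbi alignment does, gives a piecewise-linear concave penalty. The weights, extension probabilities, and function names below are hypothetical choices for illustration only.

```python
import math

# Hypothetical mixture: probability of entering each gap state on gap-open,
# and the self-loop (gap-extend) probability of that state.
WEIGHTS = [0.7, 0.3]
EXTEND_PROBS = [0.5, 0.95]


def mixture_pmf(k, weights, extend_probs):
    """P(gap length = k) for k >= 1, summed over all gap states."""
    return sum(w * (1.0 - p) * p ** (k - 1)
               for w, p in zip(weights, extend_probs))


def forward_penalty(k, weights, extend_probs):
    """Gap penalty when all paths are summed: -log of the mixture pmf."""
    return -math.log(mixture_pmf(k, weights, extend_probs))


def viterbi_penalty(k, weights, extend_probs):
    """Gap penalty when only the single best gap state is used."""
    return min(-math.log(w * (1.0 - p) * p ** (k - 1))
               for w, p in zip(weights, extend_probs))


def is_concave(values, tol=1e-9):
    """True if successive increments never increase (discrete concavity)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(later <= earlier + tol
               for earlier, later in zip(steps, steps[1:]))


lengths = range(1, 51)
forward_penalties = [forward_penalty(k, WEIGHTS, EXTEND_PROBS) for k in lengths]
viterbi_penalties = [viterbi_penalty(k, WEIGHTS, EXTEND_PROBS) for k in lengths]
```

Adding more gap states (more mixture components) gives the lower envelope more linear pieces, so a wider range of concave penalties can be matched more closely, at the price of a larger state space.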