Long Indel Model
The "Long Indel Model" introduces realistic indel length distributions to the TKF model.
- Miklós et al.: A "Long Indel" model for evolutionary sequence alignment. Mol. Biol. Evol. 2004;21:529-40.
The model achieves this at the expense of expanding the finite-state transducer into a conditional generalized Pair HMM with arbitrary length distributions for indels, whose effective number of states is bounded by the product of the sequence lengths.
Finite-state transducer approximations to this model exist (Gotoh Transducer, Knudsen Miyamoto Transducer). A related approximation is the TKF92 model, which breaks a sequence into indivisible fragments. (The transducer approximations assume that the branch length is short enough to neglect overlapping indels, whereas the TKF92 model assumes that overlapping indels never occur.)
In fact, any "concave" gap penalty (i.e., a monotonically decreasing gap length distribution) is known to be well approximated by a finite-state machine, at least for Viterbi alignment:
- Miller & Myers: Sequence comparison with concave weighting functions. Bull. Math. Biol. 1988;50:97-120.
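One way to see why a finite-state machine can approximate a concave gap penalty under Viterbi is sketched below. The sketch assumes a machine with several parallel gap states, each charging an affine cost (open + extend × length); Viterbi picks the cheapest state for each gap, so the effective penalty is the minimum over those affine costs. The penalty function `concave_penalty`, its parameters, the anchor lengths and all function names are hypothetical choices made for illustration. This is not the Miller & Myers algorithm, and it is not taken from the Long Indel paper.

```python
import math

def concave_penalty(k, a=4.0, b=2.0):
    """Hypothetical concave gap penalty a + b*ln(k) for gap length k >= 1."""
    return a + b * math.log(k)

def affine_components(penalty, anchors):
    """Build one (open, extend) pair per anchor length.

    Each pair describes the line through (k, penalty(k)) and
    (k + 1, penalty(k + 1)), written as open + extend * length.
    For a concave penalty this line never falls below the penalty
    at any integer gap length.
    """
    components = []
    for k in anchors:
        extend = penalty(k + 1) - penalty(k)
        gap_open = penalty(k) - extend * k
        components.append((gap_open, extend))
    return components

def fsm_penalty(k, components):
    """Viterbi cost of a length-k gap: the cheapest affine gap state."""
    return min(gap_open + extend * k for gap_open, extend in components)

# Assumed anchor lengths; more anchors give a tighter approximation.
anchors = [1, 2, 4, 8, 16, 32]
components = affine_components(concave_penalty, anchors)

for k in (1, 3, 10, 50):
    exact = concave_penalty(k)
    approx = fsm_penalty(k, components)
    print(f"length {k:3d}: exact {exact:.3f}, finite-state {approx:.3f}")
```

In this sketch each affine component corresponds to one gap state, so the number of states grows with the number of anchors rather than with the sequence lengths. The approximation is exact at the anchor lengths and overestimates the penalty elsewhere.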
-- Ian Holmes - 23 Apr 2008
- cited by at least one person as their "favourite paper ever"