> When working in some space which is X*log(Probability)
> (x I guess 20 or 50) which is common one can get overflow
> or underflow issues...
X is 1000 for HMMER, SAM, DART & possibly others.
> Underflow => v. small probabilities, just truncate
> Overflow => keep an additional integers on the
> matrix which you have subtracted off all numbers
> when one number hits the limit.
Sounds right ballpark. I ran into another kind of overflow problem with
Handel when the scores get very big and negative for large trees (too big
to hold in 31 bits). Haven't heard of that overflow technique but it could
work. The technique of adding a constant score per column is the one e.g.
Anders was using.
> For log-sum functions, one tabulates the
> results of P_1 + P_2 (can save half the
> space cause you can choose the smaller one)
> and then linear interpolate between the results
> when it is not bang on.
you want to evaluate S = log(P_1+P_2) = log(exp(S_1) + exp(S_2))
where S_1 and S_2 are log likelihood scores
so... S = log(exp(S_1) + exp(S_2))
= log(exp(S_1) * (1 + exp(S_2 - S_1)))
= S_1 + log (1 + exp(S_2 - S_1))
= S_1 + f(S_2-S_1)
where f(x) = log(1+exp(x)). You choose things so that x is negative. If x
is very hugely negative (e.g. -10000000) then f(x) ~= 0, so you only need
to tabulate up to some maximum (negative) x.
, 7 March 2005