24 Nov 2011

Homework #8a - Bayesian Analysis of DNA Sequence Origin

• ORFprob.pl: Bayesian ORF Perl script - outputs the posterior probability of sequence origin

ORFprob.pl Features:

• Reads FASTA formatted sequence file as input
• Computes the probability distribution, P(x), over the nucleotides "x" in sequence "S".
• Outputs the posterior probability P(G=1|S) and the log of the ratio P(G=1|S) / P(G=0|S)
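
The first two features can be illustrated with a minimal sketch. It is written in Python for brevity and is not the ORFprob.pl source; the function names `parse_fasta` and `base_frequencies` and the example record are hypothetical, and P(x) is assumed to be estimated as the observed relative frequency of each nucleotide in S.

```python
from collections import Counter


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may wrap; collect them and join later.
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def base_frequencies(seq):
    """Estimate P(x) for x in A, C, G, T as observed relative frequencies."""
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical two-line record, used only to exercise the functions.
example = ">seq1 hypothetical record\nATGGCC\nTTAA\n"
sequences = parse_fasta(example)
freqs = base_frequencies(sequences["seq1 hypothetical record"])
```

Ambiguity codes such as N are ignored here; whether the script does the same is not stated above.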

Examples:

• perl ORFprob.pl file.fasta        # Outputs the posterior probability and log ratio of posterior probabilities
• perl ORFprob.pl file.fasta -h        # Prints program information
• perl ORFprob.pl file.fasta -move [#]        # Shifts the reading frame by # nucleotides (e.g. -move 1 shifts the reading frame by 1 nucleotide)
• perl ORFprob.pl file.fasta -stop [X]        # Specifies a custom STOP codon X (e.g. TGG)
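
To make the effect of the -move and -stop options concrete, here is a small Python sketch of how a frame shift and a custom STOP codon could change which codons are read. It is an illustration under assumptions, not the script's code: `split_codons` and `first_stop_index` are hypothetical names, the shift is assumed to drop nucleotides from the start of the sequence, and whether -stop replaces or extends the standard STOP codons is not stated above, so the set of STOP codons is simply passed in as a parameter.

```python
STANDARD_STOPS = ("TAA", "TAG", "TGA")


def split_codons(seq, shift=0):
    """Split seq into complete codons after skipping the first `shift` nucleotides."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    framed = seq[shift:]
    usable = len(framed) - len(framed) % 3  # drop a trailing partial codon
    return [framed[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(seq, shift=0, stops=STANDARD_STOPS):
    """Return the codon index of the first in-frame STOP codon, or None."""
    for index, codon in enumerate(split_codons(seq, shift)):
        if codon in stops:
            return index
    return None
```

For instance, `first_stop_index("ATGTAACC")` finds TAA as the second codon, while the same call with `shift=1` reads TGT, AAC and finds no STOP codon.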

Computation:

Background on sequence origin probabilities P(G=0) and P(G=1) can be found here.

This program first computes the likelihood P(S|G=0) and, from it, the posterior probability P(G=0|S). P(S|G=0) is the probability that a sequence of origin G=0 would contain no STOP codon over the given number of codons in the given reading frame; because the standard STOP codons (TAA, TAG, TGA) contain only T, A, and G, it depends on P(x=T), P(x=A), and P(x=G). From this, P(G=0|S) is computed by Bayes' theorem:

P(G=0|S) = P(S|G=0) P(G=0) / [ P(S|G=0) P(G=0) + P(S|G=1) P(G=1) ]

The posterior probability P(G=1|S) is then the complement, since G takes only the values 0 and 1:

P(G=1|S) = 1 - P(G=0|S)
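
The computation described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the script's actual code: nucleotides are treated as independent draws from P(x), S is assumed to contain no in-frame STOP codon, the STOP codons are the standard TAA, TAG, and TGA, the likelihood P(S|G=1) is taken to be 1, the prior P(G=1) defaults to 0.5, and the log ratio uses the natural logarithm. None of these values is given above, and all function and parameter names are hypothetical.

```python
import math


def stop_codon_probability(freqs):
    """P(a random codon is TAA, TAG, or TGA), assuming independent nucleotides."""
    t, a, g = freqs["T"], freqs["A"], freqs["G"]
    return t * a * a + t * a * g + t * g * a


def posterior_coding(freqs, n_codons, prior_coding=0.5, likelihood_coding=1.0):
    """Return (P(G=1|S), P(G=0|S)) for a sequence of n_codons STOP-free codons."""
    p_stop = stop_codon_probability(freqs)
    # P(S|G=0): no STOP codon in any of the n_codons positions.
    likelihood_random = (1.0 - p_stop) ** n_codons
    prior_random = 1.0 - prior_coding
    evidence = likelihood_random * prior_random + likelihood_coding * prior_coding
    post_random = likelihood_random * prior_random / evidence  # Bayes' theorem
    post_coding = 1.0 - post_random
    return post_coding, post_random


def log_posterior_ratio(post_coding, post_random):
    """Natural log of P(G=1|S) / P(G=0|S)."""
    return math.log(post_coding / post_random)


# Uniform nucleotide frequencies, used only as an illustration.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
post_coding, post_random = posterior_coding(uniform, n_codons=100)
ratio = log_posterior_ratio(post_coding, post_random)
```

Under these assumptions the log ratio grows linearly with the number of STOP-free codons, so longer open reading frames give stronger evidence for G=1.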

Terminal Example of ORFprob.pl:

For a test.fasta file containing the sequences:

