Try to answer the following questions:
Consider the motif ATGGCTA. Roughly how frequently (giving an answer of the form "every N bases", and explaining your reasoning) would you expect to see this motif occurring in (a) a uniform IID DNA sequence, (b) an IID DNA sequence with GC content 60%? How well might you expect a naturally occurring genomic sequence to conform to these models, and in what ways would it deviate from them? How would your answers change if the motif were the 8mer ATATATAT instead of the 7mer ATGGCTA?
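As a starting point for the reasoning, here is a minimal sketch of the IID calculation. It assumes a single strand, that a GC content of g means P(G) = P(C) = g/2 and P(A) = P(T) = (1 - g)/2, and that "every N bases" means N = 1 / P(motif at a given position). The function names `gc_model` and `expected_spacing` are illustrative and do not come from the original text.

```python
import math

def gc_model(gc_content):
    """IID base probabilities for a given GC content (assumes G=C and A=T)."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "T": at, "G": gc, "C": gc}

def expected_spacing(motif, probs):
    """Return N such that the motif is expected to start once every N bases.

    Under an IID model, the probability that the motif starts at a given
    position is the product of its per-base probabilities.
    """
    p_motif = math.prod(probs[base] for base in motif)
    return 1.0 / p_motif

uniform = gc_model(0.5)   # all four bases have probability 0.25
gc60 = gc_model(0.6)      # G, C = 0.3 each; A, T = 0.2 each

for motif in ("ATGGCTA", "ATATATAT"):
    print(motif,
          round(expected_spacing(motif, uniform)),
          round(expected_spacing(motif, gc60)))
```

Under these assumptions the 7mer comes out at roughly every 16,384 bases (uniform) and every 23,148 bases (60% GC), and the 8mer at roughly every 65,536 and every 390,625 bases. The sketch only gives the mean rate: a self-overlapping motif such as ATATATAT will tend to occur in clusters, and real genomic sequence (repeats, dinucleotide biases, both strands) is not captured by an IID model.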
How might you use the gzip program to empirically estimate the relative entropy D(P||Q), where P is an IID model for human genomic DNA and Q is the implicit probability distribution underlying the Lempel-Ziv algorithm?
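One possible approach, sketched below under stated assumptions, uses the identity D(P||Q) = H(P, Q) - H(P). The compressed size in bits per base of a long sample drawn from P approximates the cross-entropy H(P, Q), and H(P) is the Shannon entropy of the IID model. The sketch uses Python's `gzip` module, which produces the same DEFLATE format (LZ77 plus Huffman coding) as the command-line tool. The base frequencies are an illustrative assumption (about 41% GC), not values from the original text, and the helper names are hypothetical.

```python
import gzip
import math
import random

def shannon_entropy_bits(probs):
    """Entropy H(P) of an IID model, in bits per base."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def gzip_bits_per_base(seq):
    """Compressed size of the sequence in bits per base (one ASCII byte per base)."""
    compressed = gzip.compress(seq.encode("ascii"), compresslevel=9)
    return 8 * len(compressed) / len(seq)

def estimate_relative_entropy(probs, n=200_000, seed=0):
    """Estimate D(P||Q) in bits per base by compressing a sample drawn from P."""
    rng = random.Random(seed)
    bases = list(probs)
    weights = [probs[b] for b in bases]
    seq = "".join(rng.choices(bases, weights=weights, k=n))
    # Cross-entropy estimate minus the true entropy of the sampling model.
    return gzip_bits_per_base(seq) - shannon_entropy_bits(probs)

# Assumed IID model: illustrative base frequencies, not fitted to real data.
human_iid = {"A": 0.295, "T": 0.295, "G": 0.205, "C": 0.205}
print(f"estimated D(P||Q): {estimate_relative_entropy(human_iid):.3f} bits per base")
```

The estimate includes gzip's fixed header overhead and the cost of its finite window, so it is biased upward for short samples. Increasing `n` reduces the header effect but not the window limitation.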
