Homework: Information Content of DNA

This homework builds on the InformationContentOfDNA lab. Here we study a cosmid from the Mycobacterium leprae genome, which has been downloaded to ~be131/Teaching.InformationContentOfDNA/mleprae.fasta. Use the methods from the lab to answer the following questions:

  1. Sketch the dotplots for direct repeats, inverted repeats and regions of high or low sequence complexity.
  2. What is the nucleotide composition of the Mycobacterium cosmid? What is the entropy of this distribution? What is the dinucleotide composition?
  3. Use a sliding-window entropy program to scan across various parts of the Mycobacterium cosmid, including microsatellite and tandem repeat regions. In particular, try here. Visualize the results by piping them into xgraph. Vary the -n and -w parameters to change the word length and window size, respectively. Compare the results to a dotplot. Which kinds of repeats does the sliding-window entropy method pick up well, and which does it miss?
  4. This journal article discusses a TTC repeat in the M. leprae genome. Does the sliding-window entropy method identify it in the cosmid?
  5. Try compressing the Mycobacterium cosmid using a standard data compression tool, for example gzip or bzip2. How does the compressed size compare with the original?
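
For orientation, the quantities asked about in questions 2, 3 and 5 can be sketched in Python as below. This is a minimal illustration, not the program used in the lab: the function names are hypothetical, the `n` and `w` arguments merely mirror the -n (word length) and -w (window size) options mentioned above, words are counted as overlapping n-mers, and zlib stands in for gzip (both use DEFLATE). It also assumes the FASTA file holds a single record.

```python
import math
import zlib
from collections import Counter


def read_fasta(path):
    """Return the sequence of a single-record FASTA file, uppercased.

    Assumption: header lines start with '>' and everything else is sequence.
    """
    with open(path) as handle:
        return "".join(
            line.strip() for line in handle if not line.startswith(">")
        ).upper()


def word_entropy(seq, n=1):
    """Shannon entropy, in bits per word, of the overlapping n-mers in seq."""
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    # H = sum over words of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sliding_entropy(seq, n=1, w=100, step=1):
    """Yield (window start, entropy) for each length-w window of seq."""
    for start in range(0, len(seq) - w + 1, step):
        yield start, word_entropy(seq[start:start + w], n)


def compression_ratio(seq):
    """Compressed size divided by original size, using zlib (DEFLATE)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # Toy sequence: a low-complexity run followed by a mixed region.
    toy = "A" * 8 + "ACGT" * 2
    for start, h in sliding_entropy(toy, n=1, w=4, step=4):
        print(start, h)
```

Printing one `start entropy` pair per line, as in the toy loop, gives two-column output that a plotting tool such as xgraph should be able to read, though the exact input format expected on the course machines is an assumption here. Calling `word_entropy(seq, 1)` and `word_entropy(seq, 2)` on the output of `read_fasta` corresponds to the nucleotide and dinucleotide distributions in question 2.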

-- YuriBendana - 17 Oct 2006
