Programming projects for BioE241.

Reference implementation code is C++.

Up to three exercises from the following list can be submitted.

The first three examples relate to the following alignment of Drosophila genomic DNA sequences. (Note that each row of the alignment is split over several lines, so as to fit in the width of the page.)

DroMel_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroSim_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAAGATCTGCTCAACG
DroSec_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroYak_CAF1 GTTTTTTTCTAATTATGAACATCGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroEre_CAF1 GTTTTTTTCTAATTATGAACATCGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroAna_CAF1 TTTTTTTTGCGATAATAAACATGGGCACTTTGGTTCCATCCTTACTGCTGTAGAAAATTTGTTCTACT
DroPer_CAF1 GTTTCATACGAATTATAAACATTGGTATCTTAGTACCGTCTTTACTTTTATAAAATATCTGTTCGACA
DroWil_CAF1 TCTTCTTACGAATTATAAACATAGGAATTTTAGTACCATCCTTACTGGTATAAAATATTTGTTCAACT
DroMoj_CAF1 CCTTCTTTCGTATAATAAACATTGGCACCTTGGTGCCATCTTTGCTTGAATAAAATATTTGCTCAACG
DroVir_CAF1 GTTTCTTCTGTATTATAAACATTGGCACCTTGGTACCATCTTTGCTAGAATAAAAGATTTGCTCAACG
DroGri_CAF1 TGTTTTTCCGTATTATAAACATCGGCACCTTGGTACCATCTTTGCTGGAATAAAAGATTTGTTCGACG

DroMel_CAF1 GCGTAATCCTCGCGACGGAATCCTTCCAAATTCAGTTTGATTTCGCGGAATACTGATGGCGACTTGTC
DroSim_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCCAAGTTCAGTTTGATTTCGCGGAATACTGAAGGCGACTTGTC
DroSec_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCCAAGTTCAGTTTGATTTCGCGGAATACTGAAGGCGACTTCTC
DroYak_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCTAAATTCAGTTTAATTTCGCGGAATACTGAAGGCGACTGGTC
DroEre_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCTAAATTCAGTTTAATTTCGCGGAATACTGAAGGCGACTGGTC
DroAna_CAF1 GCATAGTCCTCTTGTCGGAATCCTTCTAGGTTCAGCTTTATTTCGCGGAAAATTTTAGGCTTCTGATC
DroPer_CAF1 ATGTAGTTTTCTCGACAGAAACCTTCCAAGTTGAGTTTGATTTCGCGAAACACTTTTGGCATGATCTC
DroWil_CAF1 TTATAGTCTTCTCGATTAAATCCATTTAAGTTTATTTTTATTTCTCGAAATACATCTGGCTGACGATC
DroMoj_CAF1 ATGTAATTATCCCGATTGAAACCATCCAAGTTCAGATTGATGTCGCGAAAGACTTTTGGTGGCTGATC
DroVir_CAF1 GCATAATTATCCTGTCGAAATCCGTCCAAATTGAGCTTAATTTCCCGAAAGACTTTTGGTGGCTGGTC
DroGri_CAF1 GCATAGTCTTCACGTCGGAATCCGTCCAAATTGAGTTTGATTTCCCGAAACACTTTTGGTGGCTCATC

DroMel_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroSim_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroSec_CAF1 AGGTGTCTTGAAATCGTATCGATAAATGGAACCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroYak_CAF1 AGGTGTCTTAAAATCATAACGATAAATAGAGCCGGGGTTCAAAAAGGACGAAAAGTTGTAGAATATTT
DroEre_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGGTTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroAna_CAF1 GGGCGTCTTAAAATCGTATTGGTAAATTGTGCCAGGATTCAGGAACGAAGAATAGTTGTAGAAAATTT
DroPer_CAF1 TGGGGTCTTAAAATCATATTGATAAATACATCCAGGACTCAAAAAGGAAGAAAAGTTATAAAAAATCT
DroWil_CAF1 AGGGTTATTAAAGTCGTAATGGTAAATTGTTCCCGGGCTCAAAAATGAGGCAAAATTGTAAAAAAATT
DroMoj_CAF1 CGGATATTTAAAGTCATAATGATAAATGGTTCCTGGATTTAAAAACGATGAAAAATTATAGAAAATTT
DroVir_CAF1 TGGATATTTAAAATCATAGTGGTAAATTGTTCCGGGGTTTAAGAATGACGCGAAATTGTAAAAAATTT
DroGri_CAF1 TGGATGCTTAAAGTCATAATGATAAATTGTGCCGGGGTTCAAGAACGACGAGAAATTGTAGAAAATTT

DroMel_CAF1 CAGAATATTTCTTCTCTCCCGATGTGCCTACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGG
DroSim_CAF1 CGGAGTATTTCTTCTCCCCCGATGTGCCCACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGT
DroSec_CAF1 CGGAGTATTTCTTCTCCCCCGATGTGCCCACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGT
DroYak_CAF1 CGGAATATTTCTTCTCTCCAGAGGTTCCCACAATGGTGCCGATATCCAAGTCGAATTCACGCAACAGG
DroEre_CAF1 CGGAATATTTCTTCTCTCCAGAGGTTCCCACAATGGTGCCGATATCCAACTTGAATTCGCGCAGCAGG
DroAna_CAF1 CCGAATAACGCTTCTCTCCCGTTGTCCCCACAATGGTGCCAATATCCAAATCGAATTCTCGCACCAGC
DroPer_CAF1 CTGAATACTTCTTCTCGCCTGAAGTTCCTACAATGGTACCGATATCTAAGTCGAATACACGCAATAGT
DroWil_CAF1 CAGAATGCTTTTTTTTTCCAGAAGTTCCAACGATGGTTCCAATATCCAAGTCAAATGAGCAAAATAGC
DroMoj_CAF1 CCGAATATTTAGTTTTTCCTGATGTTCCGACAATGGTGCCAATATCTAAATCAAATTGTCGTATCAAT
DroVir_CAF1 CGGAATATTTTGTTTTGCCAGAGGTCCCAACAATGGTACCAATATCCAAATCAAATTGTCGTATTAGC
DroGri_CAF1 CGGAATATTTCTTTTTCCCAGACGTGCCAACAATGGTGCCAATATCCAAATCAAATTGTCGAATCAGT

DroMel_CAF1 GTGCCATCCTTTAGGGAATTCACTTGCA
DroSim_CAF1 GTGCCATCCTTTAGGGAATTCACTTGCA
DroSec_CAF1 GTGCCATCCTTTAGGGAATTAACTTGCA
DroYak_CAF1 GTTCCATCCTTCAGAGAATTCACTTGCA
DroEre_CAF1 GTACCATCCTTTAGGGAGTTCACTTGTA
DroAna_CAF1 TTGCCATCATTCAATGAATGTACTTGTA
DroPer_CAF1 TTTCCAGTTCGAAGACAATTGGCTTGCA
DroWil_CAF1 TCCCCCGTTTGAAGTAAGTTAGCTTGCA
DroMoj_CAF1 TCTCCAGTTCGAAGTGAATTCGCTTGTA
DroVir_CAF1 GCTCCGGTTTGAAGTGAATTAGCTTGCA
DroGri_CAF1 TTTCCAGTTTCTAACGAATTTGCTTGCA

Matrix exponentiation

For this exercise, consider only the Drosophila persimilis and Drosophila willistoni sequences from the above alignment (DroPer_CAF1 and DroWil_CAF1). Denote these sequence data by D.

Assume a nucleotide substitution model of the following form (similar to the HKY85 model). The nucleotide ordering is (A,C,G,T).

{\bf R} = \left( \begin{array}{llll} - & f_C & \kappa f_G & f_T \\ f_A & - & f_G & \kappa f_T \\ \kappa f_A & f_C & - & f_T \\ f_A & \kappa f_C & f_G & - \end{array} \right)

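Here {\bf f} = \{f_A,f_C,f_G,f_T\} are the equilibrium nucleotide frequencies and \kappa is the transition/transversion rate ratio. Each diagonal entry (shown as -) is minus the sum of the other entries in its row, so that every row of {\bf R} sums to zero.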
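A minimal sketch of how {\bf R} might be assembled in C++ is given here. The names `Mat` and `makeRateMatrix` are illustrative and are not taken from the reference implementation. The sketch relies on the ordering (A,C,G,T), in which the two transition pairs (A,G) and (C,T) are exactly the index pairs that differ by 2.

```cpp
#include <array>
#include <cassert>

using Mat = std::array<std::array<double, 4>, 4>;

// Builds the HKY85-style rate matrix in the nucleotide ordering (A,C,G,T).
// f holds the nucleotide frequencies, kappa the transition/transversion
// rate ratio. The matrix is left unnormalised, exactly as written above.
Mat makeRateMatrix(const std::array<double, 4>& f, double kappa) {
    assert(kappa > 0.0);
    Mat R{};  // zero-initialised
    for (int i = 0; i < 4; ++i) {
        double rowSum = 0.0;
        for (int j = 0; j < 4; ++j) {
            if (i == j) continue;
            // A<->G and C<->T are transitions: their indices differ by 2.
            const bool transition = ((i + 2) % 4 == j);
            R[i][j] = (transition ? kappa : 1.0) * f[j];
            rowSum += R[i][j];
        }
        R[i][i] = -rowSum;  // each row sums to zero
    }
    return R;
}
```

For example, with \kappa=4 and f_G=0.22 the A-to-G entry is 0.88, while the A-to-C entry is just f_C.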

Assuming \kappa=4 and appropriate values for the f_i, generate a plot of the log-likelihood, \log P(D|{\bf f},\kappa,T), as a function of the evolutionary separation time T of Drosophila persimilis and Drosophila willistoni.
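The following is one possible sketch of the log-likelihood computation, not the reference implementation. It assumes that T is the total separation between the two sequences, so that (by reversibility) each alignment column with nucleotides x and y contributes f_x [\exp({\bf R}T)]_{xy}. The matrix exponential is computed by scaling and squaring around a truncated Taylor series, which is one of several workable methods. All function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <string>

using Mat = std::array<std::array<double, 4>, 4>;

// HKY85-style rate matrix in the ordering (A,C,G,T); rows sum to zero.
Mat makeRateMatrix(const std::array<double, 4>& f, double kappa) {
    Mat R{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (i != j) {
                // A<->G and C<->T are transitions: indices differ by 2.
                R[i][j] = ((i + 2) % 4 == j ? kappa : 1.0) * f[j];
                R[i][i] -= R[i][j];
            }
    return R;
}

Mat multiply(const Mat& a, const Mat& b) {
    Mat c{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            for (int j = 0; j < 4; ++j) c[i][j] += a[i][k] * b[k][j];
    return c;
}

// exp(R*t) by scaling and squaring around a truncated Taylor series.
Mat transitionMatrix(const Mat& R, double t) {
    assert(t >= 0.0);
    double norm = 0.0;  // largest absolute row sum of R*t
    for (const auto& row : R) {
        double s = 0.0;
        for (double x : row) s += std::fabs(x * t);
        norm = std::max(norm, s);
    }
    int squarings = 0;
    while (norm > 0.5) { norm /= 2.0; ++squarings; }
    const double scale = std::ldexp(t, -squarings);  // t / 2^squarings
    Mat m = R;
    for (auto& row : m)
        for (double& x : row) x *= scale;

    Mat result{}, term{};
    for (int i = 0; i < 4; ++i) result[i][i] = term[i][i] = 1.0;
    for (int k = 1; k <= 20; ++k) {  // Taylor terms m^k / k!
        term = multiply(term, m);
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) {
                term[i][j] /= k;
                result[i][j] += term[i][j];
            }
    }
    for (int s = 0; s < squarings; ++s) result = multiply(result, result);
    return result;
}

int nucleotideIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;  // anything else is skipped below
    }
}

// log P(D | f, kappa, T) for two aligned sequences x and y.
double pairLogLikelihood(const std::string& x, const std::string& y,
                         const std::array<double, 4>& f, double kappa, double T) {
    assert(x.size() == y.size());
    const Mat P = transitionMatrix(makeRateMatrix(f, kappa), T);
    double logL = 0.0;
    for (std::string::size_type n = 0; n < x.size(); ++n) {
        const int i = nucleotideIndex(x[n]), j = nucleotideIndex(y[n]);
        if (i < 0 || j < 0) continue;
        logL += std::log(f[i] * P[i][j]);
    }
    return logL;
}
```

Evaluating `pairLogLikelihood` on a grid of T values and writing out the (T, log-likelihood) pairs gives the data for the plot. Because {\bf R} is used unnormalised here, T is measured in the model's own time units rather than in expected substitutions per site.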

Using Newton-Raphson optimization, or otherwise, estimate maximum likelihood values for \kappa and T.
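As a rough illustration of the Newton-Raphson step, the sketch below maximises a one-parameter function g (for instance the log-likelihood as a function of T with \kappa held fixed) using central finite differences for the first and second derivatives. The name `newtonMaximise` and its default step sizes are assumptions made for this example. Estimating \kappa and T jointly would need the gradient and the 2x2 Hessian, or alternating one-dimensional updates.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Newton-Raphson search for a maximum of g, starting from t0.
// Derivatives are approximated by central finite differences of width h.
double newtonMaximise(const std::function<double(double)>& g, double t0,
                      double h = 1e-4, int maxIter = 100, double tol = 1e-8) {
    assert(h > 0.0 && maxIter > 0);
    double t = t0;
    for (int it = 0; it < maxIter; ++it) {
        const double g0 = g(t), gPlus = g(t + h), gMinus = g(t - h);
        const double d1 = (gPlus - gMinus) / (2.0 * h);           // g'(t)
        const double d2 = (gPlus - 2.0 * g0 + gMinus) / (h * h);  // g''(t)
        if (d2 >= 0.0) break;  // not locally concave: Newton step unreliable
        const double step = -d1 / d2;
        t += step;
        if (std::fabs(step) < tol) break;  // converged
    }
    return t;
}
```

The iteration can overshoot or stall if the starting point is far from the maximum, so a starting value read off the log-likelihood plot is a sensible choice.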

You may find the PDF file of the HKY85 paper to be useful.

Pruning & postorder tree traversal

For this exercise, consider only the D. ananassae, D. melanogaster, D. yakuba and D. erecta sequences from the above alignment (DroAna_CAF1, DroMel_CAF1, DroYak_CAF1 and DroEre_CAF1).

Assume an HKY85 substitution model of the form shown above, with \kappa=4, f_A=f_T=0.28, f_C=f_G=0.22.

Compute the likelihoods of the following two alternative trees (presented here in Newick format):

  1. ((DroMel_CAF1:0.08,(DroYak_CAF1:0.08,DroEre_CAF1:0.04):0.04):0.33,DroAna_CAF1:0.47);
  2. (((DroMel_CAF1:0.13,DroEre_CAF1:0.04):0.02,DroYak_CAF1:0.04):0.34,DroAna_CAF1:0.49);
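The pruning recursion for a single alignment column can be sketched as a postorder traversal, as below. This is an illustrative outline under several assumptions: the tree is built by hand rather than parsed from Newick, nucleotides are pre-coded as 0..3 in the order (A,C,G,T) (the alignment above contains no gaps), and the branch transition matrices \exp({\bf R}t) are supplied by the caller through a hypothetical `TransitionFn` callback. None of these names come from the reference implementation.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::array<double, 4>;
using Mat = std::array<Vec, 4>;
using TransitionFn = std::function<Mat(double)>;  // branch length -> P(t)

struct Node {
    double branch = 0.0;         // length of the branch above this node
    int leaf = -1;               // alignment row for a leaf, -1 otherwise
    std::vector<Node> children;  // empty for a leaf
};

// Postorder pass: L[a] = P(observed leaves below n | state a at n).
Vec prune(const Node& n, const std::vector<int>& column,
          const TransitionFn& transition) {
    Vec L{};
    if (n.children.empty()) {
        assert(n.leaf >= 0 && n.leaf < static_cast<int>(column.size()));
        L[column[n.leaf]] = 1.0;  // observed nucleotide, coded 0..3
        return L;
    }
    L.fill(1.0);
    for (const Node& child : n.children) {
        const Vec below = prune(child, column, transition);  // child first
        const Mat P = transition(child.branch);
        for (int a = 0; a < 4; ++a) {
            double sum = 0.0;
            for (int b = 0; b < 4; ++b) sum += P[a][b] * below[b];
            L[a] *= sum;
        }
    }
    return L;
}

// Likelihood of one column: root states weighted by the frequencies f.
double columnLikelihood(const Node& root, const std::vector<int>& column,
                        const Vec& f, const TransitionFn& transition) {
    const Vec L = prune(root, column, transition);
    double total = 0.0;
    for (int a = 0; a < 4; ++a) total += f[a] * L[a];
    return total;
}
```

The log-likelihood of a tree is then the sum over alignment columns of the log of `columnLikelihood`. Recomputing the transition matrix for every column is wasteful; caching one matrix per branch would be a natural refinement.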

Assuming both trees have an equal a priori probability of 0.5, what is the posterior probability of the first tree (having observed the alignment data)?
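By Bayes' rule the posterior of the first tree is P(D|tree 1) P(tree 1) divided by the same quantity summed over both trees. Likelihoods of a few hundred alignment columns are typically too small to represent directly as doubles, so a sketch like the following works with log-likelihoods throughout. The function name `posteriorFirst` is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Posterior probability of tree 1 given the log-likelihoods of two
// candidate trees and the prior probability of tree 1.
double posteriorFirst(double logL1, double logL2, double prior1 = 0.5) {
    assert(prior1 > 0.0 && prior1 < 1.0);
    // log of [P(D|tree 2) P(tree 2)] / [P(D|tree 1) P(tree 1)]
    const double logOdds = (logL2 + std::log(1.0 - prior1))
                         - (logL1 + std::log(prior1));
    return 1.0 / (1.0 + std::exp(logOdds));
}
```

With equal priors only the difference of the two log-likelihoods matters, so any constant common to both (such as a shared scaling factor) cancels.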

Biological motivation for this problem may be found here...

Peeling algorithm; expected transitions & event counts

For this exercise, use the entire multiple alignment shown above, and assume the following tree (Newick Format):

((((((DroMel_CAF1:0.06185,
     (DroSim_CAF1:0.048265,
      DroSec_CAF1:0.048265):0.000001):0.057735,
    (DroYak_CAF1:0.09789,
     DroEre_CAF1:0.088775):0.03448):0.425435,
    DroAna_CAF1:0.590605):0.290085,
   DroPer_CAF1:0.535315):0.31104,
  DroWil_CAF1:0.5):0.000001,
 ((DroMoj_CAF1:0.476425,
   DroVir_CAF1:0.28919):0.13216,
  DroGri_CAF1:0.45018):0.30115);

Assume an HKY85 substitution model of the form shown above, with \kappa=4, f_A=f_T=0.28, f_C=f_G=0.22.

Compute the expectation (taken over the posterior distribution of event histories) of the total number of substitution events that occurred in the evolution of these DNA sequences.
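One way to organise this computation, offered as a sketch rather than as the intended solution, is to split the expectation by alignment column and by branch. Here c indexes alignment columns with data D_c, (u \to v) ranges over branches of the tree with parent u, child v and length t_v, and x_u, x_v are the nucleotides at the two ends of the branch; this notation is introduced for the example only.

```latex
E[N \mid D] = \sum_{c} \sum_{(u \to v)} \sum_{a,b}
    P(x_u = a, x_v = b \mid D_c) \; E[N \mid a, b, t_v]

E[N \mid a, b, t] = \frac{1}{[e^{{\bf R}t}]_{ab}}
    \sum_{i \neq j} R_{ij} \int_0^t [e^{{\bf R}s}]_{ai} \, [e^{{\bf R}(t-s)}]_{jb} \, ds
```

The joint posterior of the branch endpoints, P(x_u = a, x_v = b | D_c), can be obtained by combining the postorder (peeling) likelihoods below v with the corresponding quantities for the rest of the tree. The integral can be evaluated numerically or in closed form via an eigendecomposition of {\bf R}.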

Pairwise Forward-backward & Baum-Welch algorithms

Comparison of mutation parameters between at least two protein domains

(more to follow here)

Cocke-Younger-Kasami & Inside algorithms

Alternate stem predictions in pseudoknots

(more to follow here)
