Programming projects for BioE241.
Reference implementation code is C++.
Up to three exercises from the following list can be submitted.
The first three examples relate to the following alignment of Drosophila genomic DNA sequences.
(Note that each row of the alignment is split over several lines, so as to fit in the width of the page.)
DroMel_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroSim_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAAGATCTGCTCAACG
DroSec_CAF1 GTTTTTTCCTAATTATGAACATGGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroYak_CAF1 GTTTTTTTCTAATTATGAACATCGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroEre_CAF1 GTTTTTTTCTAATTATGAACATCGGAACCTTGGTGCCATCCTTGCTCTTGTAGAATATCTGCTCAACG
DroAna_CAF1 TTTTTTTTGCGATAATAAACATGGGCACTTTGGTTCCATCCTTACTGCTGTAGAAAATTTGTTCTACT
DroPer_CAF1 GTTTCATACGAATTATAAACATTGGTATCTTAGTACCGTCTTTACTTTTATAAAATATCTGTTCGACA
DroWil_CAF1 TCTTCTTACGAATTATAAACATAGGAATTTTAGTACCATCCTTACTGGTATAAAATATTTGTTCAACT
DroMoj_CAF1 CCTTCTTTCGTATAATAAACATTGGCACCTTGGTGCCATCTTTGCTTGAATAAAATATTTGCTCAACG
DroVir_CAF1 GTTTCTTCTGTATTATAAACATTGGCACCTTGGTACCATCTTTGCTAGAATAAAAGATTTGCTCAACG
DroGri_CAF1 TGTTTTTCCGTATTATAAACATCGGCACCTTGGTACCATCTTTGCTGGAATAAAAGATTTGTTCGACG
DroMel_CAF1 GCGTAATCCTCGCGACGGAATCCTTCCAAATTCAGTTTGATTTCGCGGAATACTGATGGCGACTTGTC
DroSim_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCCAAGTTCAGTTTGATTTCGCGGAATACTGAAGGCGACTTGTC
DroSec_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCCAAGTTCAGTTTGATTTCGCGGAATACTGAAGGCGACTTCTC
DroYak_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCTAAATTCAGTTTAATTTCGCGGAATACTGAAGGCGACTGGTC
DroEre_CAF1 GCGTAGTCCTCGCGACGGAATCCTTCTAAATTCAGTTTAATTTCGCGGAATACTGAAGGCGACTGGTC
DroAna_CAF1 GCATAGTCCTCTTGTCGGAATCCTTCTAGGTTCAGCTTTATTTCGCGGAAAATTTTAGGCTTCTGATC
DroPer_CAF1 ATGTAGTTTTCTCGACAGAAACCTTCCAAGTTGAGTTTGATTTCGCGAAACACTTTTGGCATGATCTC
DroWil_CAF1 TTATAGTCTTCTCGATTAAATCCATTTAAGTTTATTTTTATTTCTCGAAATACATCTGGCTGACGATC
DroMoj_CAF1 ATGTAATTATCCCGATTGAAACCATCCAAGTTCAGATTGATGTCGCGAAAGACTTTTGGTGGCTGATC
DroVir_CAF1 GCATAATTATCCTGTCGAAATCCGTCCAAATTGAGCTTAATTTCCCGAAAGACTTTTGGTGGCTGGTC
DroGri_CAF1 GCATAGTCTTCACGTCGGAATCCGTCCAAATTGAGTTTGATTTCCCGAAACACTTTTGGTGGCTCATC
DroMel_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroSim_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroSec_CAF1 AGGTGTCTTGAAATCGTATCGATAAATGGAACCGGGATTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroYak_CAF1 AGGTGTCTTAAAATCATAACGATAAATAGAGCCGGGGTTCAAAAAGGACGAAAAGTTGTAGAATATTT
DroEre_CAF1 AGGTGTCTTGAAATCATATCGATAAATGGAGCCGGGGTTCAAAAAGGACGAGAAGTTGTAGAATATTT
DroAna_CAF1 GGGCGTCTTAAAATCGTATTGGTAAATTGTGCCAGGATTCAGGAACGAAGAATAGTTGTAGAAAATTT
DroPer_CAF1 TGGGGTCTTAAAATCATATTGATAAATACATCCAGGACTCAAAAAGGAAGAAAAGTTATAAAAAATCT
DroWil_CAF1 AGGGTTATTAAAGTCGTAATGGTAAATTGTTCCCGGGCTCAAAAATGAGGCAAAATTGTAAAAAAATT
DroMoj_CAF1 CGGATATTTAAAGTCATAATGATAAATGGTTCCTGGATTTAAAAACGATGAAAAATTATAGAAAATTT
DroVir_CAF1 TGGATATTTAAAATCATAGTGGTAAATTGTTCCGGGGTTTAAGAATGACGCGAAATTGTAAAAAATTT
DroGri_CAF1 TGGATGCTTAAAGTCATAATGATAAATTGTGCCGGGGTTCAAGAACGACGAGAAATTGTAGAAAATTT
DroMel_CAF1 CAGAATATTTCTTCTCTCCCGATGTGCCTACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGG
DroSim_CAF1 CGGAGTATTTCTTCTCCCCCGATGTGCCCACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGT
DroSec_CAF1 CGGAGTATTTCTTCTCCCCCGATGTGCCCACAATGGTGCCGATATCCAAGTCGAATTCGCGCAGTAGT
DroYak_CAF1 CGGAATATTTCTTCTCTCCAGAGGTTCCCACAATGGTGCCGATATCCAAGTCGAATTCACGCAACAGG
DroEre_CAF1 CGGAATATTTCTTCTCTCCAGAGGTTCCCACAATGGTGCCGATATCCAACTTGAATTCGCGCAGCAGG
DroAna_CAF1 CCGAATAACGCTTCTCTCCCGTTGTCCCCACAATGGTGCCAATATCCAAATCGAATTCTCGCACCAGC
DroPer_CAF1 CTGAATACTTCTTCTCGCCTGAAGTTCCTACAATGGTACCGATATCTAAGTCGAATACACGCAATAGT
DroWil_CAF1 CAGAATGCTTTTTTTTTCCAGAAGTTCCAACGATGGTTCCAATATCCAAGTCAAATGAGCAAAATAGC
DroMoj_CAF1 CCGAATATTTAGTTTTTCCTGATGTTCCGACAATGGTGCCAATATCTAAATCAAATTGTCGTATCAAT
DroVir_CAF1 CGGAATATTTTGTTTTGCCAGAGGTCCCAACAATGGTACCAATATCCAAATCAAATTGTCGTATTAGC
DroGri_CAF1 CGGAATATTTCTTTTTCCCAGACGTGCCAACAATGGTGCCAATATCCAAATCAAATTGTCGAATCAGT
DroMel_CAF1 GTGCCATCCTTTAGGGAATTCACTTGCA
DroSim_CAF1 GTGCCATCCTTTAGGGAATTCACTTGCA
DroSec_CAF1 GTGCCATCCTTTAGGGAATTAACTTGCA
DroYak_CAF1 GTTCCATCCTTCAGAGAATTCACTTGCA
DroEre_CAF1 GTACCATCCTTTAGGGAGTTCACTTGTA
DroAna_CAF1 TTGCCATCATTCAATGAATGTACTTGTA
DroPer_CAF1 TTTCCAGTTCGAAGACAATTGGCTTGCA
DroWil_CAF1 TCCCCCGTTTGAAGTAAGTTAGCTTGCA
DroMoj_CAF1 TCTCCAGTTCGAAGTGAATTCGCTTGTA
DroVir_CAF1 GCTCCGGTTTGAAGTGAATTAGCTTGCA
DroGri_CAF1 TTTCCAGTTTCTAACGAATTTGCTTGCA
Matrix exponentiation
For this exercise, consider only the Drosophila persimilis and Drosophila willistoni sequences from the above alignment
(DroPer_CAF1 and DroWil_CAF1).
Denote these sequence data by .
Assume a nucleotide substitution model of the following form (similar to the HKY85 model). The nucleotide ordering is (A,C,G,T).
Here are the nucleotide frequencies and is the transition/transversion ratio.
Assuming and appropriate values for the , generate a plot of log-likelihood, ,
as a function of the evolutionary separation time of Drosophila persimilis and Drosophila willistoni.
Using Newton-Raphson optimization, or otherwise,
estimate maximum likelihood values for and .
You may find the PDF file of the HKY85 paper to be useful.
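As a starting point for the log-likelihood plot, the following is a minimal C++ sketch, not the reference implementation. It assumes the standard HKY85 parameterization (rate from i to j equal to kappa*pi[j] for transitions and pi[j] for transversions), which may differ in scaling from the matrix used in the course. The names pi, kappa, hkyRateMatrix, matrixExp and pairLogLikelihood are all hypothetical. The matrix exponential is computed by scaling and squaring with a truncated Taylor series; an eigendecomposition would work equally well.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Product of two 4x4 matrices.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Assumed HKY85-style rate matrix, nucleotide order (A,C,G,T).
// Off-diagonal rate i->j is kappa*pi[j] for transitions (A<->G, C<->T)
// and pi[j] for transversions; each diagonal makes its row sum to zero.
Mat4 hkyRateMatrix(const std::array<double, 4>& pi, double kappa) {
    Mat4 q{};
    for (int i = 0; i < 4; ++i) {
        double rowSum = 0.0;
        for (int j = 0; j < 4; ++j) {
            if (i == j) continue;
            // A=0 and G=2 are purines, C=1 and T=3 are pyrimidines.
            bool transition = (i % 2) == (j % 2);
            q[i][j] = (transition ? kappa : 1.0) * pi[j];
            rowSum += q[i][j];
        }
        q[i][i] = -rowSum;
    }
    return q;
}

// exp(Q*t) by scaling and squaring with a 20-term Taylor series.
Mat4 matrixExp(const Mat4& q, double t) {
    double maxAbs = 0.0;
    for (const auto& row : q)
        for (double v : row) maxAbs = std::fmax(maxAbs, std::fabs(v * t));
    int squarings = 0;
    while (maxAbs > 0.5) { maxAbs /= 2.0; ++squarings; }
    const double scale = t / std::ldexp(1.0, squarings);

    Mat4 a{}, term{}, result{};
    for (int i = 0; i < 4; ++i) {
        term[i][i] = result[i][i] = 1.0;  // both start as the identity
        for (int j = 0; j < 4; ++j) a[i][j] = q[i][j] * scale;
    }
    for (int n = 1; n <= 20; ++n) {
        term = multiply(term, a);
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) {
                term[i][j] /= n;          // term is now a^n / n!
                result[i][j] += term[i][j];
            }
    }
    for (int k = 0; k < squarings; ++k) result = multiply(result, result);
    return result;
}

// Maps A,C,G,T to 0..3; anything else (gaps, ambiguity codes) to -1.
int baseIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Log-likelihood of two aligned sequences separated by time t, assuming
// independent sites and a reversible model: sum of log(pi[a] * P_ab(t)).
double pairLogLikelihood(const std::string& x, const std::string& y,
                         const std::array<double, 4>& pi, double kappa, double t) {
    const Mat4 p = matrixExp(hkyRateMatrix(pi, kappa), t);
    double logLik = 0.0;
    for (std::size_t n = 0; n < x.size() && n < y.size(); ++n) {
        const int a = baseIndex(x[n]), b = baseIndex(y[n]);
        if (a < 0 || b < 0) continue;  // skip non-ACGT columns
        logLik += std::log(pi[a] * p[a][b]);
    }
    return logLik;
}
```

Evaluating pairLogLikelihood on a grid of t values, with the DroPer_CAF1 and DroWil_CAF1 rows concatenated into two strings, gives the points for the plot. The same function can serve as the objective for a numerical optimizer.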
Pruning & postorder tree traversal
For this exercise, consider only the D. ananassae, D. melanogaster, D. yakuba and D. erecta sequences from the above alignment
(DroAna_CAF1, DroMel_CAF1, DroYak_CAF1 and DroEre_CAF1).
Assume an HKY85 substitution model of the form shown above,
with .
Compute the values of the following two alternate trees (presented here in Newick Format):
- ((DroMel_CAF1:0.08,(DroYak_CAF1:0.08,DroEre_CAF1:0.04):0.04):0.33,DroAna_CAF1:0.47);
- (((DroMel_CAF1:0.13,DroEre_CAF1:0.04):0.02,DroYak_CAF1:0.04):0.34,DroAna_CAF1:0.49);
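The per-site likelihood of a binary tree such as the two above can be computed with Felsenstein's pruning recursion, visiting nodes in postorder. The following C++ sketch is illustrative only: the Node layout, makeLeaf, makeInternal, partialLikelihood and siteLikelihood are hypothetical names, and the JC69 transition matrix is swapped in purely as a short stand-in. For the exercise, the transition-matrix argument would be replaced by one computing the HKY85 probabilities.

```cpp
#include <array>
#include <cmath>
#include <functional>
#include <vector>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;

// One node of a rooted binary tree stored in a vector.
// For a leaf, left == right == -1 and base holds the observed nucleotide (0..3).
struct Node {
    int left;
    int right;
    double leftLen;   // branch length to the left child
    double rightLen;  // branch length to the right child
    int base;
};

Node makeLeaf(int base) { return Node{-1, -1, 0.0, 0.0, base}; }

Node makeInternal(int left, double leftLen, int right, double rightLen) {
    return Node{left, right, leftLen, rightLen, -1};
}

// Stand-in substitution model (JC69, not HKY85): branch length t is the
// expected number of substitutions per site.
Mat4 jc69(double t) {
    const double e = std::exp(-4.0 * t / 3.0);
    Mat4 p{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            p[i][j] = (i == j) ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    return p;
}

// Postorder recursion: returns F[i] = P(data below node n | state i at node n).
Vec4 partialLikelihood(const std::vector<Node>& tree, int n,
                       const std::function<Mat4(double)>& transition) {
    const Node& node = tree[n];
    Vec4 f{};
    if (node.left < 0) {          // leaf: indicator of the observed base
        f[node.base] = 1.0;
        return f;
    }
    // Children are processed before the parent combines their results.
    const Vec4 fl = partialLikelihood(tree, node.left, transition);
    const Vec4 fr = partialLikelihood(tree, node.right, transition);
    const Mat4 pl = transition(node.leftLen);
    const Mat4 pr = transition(node.rightLen);
    for (int i = 0; i < 4; ++i) {
        double sumLeft = 0.0, sumRight = 0.0;
        for (int j = 0; j < 4; ++j) {
            sumLeft += pl[i][j] * fl[j];
            sumRight += pr[i][j] * fr[j];
        }
        f[i] = sumLeft * sumRight;
    }
    return f;
}

// Likelihood of one alignment column: sum over root states of pi[i] * F_root[i].
double siteLikelihood(const std::vector<Node>& tree, int root, const Vec4& pi,
                      const std::function<Mat4(double)>& transition) {
    const Vec4 f = partialLikelihood(tree, root, transition);
    double lik = 0.0;
    for (int i = 0; i < 4; ++i) lik += pi[i] * f[i];
    return lik;
}
```

The log-likelihood of the whole alignment is the sum of log(siteLikelihood) over columns, with the leaf bases reset for each column. The sketch recomputes the transition matrices at every call; caching one matrix per branch would be the obvious optimization.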
Assuming both trees have an equal a priori probability of 0.5, what is the posterior probability of the first tree (having observed the alignment data)?
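Once the two tree log-likelihoods are available, the posterior follows from Bayes' rule. Alignment likelihoods are usually far too small to exponentiate directly, so the short C++ sketch below works with the difference of log-likelihoods instead. The function name posteriorFirstTree and its parameters are hypothetical.

```cpp
#include <cmath>

// Posterior probability of tree 1 given the data, from the log-likelihoods
// of the two candidate trees and the prior probability of tree 1.
// Uses P(T1|D) = 1 / (1 + (p2/p1) * exp(logL2 - logL1)), which avoids
// exponentiating the individual (very negative) log-likelihoods.
double posteriorFirstTree(double logL1, double logL2, double prior1 = 0.5) {
    const double logOddsAgainst =
        (logL2 - logL1) + std::log((1.0 - prior1) / prior1);
    return 1.0 / (1.0 + std::exp(logOddsAgainst));
}
```

With equal priors the prior term vanishes and the result depends only on the log-likelihood difference between the two trees.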
Biological motivation for this problem may be found here...
Peeling algorithm; expected transitions & event counts
For this exercise, use the entire multiple alignment shown above, and assume the following tree (Newick Format):
((((((DroMel_CAF1:0.06185,
(DroSim_CAF1:0.048265,
DroSec_CAF1:0.048265):0.000001):0.057735,
(DroYak_CAF1:0.09789,
DroEre_CAF1:0.088775):0.03448):0.425435,
DroAna_CAF1:0.590605):0.290085,
DroPer_CAF1:0.535315):0.31104,
DroWil_CAF1:0.5):0.000001,
((DroMoj_CAF1:0.476425,
DroVir_CAF1:0.28919):0.13216,
DroGri_CAF1:0.45018):0.30115);
Assume an HKY85 substitution model of the form shown above,
with .
Compute the expectation (taken over the posterior distribution of event histories)
of the total number of substitution events that occurred in the evolution of these DNA sequences.
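One building block for this expectation is the expected number of substitutions on a single branch, conditioned on the states at its two ends. The C++ sketch below computes it by numerical integration, under two loudly stated assumptions: JC69 is used as a stand-in for the HKY85 model, and composite Simpson's rule replaces any closed-form solution. The names jc69 and expectedSubstitutions are hypothetical. The full answer would additionally weight these per-branch values by the posterior probabilities of the branch endpoint states, obtained from the peeling algorithm, and sum over branches and sites.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Stand-in model (JC69, not HKY85): every off-diagonal rate is 1/3, so the
// branch length t is the expected number of substitutions per site.
Mat4 jc69(double t) {
    const double e = std::exp(-4.0 * t / 3.0);
    Mat4 p{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            p[i][j] = (i == j) ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    return p;
}

// E[number of substitutions on a branch of length t > 0 | start a, end b].
// Evaluates (1 / P_ab(t)) * integral over s in [0,t] of
//   sum_{i != j} P_ai(s) * q_ij * P_jb(t - s)
// with composite Simpson's rule; steps must be even.
double expectedSubstitutions(int a, int b, double t, int steps = 200) {
    const double rate = 1.0 / 3.0;  // q_ij for i != j under JC69
    auto integrand = [&](double s) {
        const Mat4 before = jc69(s);
        const Mat4 after = jc69(t - s);
        double sum = 0.0;
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                if (i != j) sum += before[a][i] * rate * after[j][b];
        return sum;
    };
    const double h = t / steps;
    double total = integrand(0.0) + integrand(t);
    for (int k = 1; k < steps; ++k)
        total += ((k % 2 == 1) ? 4.0 : 2.0) * integrand(k * h);
    const double joint = total * h / 3.0;  // E[N ; end state b | start a]
    return joint / jc69(t)[a][b];
}
```

A useful sanity check is that averaging the conditional expectation over end states, weighted by the transition probabilities, recovers the branch length t under this normalization.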
Pairwise Forward-backward & Baum-Welch algorithms
Comparison of mutation parameters between at least two protein domains
(more to follow here)
Cocke-Younger-Kasami & Inside algorithms
Alternate stem predictions in pseudoknots
(more to follow here)