Teaching > BioE241
TWiki webs: Main | TWiki | Sandbox   Log In or Register

# BioE241

## Class info

• Title: Probabilistic Modeling in Computational Biology
• Instructor: Ian Holmes
• Class: BioE 241
• When: Spring semester, 2010
• Lectures: Tue/Thurs 12:30-2, 105 Latimer
• Labs: Mon 12-3, 1171 Etcheverry

## Syllabus

1. Introduction and overview. Simulations of evolving DNA and RNA sequences and structures. Definition of the ancestral sequence reconstruction problem and domain (genes, genomes, RNA structures, families/pathways, phenotypic quantities). Some related problems in bioinformatics: phylogeny, alignment, annotation. Tools we will use, theoretical (ODEs, PDEs, matrices, graphs) and computational (various software packages).
2. Continuous-time Markov chains. Finite & infinite state spaces, discrete & continuous times. Equilibrium, reversibility, matrix exponentiation. DNA substitution and indel models. Taylor series for matrix exponentiation.
3. Eigenspaces. Solution of the matrix exponential by eigenvalue/eigenvector decomposition. Basic matrix theorems (real matrices have real or complex conjugate evals/evecs; in symmetric matrices they are real, and the evecs can be chosen to be orthonormal). Introduce notation for Markov chains on trees (graphical models), with a simple example (one root + two leaves). Briefly describe some common point substitution models: Jukes-Cantor (1969), Kimura (1980), Felsenstein (1981), Hasegawa-Kishino-Yano (1985), REV/IRREV, Dayhoff (1979).
4. Pruning. Felsenstein pruning, likelihood calculation, Elston-Stewart peeling, posterior probability calculations; conceptualization as message-passing on factor graphs. Hardware acceleration (the Beagle library).
5. HMMs. Isochores; HMMs; the Forward, Viterbi, Backward & Baum-Welch algorithms.
6. EM. Definition of the Expectation Maximization algorithm; proof that it improves the likelihood at every step (via relative entropy, Jensen's inequality, Gibbs' inequality); the Neal-Hinton "variational free energy" view; and example applications (simple multinomial, mixture of Gaussians/K-means).
7. Conjugate priors: gamma/Normal, multinomial/Dirichlet, binomial/beta. Dirichlet mixtures.
8. Notable HMMs in bioinformatics.
9. Alignment benchmarks and simulations.
10. Phylo-HMMs and Pair-HMMs.
11. EM algorithm for continuous-time Markov chains.
12. MCMC. Basic ideas (movesets, ergodicity, reversibility, mixing, burn-in time), Metropolis-Hastings, Gibbs & reversible-jump approaches. The Lawrence, Liu et al sampler for ungapped motifs.
13. Simulation. Gillespie algorithm, Yule process, coalescent simulations. MCMC over tree topologies; common moves. Data compression; Kraft's inequality, arithmetic coding.
14. Structural EM. The algorithm of Friedman, Pupko et al applying structural EM to infer the topology of phylogenetic trees.
15. The TKF91 model. Evolutionary grammars, symmetry properties (specifically, reversibility & homomorphic maps), the point indel model, the separation of an alignment into independently evolving zones, the TKF91 formulation in terms of "mortal links" & "immortal links", the birth-death process equations for the zone fates (and the ubiquity of birth-death processes in general), the geometric form of the solutions for the zone fate probabilities, the mapping of these probabilities onto a Pair HMM, and three different ways of arriving at the solutions: trial solution, Metzler arguments, generating functions.
16. Sequence transducers. Formulation. Composition.
17. Statistical alignment: TKF91, Holmes-Bruno, Redelings-Suchard, other samplers.
18. Heuristic algorithms for indel reconstruction. Indelign, and analysis thereof. Ortheus.
19. Breakpoint graphs. Haussler et al's "infinite sites" model.
20. SCFGs. CYK. Inside-Outside. Pair SCFGs. Acceleration/approximation (i.e. the "constrained Sankoff" algorithm).
21. Evolution of RNA. The TKF Structure Tree model as a set of recursively nested birth-death (TKF91) processes.
22. Modeling multiple alignments of RNA and reconstructing ancient RNA. Tree transducers and composition algorithms. Analysis of grammars. Ortheus-like approaches.
23. Viral genomes, genefinding, evolution and statistical alignment.
24. Gene ontology; phylogenetic inference of gene annotations. SIFTER et al.
25. Gene family evolution. Arvestad et al. Hahn et al.
26. Fokker-Planck equation. A taste of continuous-state systems: the Kramers-Moyal expansion, the Fokker-Planck equation, and notable special cases (Liouville process, Brownian motion, Ornstein-Uhlenbeck process, continuous approximation to birth-death process). Application: Samoilov, Arkin et al's treatment of how noisy forward-enzyme concentrations perturb the Michaelis-Menten kinetics of the enzyme futile cycle.

## Class materials

All class materials (lecture slides and lab handouts) are available at github: https://github.com/ihh/bioe241

## Recommended textbooks

Primary texts (probabilistic modeling and bioinformatics)

• The MacKay Book. MacKay. Information Theory, Inference and Learning Algorithms. ISBN:0521642981
• Can be downloaded from here (copyright grants permission to view but not print)

Background on molecular evolution & algorithms for doing it:

For continuous-valued Markov processes, Gaussian & otherwise:

• Rasmussen and Williams. Gaussian Processes for Machine Learning. ISBN:026218253X

Computational neuroscience:

• The SpikesBook. Rieke, Warland, de Ruyter van Steveninck and Bialek. Spikes: Exploring the Neural Code. ISBN:0262681080
• An excellent introduction to information theory & Bayesian analysis in computational neuroscience.

• Eliasmith and Anderson. Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. ISBN:0262050714

• That Stanford ribo-happening video
I Attachment Action Size Date Who Comment
EXT tree manage 0.1 K 2006-10-03 - 19:34 IanHolmes Example tree for hw2
EXT alignment manage 0.2 K 2006-10-03 - 19:46 IanHolmes Example alignment for hw2
EXT tree2 manage 0.2 K 2006-10-11 - 18:38 IanHolmes Actual tree for hw2
EXT alignment2 manage 0.8 K 2006-10-11 - 18:39 IanHolmes Actual alignment for hw2
EXT stockholm manage 1.0 K 2006-10-25 - 02:51 IanHolmes Combined alignment2+tree2

Parents: WebHome
Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback
TWiki Appliance - Powered by TurnKey Linux