click on the Biowiki logo to go to homepage
Edit Raw Print
Links Diffs RSS
About Stats Recent


Research Teaching Blog
Fall09 | Sandbox
Biowiki > Teaching > BioE241

Search

Advanced search...

Topics

PageRank Checker

Bio E 241

Class info

  • Title: Probabilistic Modeling, Genomics and Jurassic Park
  • Instructor: IanHolmes
  • Class: BioE 241
  • When: Spring semester, 2010
    • Lectures: Tue/Thurs 3-5:30, 115 Kroeber
  • Instructor blog -- please do post comments/questions.

In this class you will use statistics, DNA sequencing data and/or paleontology to reconstruct the genomes of organisms that were ancestral to multiple present-day species. Aspects of reconstruction that will be considered include inference algorithms; confidence estimates; detail at multiple levels (DNA sequence, genes, pathways, systems); theory and in silico experiments to investigate the accuracy of sequence reconstruction algorithms; and experimental approaches to synthesis and empirical investigation of reconstructed sequence.

This page is currently under review. Here is some older archived material: Bio E 241 Fall 07.

Syllabus

This is the syllabus, more or less.

  1. Introduction and overview. Simulations of evolving DNA and RNA sequences and structures. Definition of the ancestral sequence reconstruction problem and domain (genes, genomes, RNA structures, families/pathways, phenotypic quantities). Some related problems in bioinformatics: phylogeny, alignment, annotation. Tools we will use, theoretical (ODEs, PDEs, matrices, graphs) and computational (various software packages).
  2. Continuous-time Markov chains. Finite & infinite state spaces, discrete & continuous times. Equilibrium, reversibility, matrix exponentiation. DNA substitution and indel models. Taylor series for matrix exponentiation.
  3. Eigenspaces. Solution of the matrix exponential by eigenvalue/eigenvector decomposition. Basic matrix theorems (real matrices have real or complex conjugate evals/evecs; in symmetric matrices they are real, and the evecs can be chosen to be orthonormal). Introduce notation for Markov chains on trees (graphical models), with a simple example (one root + two leaves). Briefly describe some common point substitution models: Jukes-Cantor (1969), Kimura (1980), Felsenstein (1981), Hasegawa-Kishino-Yano (1985), REV/IRREV, Dayhoff (1979).
  4. Pruning. Felsenstein pruning, likelihood calculation, Elston-Stewart peeling, posterior probability calculations; conceptualization as message-passing on factor graphs. Hardware acceleration (the Beagle library).
  5. HMMs. Isochores; HMMs; the Forward, Viterbi, Backward & Baum-Welch algorithms.
  6. EM. Definition of the Expectation Maximization algorithm; proof that it improves the likelihood at every step (via relative entropy, Jensen's inequality, Gibbs' inequality); the Neal-Hinton "variational free energy" view; and example applications (simple multinomial, mixture of Gaussians/K-means).
  7. Conjugate priors: gamma/Normal, multinomial/Dirichlet, binomial/beta. Dirichlet mixtures.
  8. Notable HMMs in bioinformatics.
  9. Alignment benchmarks and simulations.
  10. Phylo-HMMs and Pair-HMMs.
  11. EM algorithm for continuous-time Markov chains.
  12. MCMC. Basic ideas (movesets, ergodicity, reversibility, mixing, burn-in time), Metropolis-Hastings, Gibbs & reversible-jump approaches. The Lawrence, Liu et al sampler for ungapped motifs.
  13. Simulation. Gillespie algorithm, Yule process, coalescent simulations. MCMC over tree topologies; common moves. Data compression; Kraft's inequality, arithmetic coding.
  14. Structural EM. The algorithm of Friedman, Pupko et al applying structural EM to infer the topology of phylogenetic trees.
  15. The TKF91 model. Evolutionary grammars, symmetry properties (specifically, reversibility & homomorphic maps), the point indel model, the separation of an alignment into independently evolving zones, the TKF91 formulation in terms of "mortal links" & "immortal links", the birth-death process equations for the zone fates (and the ubiquity of birth-death processes in general), the geometric form of the solutions for the zone fate probabilities, the mapping of these probabilities onto a Pair HMM, and three different ways of arriving at the solutions: trial solution, Metzler arguments, generating functions.
  16. Sequence transducers. Formulation. Composition.
  17. Statistical alignment: TKF91, Holmes-Bruno, Redelings-Suchard, other samplers.
  18. Heuristic algorithms for indel reconstruction. Indelign, and analysis thereof. Ortheus.
  19. Breakpoint graphs. Haussler et al's "infinite sites" model.
  20. SCFGs. CYK. Inside-Outside. Pair SCFGs. Acceleration/approximation (i.e. the "constrained Sankoff" algorithm).
  21. Evolution of RNA. The TKF Structure Tree model as a set of recursively nested birth-death (TKF91) processes.
  22. Modeling multiple alignments of RNA and reconstructing ancient RNA. Tree transducers and composition algorithms. Analysis of grammars. Ortheus-like approaches.
  23. Viral genomes, genefinding, evolution and statistical alignment.
  24. Gene ontology; phylogenetic inference of gene annotations. SIFTER et al.
  25. Gene family evolution. Arvestad et al. Hahn et al.
  26. Fokker-Planck equation. A taste of continuous-state systems: the Kramers-Moyal expansion, the Fokker-Planck equation, and notable special cases (Liouville process, Brownian motion, Ornstein-Uhlenbeck process, continuous approximation to birth-death process). Application: Samoilov, Arkin et al's treatment of how noisy forward-enzyme concentrations perturb the Michaelis-Menten kinetics of the enzyme futile cycle.

Labs

Paper review sessions

(out of date: BioE241Presentations page.)

Programming exercises

(out of date: BioE241Projects page.)

Lecture notes

Other materials

  • That Stanford ribo-happening video

Recommended textbooks

Primary texts (probabilistic modeling and bioinformatics)

  • The MacKay Book. MacKay. Information Theory, Inference and Learning Algorithms. ISBN:0521642981
    • Can be downloaded from here (copyright grants permission to view but not print)

Background on molecular evolution & algorithms for doing it:

For continuous-valued Markov processes, Gaussian & otherwise:

  • Rasmussen and Williams. Gaussian Processes for Machine Learning. ISBN:026218253X

Computational neuroscience:

  • The Spikes Book. Rieke, Warland, de Ruyter van Steveninck and Bialek. Spikes: Exploring the Neural Code. ISBN:0262681080
    • An excellent introduction to information theory & Bayesian analysis in computational neuroscience.

  • Eliasmith and Anderson. Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. ISBN:0262050714

Attachment sort Action Size Date Who Comment
alignment manage 0.2 K 03 Oct 2006 - 12:46 IanHolmes Example alignment for hw2
tree manage 0.1 K 03 Oct 2006 - 12:34 IanHolmes Example tree for hw2
tree2 manage 0.2 K 11 Oct 2006 - 11:38 IanHolmes Actual tree for hw2
alignment2 manage 0.8 K 11 Oct 2006 - 11:39 IanHolmes Actual alignment for hw2
stockholm manage 1.0 K 24 Oct 2006 - 19:51 IanHolmes Combined alignment2+tree2

Actions: Edit | Attach | New | Ref-By | Printable view | Raw view | Normal view | See diffs | Help | More...