Biowiki . Teaching . OldGradClassSyllabus
Lecture topics
probabilistic modeling for biological research
molecular evolution and phylogenetics algorithms
stochastic grammars, including hidden Markov models as a special case
analysis of algorithms
Approximate syllabus (subject to change)
Computational basics: state-of-the-art workstations and webservers on a shoestring budget
Probability basics
Review of probability theory (events, independence, conditional/joint probabilities, expectations, variances, probability density functions for continuous random variables)
Basic combinatorics (power sets, binomial/multinomial coefficients)
Bayes' theorem: prior and posterior probabilities
Examples involving composition of DNA and protein sequences
Common probability distributions: binomial/multinomial, geometric, Poisson, uniform, Gaussian
Maximum likelihood parameterisation
Quick overview of Markov chains
Quick overview of Expectation Maximization
Example of EM: mixture models and k-means clustering
Use of k-means in microarray analysis and systems biology
An introduction to stochastic grammars for sequence analysis
The Chomsky hierarchy of grammars
Deterministic, nondeterministic, stochastic grammars
Overview of parsing and training algorithms
Overview of applications: biology (alignment, annotation), natural language processing, military/industrial uses, etc.
Information theory as applied to biological sequence analysis
Shannon information, entropy
Data compression: non-probabilistic (Lempel-Ziv, Huffman, etc.) and probabilistic (the arithmetic coding algorithm)
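As a small illustration of the Shannon entropy item above, the following sketch computes the entropy (in bits per symbol) of the empirical base composition of a sequence. The function name and the two toy sequences are illustrative assumptions, not course material.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence):
    """Entropy, in bits per symbol, of the empirical composition of `sequence`."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical toy sequences: one with uniform composition, one heavily biased.
uniform_dna = "ACGT" * 25
biased_dna = "A" * 97 + "CGT"

print(shannon_entropy(uniform_dna))  # 2.0 (the maximum for a four-letter alphabet)
print(shannon_entropy(biased_dna) < shannon_entropy(uniform_dna))  # True
```

A uniform composition reaches the 2-bit maximum for DNA, while a biased composition has lower entropy and is therefore more compressible under a composition-only model.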
Discrete Markov chains: simple probabilistic models for DNA, RNA and protein sequences
Shannon's Markov models for English text; equivalents for DNA, RNA and protein sequences
Hidden Markov Models (HMMs): the Viterbi, Forward, Forward-Backward and Baum-Welch algorithms
Chomsky revisited: HMMs as left-regular stochastic grammars
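The Viterbi algorithm named above can be sketched in a few lines. This is a minimal log-space version for a two-state toy model (GC-rich "I" versus AT-rich "B"); the state names and all probabilities are invented for illustration and are not taken from the class.

```python
from math import log


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most probable state path) for `observations`."""
    # best[s] = log probability of the best path that ends in state s
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in observations[1:]:
        pointers, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + log(trans_p[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s][symbol])
        back.append(pointers)
        best = new_best
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical parameters: "I" favours C/G, "B" favours A/T.
STATES = ["I", "B"]
START = {"I": 0.5, "B": 0.5}
TRANS = {"I": {"I": 0.9, "B": 0.1}, "B": {"I": 0.1, "B": 0.9}}
EMIT = {
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "B": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

log_p, path = viterbi("AAAACGCGCG", STATES, START, TRANS, EMIT)
print("".join(path))  # BBBBIIIIII
```

Working in log space avoids numerical underflow on long sequences; the Forward algorithm has the same structure with the maximum replaced by a sum.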
Applications of single-sequence HMMs to genome annotation
DNA: CpG islands, profiles of protein binding sites, genefinding
Proteins: hydrophobic/polar models for globular proteins; transmembrane proteins. Protein domains, homology searches and profile HMMs
Advanced topics in probability theory for Bayesian modeling
Conjugate probability distributions and applications in modeling
Normal (and Normal/gamma) and k-means
Multinomial (and Dirichlet), binomial (and beta) and profile HMMs
Poisson (and gamma) and evolutionary rates
Expectation Maximization derived and dissected
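To make the multinomial/Dirichlet pairing above concrete: with a Dirichlet prior, the posterior mean of each multinomial parameter is the observed count plus its pseudocount, divided by the total. The sketch below applies this to one column of a hypothetical profile; the counts and the uniform pseudocounts are assumptions chosen only for illustration.

```python
def posterior_mean(counts, pseudocounts):
    """Posterior mean of multinomial parameters under a Dirichlet prior.

    `pseudocounts` holds the Dirichlet parameters; residues missing from
    `counts` are treated as observed zero times.
    """
    total = sum(counts.get(k, 0) + pseudocounts[k] for k in pseudocounts)
    return {k: (counts.get(k, 0) + pseudocounts[k]) / total for k in pseudocounts}


# Hypothetical profile column: 3 A's and 1 G observed, uniform Dirichlet(1,1,1,1) prior.
column_counts = {"A": 3, "G": 1}
prior = {"A": 1, "C": 1, "G": 1, "T": 1}

estimate = posterior_mean(column_counts, prior)
print(estimate["A"])  # 0.5
print(estimate["C"])  # 0.125
```

Unobserved residues keep a nonzero probability, which is the practical reason pseudocounts are used when estimating profile emission probabilities from few sequences.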
Comparative genomics using pairwise Hidden Markov models ("Pair HMMs")
Historical background: homology search, Smith-Waterman, Needleman-Wunsch, Gotoh
Viterbi, Forward-Backward, etc. for more than one sequence
Protein alignment (PSW, PROBCONS)
DNA genefinding (TWINSCAN, SLAM)
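The Needleman-Wunsch recursion listed in the historical background can be sketched as follows. This version uses a linear gap penalty rather than the affine gaps of Gotoh's algorithm, and the scoring values are arbitrary defaults chosen for illustration.

```python
def needleman_wunsch_score(x, y, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of x and y with a linear gap penalty."""
    # prev[j] = best score for aligning the current prefix of x with y[:j]
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap]  # aligning x[:i] against the empty prefix of y
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("AC", "AGC"))           # 0
```

A Pair HMM run with the Viterbi algorithm has the same dynamic programming shape, with log probabilities in place of the ad hoc match, mismatch and gap scores.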
Modeling evolution and other time-dependent processes using continuous-time Markov chains
Solution by eigenspace decomposition
Evolution of nucleotides, codons, amino acids and basepairs
Phylogenetic trees and Felsenstein's peeling algorithm
Biophysical models of ion channel gating. Patch-clamp experiments; ARMA
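For the phylogenetic-tree item above, here is a minimal sketch of Felsenstein's peeling (pruning) recursion for a single alignment column. It assumes the Jukes-Cantor substitution model with branch lengths in expected substitutions per site; the tree encoding, species labels and branch lengths are hypothetical.

```python
from math import exp

BASES = "ACGT"


def jc69(a, b, t):
    """Jukes-Cantor probability that base a becomes base b over branch length t."""
    decay = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial_likelihoods(node):
    """Return {base: P(observed leaves below node | node has that base)}."""
    if isinstance(node, str):  # leaf: the observed base
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:  # internal node: list of (subtree, branch length)
        below = partial_likelihoods(child)
        for b in BASES:
            result[b] *= sum(jc69(b, c, t) * below[c] for c in BASES)
    return result


def site_likelihood(tree):
    """Likelihood of one alignment column with a uniform prior at the root."""
    return sum(0.25 * p for p in partial_likelihoods(tree).values())


# Hypothetical tree ((human:0.1, chimp:0.1):0.2, mouse:0.5) with bases A, A, G.
tree = [([("A", 0.1), ("A", 0.1)], 0.2), ("G", 0.5)]
print(site_likelihood(tree))
```

The recursion visits each node once, so the cost is linear in the number of branches instead of exponential in the number of unobserved ancestral bases.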
Modeling higher-order correlations in RNA (and proteins) using stochastic context-free grammars and graph grammars
The CYK and Inside-Outside algorithms
Design of SCFGs for predicting RNA secondary structure
Design of SCFGs for identifying bacterial operons, foldback transposons and other inverted repeat signals in DNA
Pair SCFGs for comparative RNA genefinding
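The CYK algorithm listed above can be illustrated with a non-stochastic recognizer for a grammar in Chomsky normal form. The toy grammar below, which accepts a run of a's followed by the same number of u's, is a made-up stand-in for nested base pairing and is not a realistic RNA grammar; a stochastic version would store probabilities in the table instead of sets.

```python
def cyk_recognize(sequence, unary_rules, binary_rules, start="S"):
    """True if `sequence` can be derived from `start` in a CNF grammar."""
    n = len(sequence)
    if n == 0:
        return False
    # table[i][k] = set of nonterminals deriving sequence[i : i + k + 1]
    table = [[set() for _ in range(n - i)] for i in range(n)]
    for i, symbol in enumerate(sequence):
        table[i][0] = {lhs for lhs, terminal in unary_rules if terminal == symbol}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


# Hypothetical grammar: S -> A U | A X, X -> S U, A -> 'a', U -> 'u'
UNARY = [("A", "a"), ("U", "u")]
BINARY = [("S", ("A", "U")), ("S", ("A", "X")), ("X", ("S", "U"))]

print(cyk_recognize("aauu", UNARY, BINARY))  # True
print(cyk_recognize("aua", UNARY, BINARY))   # False
```

Replacing set membership with a maximum over rule probabilities gives the most probable parse, and replacing it with a sum gives the Inside algorithm.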
"Statistical Alignment": combining stochastic grammars and evolutionary models for massively parallel comparative genomics
Motivation: the future of high-throughput multi-species comparisons is upon us
Combining continuous-time Markov chains with stochastic grammars
The pruning algorithm and EM for substitution processes
Statistical alignment and difficult problems in modeling real-life sequence evolution
Further discussion of applications including genefinding, systems biology, alignment, profiling
Probabilistic models in systems biology
Discovering pathways/modules by analyzing gene complement
Gaussian processes for post-genomic data (e.g. microarrays); the covariance function
Integrating clustering of expression data with transcription factor binding-site discovery
Advanced topics. May include:
Markov Chain Monte Carlo (MCMC) approaches. Hastings ratios. Importance sampling
Markov random fields. Gibbs form. MCMC inference
Neural networks. Hebbian view; Bayesian view
Probabilistic modeling and SyntheticBiology
Algorithms
The following algorithms for bioinformatics and systems biology are all implemented in DART and may be covered:
Note: Included topic Teaching.DartAlgorithms does not exist yet
Analysis of algorithms
NoisyChannelCodingTheorem, AlignmentAccuracy.
Extensions
Algorithms tackling the following subjects will be considered advanced/supplementary material to this class, and may be covered if there is time during the semester:
NeuralNetworks, GraphicalModels, MarkovRandomFields, ContextDependentSubstitution, CoalescentMethods; GaussianProcesses, BarashFriedmanClustering; PseudoknotPrediction, IsambertFoldingKinetics, MolecularForcefields, MembraneChannelDynamics.
RTreeSearch (for GFF).
Other stuff
GraduateClassFlier
-- IanHolmes - 22 Aug 2005
-----
Revision r4 - 2007-06-18 - 21:25:02 - IanHolmes
Biowiki content is in the public domain.