Goals

  • Investigate whether humans are, indeed, related to apes
  • Brief exposure to phylogenetic tree-building software

Procedure

  • As usual, the homework exercises are in boldface. To summarise the main items, you are required to
    1. Describe one conserved and one variable column in the ATP6 alignment
    2. Discuss the validity of assuming "no more than one substitution per site"
    3. Sketch a function of likelihood versus evolutionary separation time, for a hypothetical pairwise alignment
    4. Find out what the entries of a phylogenetic distance matrix represent
    5. Plot a tree of vertebrate species, based on the ATP6 alignment
  • The first part of this practical involves making trees from a multiple alignment of amino acid sequences for a subunit of the ATP synthase enzyme.
  • Examine the ATP6 alignment.
    • use a program like Jalview or belvu to visualize the alignment, or just examine it using a Unix program to look at text files (like more)
    • here is the alignment file: ATP6.stockholm
    more ATP6.stockholm
    
  • Identify (and describe) one highly conserved column, and one variable column.
  • Extract human and chimp sequences, either manually or with this perl one-liner
    cat ATP6.stockholm | perl -e 'while(<>){if(/(homo|chimp)(\S+)\s+(.*)/){$seq{$1.$2}.=$3}}while(($name,$seq)=each%seq){print"$name   $seq\n"}' >human-chimp.stockholm
    
  • What proportion of the amino acid sites are
    • identical?
    • different?
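One way to count identical and different sites is sketched below. It is a minimal example, not part of the original practical: the function name `identity_fractions`, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration.

```python
def identity_fractions(seq_a, seq_b, gap_chars="-."):
    """Return (identical, different) fractions over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # assumption: gapped columns are skipped, not counted as differences
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return identical / compared, (compared - identical) / compared


# Made-up toy sequences, NOT taken from ATP6.stockholm
human = "MNENLFAS-FIAPT"
chimp = "MNENLFASTFITPT"
same, diff = identity_fractions(human, chimp)
print(f"identical: {same:.3f}  different: {diff:.3f}")
```

To use it on the real data, paste in the two sequence strings from human-chimp.stockholm in place of the toy ones.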
  • Assume each different amino acid site indicates one or more mutations, while each identical amino acid site indicates no mutations.
    • Why might this assumption not be valid? How could this bias your estimates of evolutionary divergence?
    • Suppose that the number of mutations at a site is Poisson-distributed, with (on average) 1 mutation per site per "unit" of time. Write down expressions for
      • the probability that a site experiences no mutations after time t
      • the probability that a site experiences one or more mutations after time t
    • Suppose that there are N sites. What is the probability that, after time t, K of these sites have not experienced a mutation (as a function of t)?
      • Sketch this function (likelihood vs time) for the case N=100 and K=50.
      • If a site "has not experienced a mutation", is this the same as saying that "the human and chimp amino acids are identical"? If not, why not, and how would this affect your analysis?
  • Estimate a "distance matrix" for these species:
    ~be131/dart/bin/tkfdistance --nocountindel -log 6 ATP6.stockholm >ATP6.distance
    
  • What do the entries of the distance matrix represent?
    • (See if you can answer this by e.g. searching for "phylogenetic distance matrix" on Google; if not, ask your neighbor, the professor or the GSI.)
  • Estimate a tree by "weighted neighbor-joining"
    ~be131/dart/bin/weighbor -i ATP6.distance -o ATP6.tree -vvv
    
  • Draw the tree, and print it out
  • Try building some trees from other protein (or DNA) alignments. For example...
  • The kind of phylogenetic analysis that you have been doing here assumes a random model of evolution (see e.g. the Poisson analysis). Sometimes one also assumes a neutral model (essentially ruling out natural selection). This has always been a source of controversy among evolutionists, most recently providing ammunition to the advocates of "intelligent design". What do you think about this? In what sense might evolution be thought to be random or deterministic? Leaving aside the somewhat politicised issue of intelligent design, is there any way you could prove evolution to be nonrandom?

Software

Notes

  • Identify (and describe) one highly conserved column, and one variable column.
    Using a program such as belvu, you can easily identify conserved and variable columns. For example, in the ATP6 alignment there is a stretch of well-conserved residues between positions 89 and 110 (highlighted light blue in belvu). Position 23 is a variable column (no highlight in belvu).
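If you would rather score columns numerically than by eye, the sketch below computes, for each column, the fraction of residues that match the most common residue. This is only one simple conservation measure and not necessarily the one belvu uses for its highlighting; the function name `column_conservation` and the toy alignment are assumptions for illustration.

```python
from collections import Counter


def column_conservation(sequences, gap_chars="-."):
    """For each column, return the fraction of non-gap residues equal to the most common residue."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*sequences):
        residues = [r for r in column if r not in gap_chars]
        if not residues:
            scores.append(0.0)  # assumption: an all-gap column scores zero
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# Made-up toy alignment, NOT taken from ATP6.stockholm
toy = ["MKLV", "MRLV", "MKIV", "MQLV"]
scores = column_conservation(toy)
print(scores)
```

A score of 1.0 marks a fully conserved column; lower scores mark more variable ones.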
  • Assume each different amino acid site indicates one or more mutations, while each identical amino acid site indicates no mutations. Why might this assumption not be valid? How could this bias your estimates of evolutionary divergence?
    Identical amino acids at a site do not guarantee that the site experienced no mutation. For example, a lysine could first mutate to a different amino acid and later mutate back to lysine. In addition, because the genetic code is degenerate (multiple codons code for the same amino acid), the same amino acid does not guarantee the same underlying DNA sequence. Conversely, a site that differs may have mutated more than once. In both cases, counting differences underestimates the true number of mutations, and hence the evolutionary divergence.
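The effect of repeated and reverse mutations can be illustrated with a small simulation. This is a toy model under loudly stated assumptions, not an analysis of the ATP6 data: a single lineage is followed from ancestor to descendant, mutations per site are Poisson-distributed with mean t, and each mutation replaces the residue with one of the other 19 amino acids chosen uniformly. The names `poisson_sample` and `simulate` are made up for this sketch.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def poisson_sample(rng, mean):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n_sites, t, seed=0):
    """Return (total mutations, sites mutated at least once, sites that end up different)."""
    rng = random.Random(seed)
    total_mutations = 0
    mutated_sites = 0
    differing_sites = 0
    for _ in range(n_sites):
        original = rng.choice(AMINO_ACIDS)
        current = original
        n_mut = poisson_sample(rng, t)
        for _ in range(n_mut):
            # assumption: every mutation changes the residue to a different one
            current = rng.choice([aa for aa in AMINO_ACIDS if aa != current])
        total_mutations += n_mut
        mutated_sites += 1 if n_mut > 0 else 0
        differing_sites += 1 if current != original else 0
    return total_mutations, mutated_sites, differing_sites


total, mutated, differing = simulate(n_sites=1000, t=1.0)
print(f"mutations: {total}  mutated sites: {mutated}  differing sites: {differing}")
```

The number of differing sites can never exceed the number of mutated sites, which in turn can never exceed the total number of mutations, so the observed differences are a lower bound on what actually happened.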
  • Sketch a function of likelihood versus evolutionary separation time, for a hypothetical pairwise alignment
    First, we need the probability that a site experiences no mutations. We assume that the number of mutations at a site is Poisson-distributed with rate mu = 1 per unit time, so the expected number of mutations after time t is t, and

      P(no mutations)=exp(-t)  and P(1+ mutations)=1-exp(-t)
      

    Now we consider N sites. We use the binomial distribution to model this situation using the probabilities above for each of the individual sites.

      P(k sites out of N experience no mutations after time t) = (N choose k) * P(no mutations)^k * P(1+ mutations)^(N-k)
                                                               = N!/(k! (N-k)!) * exp(-kt) * (1-exp(-t))^(N-k)
      
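    To sketch this, I used Matlab and varied the time t from 0 to 5. Since the factorials in (N choose k) are very large, I used Stirling's approximation to get their values. The resulting curve has a peak at t~0.7, which agrees with the analytical maximum: the likelihood is largest when exp(-t) = K/N = 0.5, i.e. t = ln 2 ≈ 0.69.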

    Sketch of probability as a function of time
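The same curve can be computed in a few lines of Python. This is a sketch rather than the Matlab script used for the plot above: it works with logarithms and `math.lgamma` to handle the large factorials instead of Stirling's approximation, and the names `log_likelihood`, `times` and `curve` are made up for this example.

```python
import math


def log_likelihood(t, n_sites, k_unmutated):
    """Log of P(k of n sites have no mutation after time t), for t > 0."""
    log_choose = (math.lgamma(n_sites + 1)
                  - math.lgamma(k_unmutated + 1)
                  - math.lgamma(n_sites - k_unmutated + 1))
    log_p_none = -t                          # log of exp(-t)
    log_p_some = math.log(-math.expm1(-t))   # log of 1 - exp(-t)
    return (log_choose
            + k_unmutated * log_p_none
            + (n_sites - k_unmutated) * log_p_some)


N, K = 100, 50
times = [i / 100 for i in range(1, 501)]     # t = 0.01 ... 5.00
curve = [math.exp(log_likelihood(t, N, K)) for t in times]
t_peak = times[curve.index(max(curve))]
print(f"peak at t = {t_peak:.2f}, ln 2 = {math.log(2):.4f}")
```

Plotting `curve` against `times` with any plotting tool reproduces the sketch; the grid starts at 0.01 because the logarithm of 1 - exp(-t) is undefined at t = 0.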

  • What do the entries of the distance matrix represent?
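    Each entry in the matrix is an estimate of the evolutionary distance between two organisms, based on their sequences for ATP6. The entry in row i and column j is the distance between species i and species j: the larger the value, the more distantly related the two species are. Such distances are typically expressed as an expected number of substitutions per site. The matrix is symmetric, with zeros on the diagonal.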
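To make the idea concrete, the sketch below builds a small distance matrix using the Poisson model from earlier in this practical: if a fraction p of sites differ, the estimated separation is -ln(1 - p). This is NOT the method used by tkfdistance; it is a simpler correction chosen for illustration, and the function names, the gap handling and the toy sequences are all assumptions.

```python
import math


def poisson_distance(seq_a, seq_b, gap_chars="-."):
    """Poisson-corrected distance -ln(1 - p), where p is the fraction of differing ungapped sites."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a not in gap_chars and b not in gap_chars]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p_diff = sum(1 for a, b in pairs if a != b) / len(pairs)
    if p_diff == 0.0:
        return 0.0
    if p_diff >= 1.0:
        return math.inf  # every site differs: the correction is undefined
    return -math.log(1.0 - p_diff)


def distance_matrix(alignment):
    """Return (names, matrix) with matrix[i][j] the distance between names[i] and names[j]."""
    names = list(alignment)
    matrix = [[poisson_distance(alignment[x], alignment[y]) for y in names]
              for x in names]
    return names, matrix


# Made-up toy sequences, NOT taken from ATP6.stockholm
toy = {"human": "MKLVAT", "chimp": "MKLVAS", "mouse": "MRLIAS"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name.ljust(6), " ".join(f"{d:.3f}" for d in row))
```

In this toy example the human-chimp entry is smaller than the human-mouse entry, which is the pattern a tree-building program such as weighbor uses to group close relatives together.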
  • Plot a tree of vertebrate species, based on the ATP6 alignment
    see attached PDF file