Locating Classes of Functional Residues in a Protein via Estimation of Substitution Rates
Here is the final report (183k PDF):
The Evolutionary Trace (ET) method has been applied to protein families to predict the functional determinants of a protein from its sequence similarity to other proteins in its family. ET identifies functional residues that are conserved within subfamilies and are responsible for the functional specificity of each subfamily. However, the ET method is inherently ad hoc, and a more rigorous method that achieves similar results can be developed. The method proposed here assumes that more important residues within a protein change more slowly than less important ones, and it allows different residues to use different substitution rate matrices. The results of this method are compared to those of ET for the SH2 protein family.
We would like to develop an improved method (relative to ET) to identify important residues in a protein. The method relies upon expectation maximization (EM), as implemented in xrate, to estimate substitution rate matrices for a protein family, allowing for multiple classes of substitution rates. These classes of substitution rates will be interpreted for one protein family, the Src Homology 2 (SH2) domain. Eigenvalues and their corresponding eigenvectors will be analyzed to determine how quickly information decays, and exactly what information decays, within each of these functional classes.
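The link between eigenvalues and information decay can be made explicit with the standard spectral form of a continuous-time Markov substitution model. The notation below (Q for a rate matrix, P(t) for the transition probabilities, u_k and v_k for right and left eigenvectors) is assumed here for illustration and is not taken from the report. It also assumes that Q is irreducible and reversible, so that its eigenvalues are real.

```latex
% Transition probabilities over time t for one rate class with rate matrix Q
P(t) = e^{Qt} = \sum_{k=1}^{20} e^{\lambda_k t}\, u_k v_k^{\top},
\qquad 0 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_{20}

% u_k: right eigenvectors (columns of U); v_k^{\top}: rows of U^{-1}
% The k = 1 term is the stationary distribution and does not decay.
% Every other mode decays exponentially, with half-life
t_{1/2}^{(k)} = \frac{\ln 2}{\lvert \lambda_k \rvert}
```

Under these assumptions, a more negative eigenvalue means that the contrast between amino acids encoded by its eigenvector is lost more quickly.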
Proof of Concept
We have outlined a method to identify functional classes of residues that addresses the same questions as the Evolutionary Trace method. The method is roughly composed of the following steps, with further details available in the article:
1. Collect homologous sequences (e.g. via a Pfam hidden Markov model search against public sequence databases).
2. Reconstruct a phylogeny for these sequences, along with a suitable multiple alignment.
3. Estimate the substitution rate matrices for a chosen number of rate classes, using xrate.
4. Rank the rate classes according to the trace of the substitution matrices; this ranking provides an annotation of "relative importance" among the rate classes.
5. Compute the eigensolutions for the rate matrices, and describe the information decay for each rate class using the eigensolutions as a guide.
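Step 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the report: the 3-state matrices, the class names, and the helper functions are hypothetical, and real matrices would be 20x20 amino-acid rate matrices estimated by xrate. The sketch also assumes that a trace closer to zero (slower overall substitution) marks a more important class, in line with the assumption that important residues change more slowly.

```python
import numpy as np

def make_rate_matrix(off_diagonal):
    """Build a rate matrix from off-diagonal rates; each diagonal entry is
    set so that its row sums to zero."""
    q = np.array(off_diagonal, dtype=float)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

def rank_by_trace(rate_matrices):
    """Order class names from slowest (trace nearest zero) to fastest."""
    traces = {name: float(np.trace(q)) for name, q in rate_matrices.items()}
    ranking = sorted(traces, key=lambda name: traces[name], reverse=True)
    return ranking, traces

# Hypothetical toy classes that differ only in their overall rate.
base = np.ones((3, 3))
classes = {
    "class_a": make_rate_matrix(0.1 * base),
    "class_b": make_rate_matrix(1.0 * base),
    "class_c": make_rate_matrix(0.4 * base),
}

ranking, traces = rank_by_trace(classes)
for name in ranking:
    print(f"{name}: trace = {traces[name]:.2f}")
```

Because the trace of a rate matrix is the negative sum of the per-state exit rates, this ranking is only meaningful if the matrices have not each been rescaled to a common overall rate.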
The SH2 domain family was used to test this method, and detailed results are available in the article. We simultaneously trained six hidden substitution rate matrices for the SH2 domain family, using xrate. These six classes had distinct characteristics, and could be ranked according to their relative importance. One class described residues that were relatively conserved over evolutionary time; its most negative eigenvalue corresponded to an eigenvector that primarily differentiated between the amino acids Histidine and Proline, two large amino acids with nitrogenous rings. This was not surprising as information differentiating between these two amino acids may be lost with time in the SH2 domain when either amino acid sufficiently satisfies the biochemical requirements for that residue. Rate classes that described non-conservative, or mutable, evolution generally had negative eigenvalues of larger magnitude. Specific information decay was more difficult to describe for these mutable rate classes, as the decay was more severe and widespread.
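The eigenvector analysis described above can be illustrated on a toy matrix. The sketch below is an assumption-laden example, not the fitted SH2 model: the four states `s0` to `s3`, the rates, and the helper names are hypothetical, and the matrix is symmetric so that its eigenvalues are real. States `s0` and `s1` interchange quickly, playing the role that Histidine and Proline play in the conserved class.

```python
import numpy as np

def decay_modes(q):
    """Eigen-decompose a rate matrix. Returns eigenvalues in ascending order
    (most negative first) and the matching right eigenvectors as columns.
    Assumes real eigenvalues, as for a reversible model."""
    eigenvalues, eigenvectors = np.linalg.eig(q)
    order = np.argsort(eigenvalues.real)
    return eigenvalues.real[order], eigenvectors.real[:, order]

def dominant_states(eigenvector, labels, top=2):
    """Return the states with the largest absolute eigenvector components."""
    idx = np.argsort(-np.abs(eigenvector))[:top]
    return [(labels[i], float(eigenvector[i])) for i in idx]

# Hypothetical toy rates: s0 <-> s1 is fast, every other exchange is slow.
states = ["s0", "s1", "s2", "s3"]
rates = np.full((4, 4), 0.1)
rates[0, 1] = rates[1, 0] = 2.0
np.fill_diagonal(rates, 0.0)
q = rates - np.diag(rates.sum(axis=1))  # rows sum to zero

eigenvalues, eigenvectors = decay_modes(q)
top = dominant_states(eigenvectors[:, 0], states)
half_life = np.log(2) / abs(eigenvalues[0])

print("most negative eigenvalue:", eigenvalues[0])
print("states it contrasts:", top)
print("half-life of that contrast:", half_life)
```

The two dominant components of the fastest-decaying eigenvector have opposite signs, which is what "differentiating between" two states means here: the information lost first is which of the two states a residue occupies.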
This article describes a streamlined method that, relative to ET, models the evolution of protein sequences over time more rigorously and probabilistically. The method builds upon the same fundamental assumption as ET, namely that functional residues tend to mutate more slowly than nonfunctional residues. It allows, in effect, non-linear partitions of phylogenies when assigning functional annotation, a feature that ET could not provide. It also dispenses with the choice of arbitrary tree partitions, and allows for phylogenies that are not ultrametric. The assignment of rate classes does not suffer from rapid loss of signal as ET does; indeed, increased sequence variety may help to better differentiate (and assign) rate classes to residues in a family. Finally, this method produces output that can readily be used for clustering analyses when 3D structures are available.
Holmes I, Rubin GM. An expectation maximization algorithm for training hidden substitution models.
J Mol Biol. 2002 Apr 12;317(5):753-64.
Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families.
J Mol Biol. 1996 Mar 29;257(2):342-58.
Heckman JE, Lambert D, Burke JM. Photocrosslinking detects a compact, active structure of the hammerhead ribozyme.
Biochemistry. 2005 Mar 22;44(11):4148-56.
- 7 Dec 2005