Basepair Rates

From Biowiki

PFold vs. RFAM-trained parameters

Below are bubble plots of the dinucleotide rate matrix estimated from pairwise alignments and used by PFOLD, along with the analogous rate matrix estimated from RFAM multiple alignments.

Note that the RFAM-trained parameters have noticeably larger probabilities for non-canonical basepairs. This appears to hurt the performance of the RFAM-trained rates at predicting ncRNAs in multi-genome-alignment screens.

I don't think the difference is an XRATE error; I have reason to think it is a real difference between the RFAM structural alignments and the training data that Bjarne Knudsen used (derived from a merge of the Bayreuth tRNA database of Sprinzl et al. and the LSU rRNA database of De Rijk et al.).

Specifically, these:

The difference in the rates could be down to a couple of things:

  1. According to Knudsen & Hein: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999;15:446-54. (p449), the Bayreuth tRNA database didn't have any noncanonical basepairs in, so they had to add them in by assuming that single-base symmetric doublestrand bulges were actually basepairs, i.e. <.<....>.> would be converted to <<<....>>>

  2. RFAM imposes a single consensus structure on all members of a family, so the potential for misannotated basepairs in a very large family is rather big.
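
The bulge-to-basepair conversion described in the first item above can be sketched as follows. This is only an illustration of the rule as stated on this page, not Knudsen & Hein's actual procedure; the function name `fill_symmetric_bulges` is hypothetical, and the sketch assumes a one-line structure string that uses only `<`, `>` and `.`.

```python
def fill_symmetric_bulges(structure: str) -> str:
    """Turn single-base symmetric bulges into basepairs.

    A position pair (i+1, j-1) is converted when both are unpaired,
    i pairs with j, and i+2 pairs with j-2.
    """
    stack = []      # positions of currently open '<'
    close_of = {}   # opening position -> matching closing position
    for pos, ch in enumerate(structure):
        if ch == "<":
            stack.append(pos)
        elif ch == ">":
            if not stack:
                raise ValueError("unbalanced structure: extra '>'")
            close_of[stack.pop()] = pos
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '<'")

    out = list(structure)
    for i, j in close_of.items():
        # Unpaired on both strands, sandwiched between two stacked pairs.
        if (structure[i + 1] == "." and structure[j - 1] == "."
                and close_of.get(i + 2) == j - 2):
            out[i + 1] = "<"
            out[j - 1] = ">"
    return "".join(out)


print(fill_symmetric_bulges("<.<....>.>"))
```

Asymmetric bulges such as `<.<...>>` are left alone, since only one strand has the unpaired base.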

Pfold probabilities & rates

  • Pfold parameters.

Note the row of equilibrium probabilities beneath the rate matrix: nice big green bubbles for Watson-Crick (WC) basepairs, tiny black bubbles for non-WC basepairs (except the wobble basepairs, which are larger).
RnaModels.ncRnaDualStrand v12.png
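
One way to put a number on "how big are the black bubbles" is to sum the equilibrium probability by basepair class. Below is a minimal sketch, assuming the equilibrium distribution is available as a mapping from dinucleotide (left base followed by right base, RNA alphabet) to probability. The function names are hypothetical, and the uniform distribution is only a placeholder, not the Pfold or RFAM values.

```python
from itertools import product

WATSON_CRICK = {"AU", "UA", "CG", "GC"}
WOBBLE = {"GU", "UG"}


def pair_class(pair: str) -> str:
    """Classify a dinucleotide such as 'GC' or 'AA'."""
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "non-canonical"


def mass_by_class(equilibrium: dict) -> dict:
    """Sum equilibrium probability over each basepair class."""
    totals = {"watson-crick": 0.0, "wobble": 0.0, "non-canonical": 0.0}
    for pair, prob in equilibrium.items():
        totals[pair_class(pair)] += prob
    return totals


# Placeholder only: a uniform distribution over all 16 dinucleotides.
uniform = {a + b: 1 / 16 for a, b in product("ACGU", repeat=2)}
print(mass_by_class(uniform))
```

Running the same summary on two parameter sets would give a direct comparison of their non-canonical mass.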

XRATE-estimated probabilities & rates (from RFAM)

  • RFAM-trained parameters (from alignments annotated as having "published" secondary structures, as opposed to "predicted").

Note the larger black bubbles in the equilibrium probability row. Messy data?
RnaModels.ncRnaDualStrand v13.png

XRATE-estimated probabilities & rates (from CONSAN training set)

  • CONSAN mix80-trained parameters

Scale is different...
RnaModels.mix80.png

-- Ian Holmes - 31 Oct 2007