Rna Structure Prediction Resources

From Biowiki

RNA structure prediction resources

How in the world can this be organized? Hmm, what questions can be asked about a structure prediction method? Maybe I can use them to make a decision tree/flowchart, or maybe each question could be a table column.

  • What structure are you predicting?
    • Secondary or 3-D
    • Pseudoknots?
    • Base pair triples & other tertiary interactions? (no method currently does this stuff, but maybe someday...)
    • How are you incorporating structural context into your model, if at all? For example, are you considering any of the following:
      • hairpin and interior loop motifs
      • base pair stacking
      • coaxial stacking of helices
  • What is your scoring function? Are the parameters rooted in thermodynamics or estimated probabilistically?
    • For thermodynamics-based scoring functions:
      • Which version of the Turner energy parameters are you using?
      • Are you using the experimentally-derived, “pure” parameters directly, or did you tweak them by hand to improve structure accuracy?
    • For probabilistic scoring functions:
      • How are you modeling your structure? (stochastic context-free grammars, conditional log-linear models, etc.)
      • How are you estimating your parameters?
  • Is the method doing local or global structure prediction, or both?
  • What structure(s) is (are) computed?
    • A single best/most likely structure.
    • Set of suboptimal structures.
    • Base pair probability matrices.
    • Structure ensembles/clusters (e.g. centroids from Sfold, delta-neighbors from RNAbor)
  • Is the input a single sequence or multiple sequences?
  • For multiple sequences:
    • Are you taking an alignment as input, or are you doing a structural alignment yourself? If the latter, how are you scoring this alignment?
    • Are you incorporating sequence or structure-dependent phylogeny/evolution into your model (i.e. your score function)? How are you doing it?
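One way to turn the questions above into table columns is to give each method a small record with one field per question. The following is only a minimal sketch: the class and field names (`MethodProfile`, `handles_pseudoknots`, `scope`, and so on) are hypothetical and made up for illustration, and only a few of the questions are covered. The stemloc values are taken from the table further down this page; anything the page does not state is left as `None` rather than guessed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Scoring(Enum):
    ENERGY = "energy"       # thermodynamics-based scoring
    PROBABILISTIC = "prob"  # probabilistic scoring (e.g. SCFGs)


class InputKind(Enum):
    SINGLE_SEQ = "single seq"
    MULT_ALIGNED = "mult aligned seqs"
    MULT_UNALIGNED = "mult unaligned seqs"


@dataclass
class MethodProfile:
    """One table row; every field name here is a hypothetical choice."""
    name: str
    input_kind: InputKind
    scoring: Scoring
    outputs: List[str] = field(default_factory=list)
    model: Optional[str] = None                  # e.g. "pair-SCFG"
    handles_pseudoknots: Optional[bool] = None   # None = not yet answered
    scope: Optional[str] = None                  # "local", "global" or "both"

    def unanswered(self) -> List[str]:
        """Names of the questions that still have no answer recorded."""
        return [key for key, value in vars(self).items() if value is None]


# Values below come from the stemloc row of the table on this page.
stemloc = MethodProfile(
    name="stemloc",
    input_kind=InputKind.MULT_UNALIGNED,
    scoring=Scoring.PROBABILISTIC,
    outputs=["best 2ndary struct", "struct align"],
    model="pair-SCFG",
)

print(stemloc.unanswered())
```

Keeping unanswered questions as `None` makes it easy to list what still has to be looked up for each method before the table is complete.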

We will need to have a special section on papers that do not describe a novel method/implementation, e.g.:

  • literature leading to development of methods, but not really a complete implementation in itself (e.g. work by Nussinov, Sankoff, other classics)
  • reviews of methods
  • benchmarks of methods
  • RNA structural biology and biophysics papers that provide a background and justification for structural elements that we try to model

---

See also:

Programs that predict RNA structure (and sometimes structural alignment)

More potential columns:

  • constraints on input
  • big-O complexity
  • wall runtime
  • for mult seqs: max num of sequences (practical)
  • max sequence length (practical)

Explanation of columns:

  • method: Name of the program.
  • input: Input to the program:
    • single seq: single sequence
    • mult aligned seqs: a multiple sequence alignment
    • mult unaligned seqs: multiple, unaligned sequences (the algorithm may or may not make its own alignment)
  • output: Output of the program:
    • 2ndary: secondary structure (set of base pairs)
    • 3D: 3-D (all-atom) structure
    • best struct: a single best structure
    • best struct align: a single best structural alignment (TODO define more exactly what is meant by this)
    • Set of suboptimal structures.
    • Some sort of set of alignments (does anyone do this?)
    • Base pair probability matrices.
    • Structure ensembles/clusters (e.g. centroids from Sfold, delta-neighbors from RNAbor)
  • scoring: The scoring function or approach.
    • energy: based on thermodynamics, specifically the free energy of folding (probably Turner energy, since there are no other kinds)
    • prob: probabilistic
      • pair-SCFG
    • information-theoretic?
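To make the "set of base pairs" definition of secondary structure concrete, here is a small sketch that reads a structure written in dot-bracket notation into a set of (i, j) index pairs and then checks whether a set of pairs contains a pseudoknot (two crossing pairs, i < k < j < l). Dot-bracket notation is an assumption here, since this page does not name any file format, and the function names are hypothetical. Plain parentheses cannot encode crossing pairs, so the pseudoknot check is meant for pair sets obtained some other way.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # 0-based positions (i, j) with i < j


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Convert a string like '((...))' into a set of base pairs."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)                  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))      # pair with the latest open '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def has_pseudoknot(pairs: Set[Pair]) -> bool:
    """True if any two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


nested = pairs_from_dot_bracket("((...))")
print(sorted(nested), has_pseudoknot(nested))
print(has_pseudoknot({(0, 5), (3, 8)}))
```

The pairwise crossing check is quadratic in the number of pairs, which is fine for a sketch but would need something smarter for very large structures.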


method  | input               | output                           | scoring          | refs
stemloc | mult unaligned seqs | best 2ndary struct, struct align | prob (pair-SCFG) | 1

References

stemloc

TODO: short description

Holmes I: Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 2005;6:73.

Disorganized literature list (TODO: put in table above)

TODO: Everything needs to either go in the table above, or get organized into a reading list.

Reviews

Secondary structure prediction

From single sequence

TODO: put in reverse chronological order, get links, citations!

  • RNAfold (from Vienna package)
  • RNAstructure
  • mfold
  • Sfold
  • Dowell and Eddy SCFG design paper

From multiple sequences

Fixed alignment as input

Probabilistic
Energy-based
Other

No alignment necessary

Special pseudoknot models

This is a wacky category and gets its own special section. Anything that does pseudoknots should go here.

Alignment of sequence to structure, structure to structure (?), etc.

Any of these tools should, in theory, be usable as a screening tool - just make the database or query out of your genome and search using known families! So, these are also linked to from Rna Screen Resources.

Database search

Covariance models

Clustering

Other structure prediction (???)

Molecular dynamics, kinetics of folding, etc.

Do I really want to get into this?

TODO: SORT ME (I have not read these)

  • CONTRAfold

Tools for working with alignments/structures

Assessing accuracy of predictions/alignments

Sources of "true" structures

See also: Rna Screen Resources#Sequence_databases

TODO: fill me! (copy relevant stuff from above page)

Accuracy benchmarks

---

-- Created by: Andrew Uzilov on 12 May 2007