RNA folding lab
In this lab you will learn to use the Vienna RNA package and various related tools for analyzing RNA sequences, predicting their secondary structure, and visualizing it.
Predict the structure of the nanos translational control element
To begin with, you will predict the secondary structure of the translational control element (TCE) found in the 3' UTR of the nanos gene in the fruit fly, Drosophila melanogaster.
This element is described in the following paper (among others):
Extract the sequence
The element can be found at nucleotides 2491 to 2554 of the following GenBank entry for the nanos gene:
First, extract these 64 nucleotides. You will then predict their secondary structure and compare it to the structures shown in the paper above.
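If you have saved the GenBank entry as a FASTA file, the extraction can be scripted. The sketch below is one way to do it in plain Python; the function names are hypothetical, and it assumes a single-record FASTA file. Keep in mind that GenBank coordinates are 1-based and inclusive, while Python slices are 0-based and end-exclusive.

```python
def read_fasta_sequence(text):
    """Concatenate the sequence lines of a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines()]
    # Drop the '>' header line and any blank lines.
    return "".join(line for line in lines if line and not line.startswith(">"))


def extract_region(sequence, start, end):
    """Return nucleotides start..end, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    # Shift the start by one for 0-based indexing; the inclusive end
    # coincides with Python's exclusive slice end.
    return sequence[start - 1:end]


def dna_to_rna(sequence):
    """GenBank stores DNA; rewrite it with U in place of T."""
    return sequence.upper().replace("T", "U")
```

For example, extract_region(genome, 2491, 2554) should give 2554 - 2491 + 1 = 64 nucleotides, where genome is the string returned by read_fasta_sequence.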
Predict the minimum free energy (MFE) structure
You will use the RNAfold program for this. It's installed on the DECF machines. First read the web version of the manual page for this tool: RNAfold manpage.
To run RNAfold interactively, simply type its name at the command line:
$ RNAfold
and then paste in your sequence.
Now, following the instructions on the manpage, use the RNAfold tool to predict the structure of the nanos TCE sequence that you extracted.
Visualize the structure
There are several ways to visualize the structure.
Convert to Stockholm format, then colorize at the command-line
First save the sequence and the structure to a file.
There should be two lines, one containing the sequence and one containing the RNAfold-predicted structure, e.g.
(NB this is not the actual sequence of the nanos TCE.)
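You can also write this two-line file from Python. The minimal sketch below (hypothetical helper names) checks that the structure is balanced and has exactly one character per nucleotide before formatting the two lines. It validates only a string of brackets and dots, so it assumes you have already removed the free-energy value that RNAfold prints after the structure.

```python
def balanced(structure):
    """True if a dot-bracket string uses only '(', ')', '.' and every bracket is matched."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no open partner
                return False
        elif ch != ".":
            return False
    return depth == 0


def two_line_record(sequence, structure):
    """Format the sequence and its dot-bracket structure as two lines of text."""
    if len(sequence) != len(structure):
        raise ValueError("structure must have one character per nucleotide")
    if not balanced(structure):
        raise ValueError("unbalanced dot-bracket structure")
    return f"{sequence}\n{structure}\n"
```

For example, open("tce.txt", "w").write(two_line_record(seq, structure)) would save it under the (hypothetical) name tce.txt.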
Having saved this to a file (e.g. with filename
), try visualizing it with ColorStock
You will first have to convert from RNAfold format into Stockholm format using the script rnafold2stockholm.pl.
colorstock.pl and rnafold2stockholm.pl are part of the DART package, which is also installed on the DECF machines. They can be found in the directory
These are Perl scripts rather than Python scripts, but since you do not need to modify them in any way (you only need to execute them), you should go ahead and use them as Perl scripts. You can run a Perl script just like a Python script: simply type its name at the command line followed by its arguments or, if it's not configured to run directly, precede it with perl, just as you'd precede a Python script with python:
$ myperlfile.pl arg1
$ perl myperlfile.pl arg1
Plot the secondary structure using RNAplot
Next, try using the RNAplot tool (part of the Vienna package) to render a 2D plot of the secondary structure in PostScript format.
As before, you can find info on how to use the RNAplot tool by typing
Note that RNAplot writes its plot to an output file (rna.ps in the example below) rather than displaying it.
In order to view this output, you may need to convert it from PostScript to Portable Document Format (PDF); you can do this by typing
convert rna.ps rna.pdf
on a Linux machine that has ImageMagick installed (http://www.imagemagick.org/).
You can then open the PDF file using Adobe Reader (which you can download from Adobe; it should already be on the DECF machines) or one of a number of other free PDF readers (e.g. Apple Preview).
(You may have noticed that RNAfold automatically produces a plot of the MFE structure it predicts, generating a PostScript file, but now you know how to generate a plot of any structure, not just one created with RNAfold.)
Compare to the published structure
Compare your predicted MFE structure to the structure published in the paper by Gavis et al., linked above.
Do the structures match?
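One way to put a number on this is the base-pair distance: the count of base pairs that appear in one structure but not the other (zero means identical pairings). The Vienna package's RNAdistance tool also offers structure comparisons; the sketch below is an independent, standard-library version with hypothetical function names, and it assumes you have transcribed the published structure by hand into dot-bracket notation of the same length as your prediction.

```python
def base_pairs(structure):
    """Return the set of 1-based (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))  # pair with the most recent open '('
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))  # symmetric difference
```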
Does your structure expose the protein-binding sequence CUGGC in the first loop?
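To check this programmatically, you can find every occurrence of the motif and ask whether all of its bases are unpaired ('.') in the predicted structure. This minimal sketch (hypothetical helper names) only tests that the motif is single-stranded; deciding whether that unpaired stretch is the first loop, and what kind of loop it is, is still up to you.

```python
def unpaired_positions(structure):
    """1-based positions marked '.' (unpaired) in a dot-bracket string."""
    return {pos for pos, ch in enumerate(structure, start=1) if ch == "."}


def motif_exposure(sequence, structure, motif="CUGGC"):
    """List (start, fully_unpaired) for each occurrence of motif (1-based start)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    free = unpaired_positions(structure)
    seq = sequence.upper().replace("T", "U")
    results = []
    start = seq.find(motif)
    while start != -1:
        span = range(start + 1, start + len(motif) + 1)  # 1-based motif positions
        results.append((start + 1, all(p in free for p in span)))
        start = seq.find(motif, start + 1)  # look for further, possibly overlapping, hits
    return results
```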
Try varying the temperature
Look again at the RNAfold manual page.
What is the default temperature for secondary structure prediction?
What temperature would be most appropriate for folding this sequence?
How do you vary the temperature?
(Why should RNA folding depend on temperature anyway?)
Try several different values for the temperature, e.g. 23 °C and 80 °C.
How does the structure change?
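Running RNAfold at several temperatures by hand gets tedious, so you may want to loop over them from Python. This sketch assumes RNAfold is on your PATH and accepts the folding temperature in degrees Celsius through its -T option, as in current ViennaRNA releases; confirm this against the manpage for your installed version. The function names are hypothetical.

```python
import subprocess


def rnafold_command(temperature_c):
    """Build an RNAfold command line that folds at the given temperature (deg C)."""
    return ["RNAfold", "-T", str(temperature_c)]


def fold_at(sequence, temperature_c):
    """Run RNAfold once on one sequence and return its raw text output."""
    result = subprocess.run(
        rnafold_command(temperature_c),
        input=sequence + "\n",  # RNAfold reads the sequence from standard input
        capture_output=True,
        text=True,
        check=True,  # raise if RNAfold exits with an error
    )
    return result.stdout
```

For example, for t in (23, 80): print(fold_at(tce_sequence, t)) prints the prediction at each temperature. RNAfold also writes a structure plot on every run, so a later run may overwrite the plot from an earlier one.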
Constrain some bases to be unpaired
Try running RNAfold with its constraint option (check the manpage to see how this option is used).
Constrain a few bases to be unpaired (for example, in the main stem) and observe whether the structure changes.
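In ViennaRNA's hard-constraint notation, the constraint is a string of the same length as the sequence, supplied on the line after it, in which 'x' marks a base that must stay unpaired and '.' leaves a base unconstrained. The hypothetical helper below only builds that string; double-check the symbol set against your installed version's manpage.

```python
def unpaired_constraint(length, positions):
    """Constraint string with 'x' at the given 1-based positions and '.' elsewhere."""
    chars = ["."] * length
    for pos in positions:
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} outside 1..{length}")
        chars[pos - 1] = "x"  # this base must not pair
    return "".join(chars)
```

You would then give RNAfold the sequence followed by, on the next line, for example unpaired_constraint(len(seq), range(1, 6)), with the constraint option switched on.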
Automate the structure prediction process
Consider the following Python program:
import subprocess
import sys
rnafoldout = open("tempfile.txt", "w")  # Opens an output file
rnafold = subprocess.Popen(["RNAfold"], stdout=rnafoldout, stdin=subprocess.PIPE, text=True)  # Starts RNAfold
rnafold.communicate(input=sys.argv[1] + "\n")  # Sends the first argument (input sequence) to RNAfold's input
rnafoldout.close()  # Closes the output file
subprocess is a module that allows you to run other external programs as a "subprocess" of the Python program. The stdin and stdout options for subprocess.Popen allow you to define, respectively, where the external program should look for input when it asks for input, and where it should write its output (e.g., to a file instead of just printing it to the screen).
stdout has to be set to something writable. For stdin, we instruct Popen to take "piped in" input, which we "pipe in" using the communicate method.
If you are unsure how this works, try invoking this Python script with the sequence of the nanos TCE (or any other RNA sequence) as the first (and only) argument, then examining the file tempfile.txt.
Consider how you might extend this Python script to:
- read the predicted structure into a variable;
- explore the effect of introducing an oligomer that is complementary to a given substring of the supplied sequence.
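As a starting point for both extensions, here is a sketch with hypothetical helper names. For the first, it assumes the structure line of RNAfold's output is the dot-bracket string followed by the minimum free energy in parentheses, and returns both. For the second, it builds the oligomer complementary to a chosen region (1-based, inclusive), written 5' to 3'. How you then model the oligomer is up to you; for example, you could fold both strands together with the Vienna package's RNAcofold, or treat the targeted bases as unpaired using the constraint approach from earlier in this lab.

```python
import re

# A dot-bracket string, whitespace, then the energy in parentheses, e.g. "(((...))) ( -1.20)".
_STRUCTURE_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def parse_rnafold_output(text):
    """Return (structure, mfe) from the first 'structure (energy)' line of RNAfold output."""
    for line in text.splitlines():
        match = _STRUCTURE_LINE.match(line.strip())
        if match:
            return match.group(1), float(match.group(2))
    raise ValueError("no 'structure (energy)' line found")


def complementary_oligo(sequence, start, end):
    """Reverse complement (5' to 3') of nucleotides start..end of an RNA sequence."""
    region = sequence[start - 1:end].upper().replace("T", "U")
    return region.translate(_RNA_COMPLEMENT)[::-1]
```

With the script above, structure, mfe = parse_rnafold_output(open("tempfile.txt").read()) reads the prediction back into variables.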
Compute the partition function and calculate posterior probabilities of basepairing
Use the partition function option of RNAfold (see the manpage) to save the basepairing posterior probability matrix to a file.
This uses the McCaskill algorithm to calculate the probability that any given basepair will be present in the equilibrium ensemble of structures.
Open this file (you may need to convert to PDF...) and view it. What do the diagonal black strips of pixels signify?
Can you see that one of these strips (the lowest one above the main diagonal from top-left to bottom-right of the image) has a faint "ghost" strip, above and to the left?
How do you interpret this "ghost" strip?
How would you extract the posterior probabilities of a given basepair from this file?
(Hint: read the RNAfold documentation carefully, in particular the documentation of the Postscript format, looking for the string "ubox".)
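Once you have found the "ubox" lines, the extraction is short. This sketch assumes, as in the ViennaRNA dot plots we have seen, that each data line has the form "i j value ubox", where value is the side of the box, i.e. the square root of the base-pair probability; treat that as an assumption to verify against the documentation. The function name is hypothetical.

```python
import re

# Data lines look like "i j value ubox"; the PostScript definition of the
# ubox procedure itself starts with "/" and so does not match.
_UBOX_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_pair_probabilities(dot_ps_text):
    """Return {(i, j): probability} from the 'ubox' lines of a dot-plot file.

    Squares the third number on each line, assuming it is the square root
    of the base-pair probability.
    """
    probabilities = {}
    for line in dot_ps_text.splitlines():
        match = _UBOX_LINE.match(line)
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            probabilities[(i, j)] = float(match.group(3)) ** 2
    return probabilities
```

Then read_pair_probabilities(open(path).read()).get((i, j), 0.0) gives the posterior probability of pair (i, j), where path is the dot-plot file your run produced.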
Repeat using a homologous sequence from Rfam
There are several other known examples of the nanos TCE. You can find them in this Rfam multiple alignment:
Extract one of these sequences and predict its structure using RNAfold.
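Rfam alignments are distributed in Stockholm format with gap characters inserted, so a sequence has to be ungapped before folding. The sketch below (hypothetical helper names) assumes you have saved the alignment as a text file; it skips the '#' markup lines and the '//' terminator, and joins sequences that are split across several alignment blocks.

```python
def stockholm_sequences(text):
    """Return {name: aligned sequence} from a Stockholm alignment."""
    seqs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '# STOCKHOLM' / '#=GF' / '#=GC' etc. markup, and '//'.
        if not line or line.startswith("#") or line == "//":
            continue
        name, aligned = line.split(None, 1)
        seqs[name] = seqs.get(name, "") + aligned.strip()  # append across blocks
    return seqs


def ungapped(aligned):
    """Remove alignment gap characters ('-' and '.') so the sequence can be folded."""
    return "".join(ch for ch in aligned if ch not in "-.")
```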
How does it compare to the other sequence you folded?
Explore other tools
The Vienna RNA package webpage includes several other tools besides the ones you have used here, including software for RNA folding kinetics, RNA design and several other applications.
Try at least one or two of these other tools.
Your goal should be to get a feel for the range of bioinformatics tools that are available for analysis of RNA sequences.
There are also RNA structure tools available on the web, e.g. RNAmovies:
Of course there are web interfaces to the Vienna tools as well.
Here we have emphasized fluency in the use of these tools on the command line.
Why do you think this might be useful?
(Hint: consider the Python program above.)
This homework comes in three parts.
Hammerhead ribozyme YES gate
Verify the properties of the YES-1 gate in Figure 2 of the RNA logic gates paper by Breaker and Penchovsky, covered in class:
NB: the relevant RNA sequence is given in Figure 2.
You should report the following for both the OFF and ON positions of the switch (NB these are exactly the results reported for Fig. 2a of the Breaker-Penchovsky paper):
- Predicted MFE structure;
- Base-pairing probability plot;
- Status of stems I, II and III from the hammerhead structure, and accessibility of the cleavage site.
Hint: in the ON configuration, part of the YES-1 sequence (the Oligonucleotide Binding Site, or OBS) is base-paired with an external, complementary sequence (the DNA-1 sequence).
If you removed DNA-1 without changing the structure of YES-1, then the OBS part of YES-1 would be single-stranded.
(Contrast this with the OFF configuration, where OBS folds back on itself and an adjacent part of YES-1,
forming stem IV.)
In other words, the change in structure when DNA-1 is introduced as a binding partner to OBS may be considered as a two-step process:
first OBS becomes single-stranded, then it binds to DNA-1.
You only really need to consider the first step (i.e. OBS becoming single-stranded).
How might you force this outcome when predicting the structure of the ON state?
Write up your entire analysis on your biowiki page (or on a separate homework page, linked to from your biowiki page).
Include details of every program that you run and the command-line options that you used for that program, in sufficient detail that someone could reproduce your analysis EXACTLY.
Software for verifying YES gate
Outline the design for a Python program that verifies the correctness of a YES gate of the form given in the previous part of the homework.
You don't need to actually write the program, but you should describe all the key steps, giving code snippets where appropriate.
The program should take as inputs:
- an RNA sequence (corresponding to the YES-1 sequence from the Penchovsky-Breaker example);
- the co-ordinates of the oligonucleotide binding site (the OBS subsequence from the P-B example).
The output of the program should be the truth table of the logic gate.
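One possible skeleton for such a program is sketched below. It is deliberately incomplete: fold and cleavage_site_active are hypothetical functions you would supply (for example, a wrapper around RNAfold that accepts an optional constraint string, and your own check on stems I, II and III and on the cleavage site), and modelling the ON state by forcing the OBS to stay unpaired is an assumption taken from the hint in the first part of this homework.

```python
def yes_gate_truth_table(sequence, obs_start, obs_end, fold, cleavage_site_active):
    """Return [(input, output), ...] for a YES gate.

    fold(sequence, constraint) -> dot-bracket string, where constraint is
    None or a string with 'x' marking bases that must stay unpaired.
    cleavage_site_active(structure) -> bool, True if the ribozyme is active.
    obs_start and obs_end are 1-based, inclusive OBS coordinates.
    """
    n = len(sequence)
    # Input 0 (no effector DNA): fold without constraints (OFF state).
    off_structure = fold(sequence, None)
    # Input 1 (effector bound): approximate binding by keeping the OBS single-stranded.
    constraint = "".join(
        "x" if obs_start <= pos <= obs_end else "." for pos in range(1, n + 1)
    )
    on_structure = fold(sequence, constraint)
    return [
        (0, int(cleavage_site_active(off_structure))),
        (1, int(cleavage_site_active(on_structure))),
    ]
```

A correct YES gate should then produce [(0, 0), (1, 1)].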
Hammerhead ribozyme structure
Predict the MFE structure of the AF404053.1 hammerhead ribozyme sequence and compare to the structure given in Rfam.
- 15 Sep 2009
- 18 Aug 2008
Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.