RNA folding lab

In this lab you will learn to use the VIENNA package and related tools to predict, analyze and visualize the secondary structure of RNA sequences.

Predict the structure of the nanos translational control element

To begin, you will predict the secondary structure of the translational control element (TCE) found in the 3'UTR of the nanos gene in the fruit fly, Drosophila melanogaster. This element is described in the following paper (among others):

Extract the sequence

The element can be found at nucleotides 2491 to 2554 of the following Genbank entry for the nanos gene itself:

First, extract these 64 nucleotides. You will then predict their secondary structure and compare it to the structures shown in the paper above.
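GenBank coordinates are 1-based and inclusive, whereas Python slices are 0-based and exclude the end index. The following is a minimal sketch of the extraction, assuming you have already copied the entry's sequence into a string. The function names `extract_region` and `dna_to_rna` and the toy sequence are illustrative only and are not part of the lab materials.

```python
def extract_region(sequence, start, end):
    """Return the subsequence at 1-based, inclusive coordinates start..end."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def dna_to_rna(sequence):
    """Convert a GenBank-style DNA string into upper-case RNA letters."""
    return sequence.upper().replace("T", "U")


# Toy example (NOT the nanos entry): positions 3 to 6 of a 10-nt string.
toy = "acgtacgtac"
region = dna_to_rna(extract_region(toy, 3, 6))
print(region, len(region))
```

For the nanos entry the call would be `extract_region(entry_sequence, 2491, 2554)`, which should return exactly 64 letters; checking the length is a quick way to catch an off-by-one error.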

Predict the minimum free energy (MFE) structure

You will use the RNAfold program for this. It's installed on the DECF machines. First read the web version of the manual page for this tool: RNAfold manpage

To run RNAfold, simply type it at the command line:

$ RNAfold

and then paste in your sequence.

Now, following the instructions on the manpage, use the RNAfold tool to predict the structure of the nanos TCE sequence that you extracted.
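If you want to work with the prediction programmatically, the text that RNAfold prints can be split into its parts. The sketch below assumes the usual layout of one sequence line followed by one line holding the dot-bracket structure and the energy in parentheses; check this against the output you actually get. The function name `parse_rnafold_output` is hypothetical, and the energy in the sample text is made up.

```python
import re


def parse_rnafold_output(text):
    """Split RNAfold-style output into (sequence, structure, energy)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Skip an optional FASTA-style header line.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    if len(lines) < 2:
        raise ValueError("expected a sequence line and a structure line")
    sequence = lines[0]
    # Structure, whitespace, then the energy in parentheses, e.g. "( -5.00)".
    match = re.match(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)$", lines[1])
    if match is None:
        raise ValueError("could not parse the structure line")
    return sequence, match.group(1), float(match.group(2))


# Sample text using the toy sequence from this lab; the energy is invented.
sample = (
    "AAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUU\n"
    "((((((((((((((....)))))))))))))) ( -5.00)\n"
)
sequence, structure, energy = parse_rnafold_output(sample)
print(structure, energy)
```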

Visualize the structure

There are several ways to visualize the structure.

Convert to Stockholm format, then colorize at the command-line

First save the sequence and the structure to a file. There should be two lines, one containing the sequence and one containing the RNAfold-predicted structure, e.g.

AAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUU
((((((((((((((....))))))))))))))

(NB this is not the actual sequence of the nanos TCE!)

Having saved this to a file (e.g. with filename TCE_struct), try visualizing it with ColorStock. You will first have to convert from RNAfold format into Stockholm format using the script rnafold2stockholm.pl.

colorstock.pl and rnafold2stockholm.pl are part of the DART package, which is also installed on the DECF machines. They can be found in the directory /usr/local/dart/perl.

NOTE: These are Perl scripts rather than Python scripts. You do not need to modify them, only execute them, so use them as they are. You can run a Perl script just like a Python script: type its name at the command line followed by its arguments or, if it is not set up to be executed directly, precede it with perl, just as you would precede a Python script with python:

$ myperlfile.pl arg1

or

$ perl myperlfile.pl arg1
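To see what the conversion step described above produces, it can help to know what a Stockholm file with a structure annotation looks like. The sketch below writes a minimal one in Python. It is not a reimplementation of rnafold2stockholm.pl, whose exact output may differ; the function name `to_stockholm` and the sequence name `toy` are assumptions made for illustration.

```python
def to_stockholm(name, sequence, structure):
    """Format one sequence and its dot-bracket structure as a Stockholm block."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    tag = "#=GC SS_cons"  # per-column annotation holding the secondary structure
    width = max(len(name), len(tag)) + 1  # pad so both rows start in one column
    return "\n".join([
        "# STOCKHOLM 1.0",
        name.ljust(width) + sequence,
        tag.ljust(width) + structure,
        "//",
    ]) + "\n"


# The toy sequence and structure from this lab (not the real nanos TCE).
print(to_stockholm("toy",
                   "AAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUU",
                   "((((((((((((((....))))))))))))))"), end="")
```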

Plot the secondary structure using RNAplot

Next, try using the RNAplot tool (part of VIENNA) to render a 2D plot of the secondary structure in PostScript.

As before, you can find information on how to use the RNAplot tool by typing man RNAplot.

Note that RNAplot creates an output file rna.ps. To view this output, you may need to convert it from PostScript to Portable Document Format (PDF); you can do this by typing convert rna.ps rna.pdf on a Linux machine that has ImageMagick installed (http://www.imagemagick.org/). You can then open the PDF file using Adobe Reader (which you can download from Adobe; it should already be on the DECF machines) or one of several other free PDF readers (e.g. Apple Preview).

(You may have noted that RNAfold automatically produces a plot of the MFE structure it predicts, generating an rna.ps file; but now you know how to generate a plot of any structure, not just one created with RNAfold...)

Compare to the published structure

Compare your predicted MFE structure to the structure published in the paper by Gavis et al, linked above. Do the structures match? Does your structure expose the protein-binding sequence CUGGC in the first loop?

Try varying the temperature

Look again at the RNAfold manual page. What is the default temperature for secondary structure prediction? What temperature would be most appropriate for folding this sequence? How do you vary the temperature? (Why should RNA folding be dependent on temperature anyway?)

Try several different values for the temperature, e.g. 23 degrees C; 80 degrees C. How does the structure change?
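If you find yourself retyping the same command for each temperature, a short loop can do it for you. The sketch below assumes that RNAfold is on your PATH and that the temperature is set with the -T option (in degrees Celsius), as described on the manpage. The helper names `rnafold_command` and `fold_at`, and the toy sequence, are hypothetical.

```python
import shutil
import subprocess


def rnafold_command(temperature_c):
    """Build the RNAfold command line for a given temperature in Celsius."""
    return ["RNAfold", "-T", str(temperature_c)]


def fold_at(sequence, temperature_c):
    """Run RNAfold on one sequence and return its raw text output."""
    result = subprocess.run(
        rnafold_command(temperature_c),
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    toy = "AAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUU"  # not the real nanos TCE
    if shutil.which("RNAfold") is None:
        print("RNAfold not found on PATH; nothing was run")
    else:
        for temperature in (23, 37, 80):
            print(temperature, fold_at(toy, temperature), sep="\n")
```

Note that each run may also overwrite the rna.ps plot in the current directory, so copy any plot you want to keep before starting the next run.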

Constrain some bases to be unpaired

Try running RNAfold using the -C option (check the manpage to see how this option is used). Constrain a few bases to be unpaired (for example, in the main stem) and observe whether the structure changes.
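Typing a constraint line by hand is error-prone for a 64-nt sequence, so you may prefer to generate it. The sketch below assumes the constraint notation described on the manpage, where `.` means "no constraint" and `x` means "this base must not pair", and assumes the constraint line is supplied on the line after the sequence. The function name `unpaired_constraint` is hypothetical.

```python
def unpaired_constraint(length, start, end):
    """Return a constraint string forcing positions start..end to be unpaired.

    Positions are 1-based and inclusive, matching sequence coordinates.
    """
    if not 1 <= start <= end <= length:
        raise ValueError("positions fall outside the sequence")
    return "." * (start - 1) + "x" * (end - start + 1) + "." * (length - end)


# Toy example (not the real nanos TCE): forbid pairing at positions 3 to 5.
toy = "AAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUU"
constraint = unpaired_constraint(len(toy), 3, 5)
print(toy)
print(constraint)
```

The two printed lines are what you would paste at the RNAfold -C prompt.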

Automate the structure prediction process

Consider the following Python program:

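#!/usr/bin/python

import sys
import subprocess

rnafoldout = open("tempfile.txt", "w")     # Opens an output file for RNAfold's results
# Starts RNAfold, sending its output to the file and reading its input from a pipe;
# universal_newlines=True puts the pipe in text mode so the input can be a plain string
rnafold = subprocess.Popen(['RNAfold'], stdout=rnafoldout, stdin=subprocess.PIPE,
                           universal_newlines=True)
rnafold.communicate(input=sys.argv[1])     # Sends the first argument (input sequence) to the RNAfold prompt
rnafoldout.close()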

subprocess is a module that allows you to run other external programs as a "subprocess" of the Python program. The options for stdin and stdout allow you to define, respectively, where it should look for input when this external program asks for input and where it should write output when the program has an output (e.g., instead of just printing it to the screen). stdout has to be set to something writable. For stdin, we instruct it that it should take "piped in" input, which we "pipe in" using the .communicate() method.

If you are unsure what the script does, try invoking it with the sequence of the nanos TCE (or any other RNA sequence) as the first (and only) argument, then examine the file tempfile.txt.

Consider how you might extend this Python script to

  1. read the predicted structure into a variable;
  2. explore the effect of introducing an oligomer that is complementary to a given substring of the supplied sequence.
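As a starting point for the second extension, you will need the sequence of the oligomer itself. The sketch below computes the reverse complement of a chosen substring, using Watson-Crick pairs only and an RNA alphabet. The function name `complementary_oligo` is hypothetical, and how you then model the oligomer's binding in the structure prediction is left open.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_oligo(sequence, start, end):
    """Return the oligomer (5' to 3') complementary to positions start..end.

    Positions are 1-based and inclusive.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions fall outside the sequence")
    target = sequence[start - 1:end].upper()
    # Complement each base and reverse, because paired strands are antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(target))


# Toy example: the oligomer complementary to the first four bases.
print(complementary_oligo("AUGCAAAA", 1, 4))
```

For a DNA oligomer, replace each U in the result with T.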

Compute the partition function and calculate posterior probabilities of basepairing

Use the -p option to RNAfold to save the basepairing posterior probability matrix to a file dot.ps. This uses the McCaskill algorithm to calculate the probability that any given basepair will be present in the equilibrium ensemble of structures.

Open this file (you may need to convert to PDF...) and view it. What do the diagonal black strips of pixels signify? Can you see that one of these strips (the lowest one above the main diagonal from top-left to bottom-right of the image) has a faint "ghost" strip, above and to the left? How do you interpret this "ghost" strip?

How would you extract the posterior probabilities of a given basepair from this file? (Hint: read the RNAfold documentation carefully, in particular the documentation of the Postscript format, looking for the string "ubox".)
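Once you have worked out the format from the documentation, one possible way to read the values is sketched below. It assumes that each relevant line has the form `i j value ubox` and that the stored value is the square root of the pairing probability; confirm both assumptions against the RNAfold documentation before relying on it. The function name `read_pair_probabilities` is hypothetical, and the sample lines are made up.

```python
import re

# One data line: two integer positions, a number, then the word "ubox".
UBOX_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d*\.?\d+(?:[eE][-+]?\d+)?)\s+ubox\s*$"
)


def read_pair_probabilities(text):
    """Return {(i, j): probability} for every 'i j value ubox' line."""
    probabilities = {}
    for line in text.splitlines():
        match = UBOX_LINE.match(line)
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            # Assumption: the file stores sqrt(p), so square it to recover p.
            probabilities[(i, j)] = float(match.group(3)) ** 2
    return probabilities


# Made-up sample lines, not real RNAfold output.
sample = """\
1 12 0.9 ubox
2 11 0.5 ubox
1 12 0.95 lbox
showpage
"""
print(read_pair_probabilities(sample))
```

To use it on a real file, pass in the file's contents, e.g. `read_pair_probabilities(open("dot.ps").read())`.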

Repeat using a homologous sequence from RFAM

There are several other known examples of the nanos TCE. You can find them in this RFAM multiple alignment:

Extract one of these sequences and predict its structure using RNAfold. How does it compare to the other sequence you folded?
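A sequence copied out of a multiple alignment usually still contains gap characters, which must be removed before folding. A minimal sketch, assuming gaps are written as `-` or `.` (the function name `ungap` and the toy row are illustrative):

```python
def ungap(aligned_sequence, gap_characters="-."):
    """Remove alignment gap characters, leaving only the residues."""
    return "".join(ch for ch in aligned_sequence if ch not in gap_characters)


# Toy aligned row (not taken from the RFAM alignment).
print(ungap("GC-AU..GG"))
```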

Explore other tools

The VIENNA webpage includes several other tools apart from the ones you have used here, including software for RNA folding kinetics, RNA design and several other applications. Try at least one or two of these other tools. Your goal should be to get a feel for the range of bioinformatics tools that are available for the analysis of RNA sequences.

There are also RNA structure tools available on the web, e.g. RNAmovies:

Of course there are web interfaces to the Vienna tools as well. Here we have emphasized fluency in the use of these tools on the command line. Why do you think this might be useful? (Hint: consider the Python program above.)


Homework

This homework comes in three parts.

Hammerhead ribozyme YES gate

Verify the properties of the YES-1 gate in Figure 2 of the RNA logic gates paper by Penchovsky and Breaker, covered in class:

NB: the relevant RNA sequence is given in Figure 2.

You should report the following for both the OFF and ON positions of the switch (NB these are exactly the results reported for Figure 2a of the Penchovsky-Breaker paper):

  1. Predicted MFE structure;
  2. Base-pairing probability plot;
  3. Status of stems I, II and III from the hammerhead structure, and accessibility of the cleavage site.
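To report the status of a stem or the accessibility of a site, it helps to turn the dot-bracket string into an explicit list of base pairs. The sketch below does this with a stack. The function names `pair_table` and `is_unpaired` are hypothetical, and the example structure is a toy hairpin, not the hammerhead.

```python
def pair_table(structure):
    """Map each paired position (1-based) to its partner in a dot-bracket string."""
    stack = []
    pairs = {}
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            partner = stack.pop()
            pairs[position] = partner
            pairs[partner] = position
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_unpaired(structure, start, end):
    """Return True if every position in start..end (inclusive) is unpaired."""
    pairs = pair_table(structure)
    return all(position not in pairs for position in range(start, end + 1))


toy_structure = "((((....))))"
print(pair_table(toy_structure)[1])       # partner of position 1
print(is_unpaired(toy_structure, 5, 8))   # the loop positions
```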

Hint: in the ON configuration, part of the YES-1 sequence (the Oligonucleotide Binding Site, or OBS) is base-paired with an external, complementary sequence (the DNA-1 sequence). If you removed DNA-1 without changing the structure of YES-1, then the OBS part of YES-1 would be single-stranded. (Contrast this with the OFF configuration, where OBS folds back on itself and an adjacent part of YES-1, forming stem IV.)

In other words, the change in structure when DNA-1 is introduced as a binding partner to OBS may be considered as a two-step process: first OBS becomes single-stranded, then it binds to DNA-1. You only really need to consider the first step (i.e. OBS becoming single-stranded). How might you force this outcome when predicting the structure of the ON state?

Write up your entire analysis on your biowiki page (or on a separate homework page, linked to from your biowiki page). Include details of every program that you run and the command-line options that you used for that program, in sufficient detail that someone could reproduce your analysis EXACTLY.

Software for verifying YES gate

Outline the design for a Python program that verifies the correctness of a YES gate of the form given in the previous part of the homework.

You don't need to actually write the program, but you should describe all the key steps, giving code snippets where appropriate.

The program should take as inputs:

  1. an RNA sequence (corresponding to the YES-1 sequence from the Penchovsky-Breaker example);
  2. the co-ordinates of the oligonucleotide binding site (the OBS subsequence from the P-B example).

The output of the program should be the truth table of the logic gate.

Hammerhead ribozyme structure

Predict the MFE structure of the AF404053.1 hammerhead ribozyme sequence and compare to the structure given in Rfam.

Data:


-- IanHolmes - 15 Sep 2009

-- IanHolmes - 18 Aug 2008

Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.