
About Me

David Tulga is currently a fourth-year Bioengineering major at UC Berkeley. He is also a Regents' and Chancellor's Scholar and an undergraduate researcher in the Arkin Laboratory for quantitative biology. During the summer of 2007 he was a member of the 2007 Berkeley iGEM Team. From the summer of 2006 through the spring of 2007, he was an undergraduate researcher in Dr. Steven E. Brenner's Computational Genomics Research Group. Additionally, during his freshman year, he researched the psychophysics of vision in Dr. Cohn's lab. David plans to pursue a PhD in bioengineering, with a specialization in Systems and Synthetic Biology, in order to play a leading role in the research and development of multidisciplinary technologies. In his free time, David develops free educational software for children, enjoys cross-country running, and practices martial arts. David Tulga's website, DavidTulga.com, contains his other projects and activities.

Final Project Report

Mini-Report for Final Project

      I am working independently on Topic #2: Software for RNA logic gate design. I plan to follow an approach similar to the methodology described in the Penchovsky paper on the computational design of ribozyme logic gates. The programs would generate candidate sequences for a given RNA logic gate, where the ribozyme logic gate takes an oligonucleotide as its input.

      I plan to first create an application that generates YES gate candidates and, time permitting, candidates for other gates. It would broadly follow the methodology presented in the Penchovsky paper: first generating random OBS (oligonucleotide binding site) elements, then combining those into prototype hammerhead ribozymes. It would then calculate the partition functions and MFE structures of the OFF and ON states, primarily using the RNAfold program from the Vienna package. Next, it would verify that autocatalytic activity is possible by analyzing the predicted structure and the three stems that should be present. In the ON state, with the DNA effector bound, all three stems should be formed; in the absence of the DNA effector, stem IV should be formed. It would then perform further verification of the structures under different conditions. First, it would check that the OFF structure is neither too stable nor too weak, requiring that between 30% and 70% of the OBS nucleotides are base paired in the OFF state. It would then compare the free energies of the two structures and ensure that the gap is between -6 and -10 kcal/mol. It could then verify temperature stability by using Vienna’s RNAheat program to examine stability throughout the range of 20 to 40°C. Finally, it would examine the ensemble diversity and ensure that it is not too large, that is, less than or equal to 9 units.
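      As a minimal sketch of the numeric checks described above (an illustration under stated assumptions, not the planned implementation), the filtering step might look like the following. The function names, the dot-bracket input format, and the half-open OBS index range are hypothetical choices made for this example. The energy gap and ensemble diversity are assumed to have been computed elsewhere, for instance from RNAfold output, and the sign convention of the gap is taken directly from the range quoted in the text.

```python
import random


def random_obs(length, rng=None):
    """Return a random RNA sequence to use as a candidate OBS element."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGU") for _ in range(length))


def paired_fraction(structure, start, end):
    """Fraction of positions in structure[start:end] that are base paired.

    `structure` is a dot-bracket string in which '.' marks an unpaired
    nucleotide. `start` and `end` delimit the OBS region (hypothetical
    half-open indexing chosen for this sketch).
    """
    region = structure[start:end]
    if not region:
        raise ValueError("empty OBS region")
    return sum(1 for c in region if c != ".") / len(region)


def passes_filters(off_structure, obs_start, obs_end,
                   energy_gap, ensemble_diversity):
    """Apply the three numeric thresholds from the text to one candidate."""
    frac = paired_fraction(off_structure, obs_start, obs_end)
    obs_ok = 0.30 <= frac <= 0.70            # 30% to 70% of OBS paired in OFF state
    gap_ok = -10.0 <= energy_gap <= -6.0     # kcal/mol, range as quoted in the text
    diversity_ok = ensemble_diversity <= 9.0 # ensemble diversity at most 9 units
    return obs_ok and gap_ok and diversity_ok
```

      A candidate would be kept only if `passes_filters` returns `True`; the stem checks and the RNAheat temperature scan are not covered by this sketch.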

      Additionally, I could implement the stage 2 procedure in a similar way, to generate many distinct ribozyme gates with similar properties but different OBS sequence specificity. This could use the RNAinverse program to generate new OBS sequences to verify. I could then apply a procedure similar to the first stage to verify that each new candidate has thermodynamic properties similar to those of the original candidate sequence. This could also use the RNAfold program to calculate structures and thermodynamic stability, RNAheat for temperature analysis, and possibly kinRNA or kinfold to determine the optimal kinetics of the gate.

      I plan to implement these programs in the next few weeks, leaving enough time to perform verification and prepare my final report. I also plan to run these analysis applications primarily on my personal computer or remotely on the DECF machines.

Sequence Alignment Homework

  1. In predicting the three-dimensional structure of a protein sequence, a good sequence alignment can help by finding homologous sequences in databases, some of which may already have solved structures. It can also be used to align motifs or domains in the unknown protein against databases of known motifs, which aids secondary structure prediction.
  2. Sequence alignment is also useful when introducing random mutations into a protein sequence for directed (in vitro) evolution experiments, as it allows researchers to find the exact positions where mutations occurred and so identify the best final sequence. Multiple sequence alignments can also be used to look for conserved or nonconserved sequences or amino acids, which one might want to modify depending on the type of selection desired. For example, modifying conserved sequences might be useful in evolving the active site, while modifying nonconserved residues could be useful in making a new protein-protein interaction with minimal disruption of structure and function.
  3. When determining whether a particular bacterial gene has been introduced to its host genome via a "horizontal transfer" event, sequence alignment can be very useful: aligning the genome to the original sequence can pinpoint any fragment of transferred genetic material, even in large genomes. It can also be used to identify homologous regions where recombination could take place due to sequence similarity.
  4. Sequence alignment can also help identify regulatory elements (such as riboswitches) in a genome sequence by looking for conserved sequences or by aligning candidate regions to known regulatory elements. It can also help find inverted repeats or hairpin structures by aligning a sequence against itself.
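To illustrate the last point, here is a minimal sketch of finding inverted repeats (candidate hairpin stems) by comparing a DNA sequence against its own reverse complement. It uses exact matching of fixed-length arms as a simplified stand-in for a real self-alignment, and the function names and the `stem_len` and `min_loop` parameters are hypothetical choices for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_inverted_repeats(seq, stem_len, min_loop=3):
    """Find pairs of arms that could form a hairpin stem.

    Returns a list of (left_start, right_start) index pairs such that
    seq[right_start:right_start + stem_len] is the exact reverse
    complement of seq[left_start:left_start + stem_len], with at least
    `min_loop` bases between the two arms.
    """
    hits = []
    n = len(seq)
    for i in range(n - stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        # The right arm must start after the left arm plus the loop.
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if seq[j:j + stem_len] == target:
                hits.append((i, j))
    return hits
```

For example, `find_inverted_repeats("GGACTTTTGTCC", 4)` reports the arms `GGAC` and `GTCC` separated by a four-base loop. A real alignment tool would also tolerate mismatches and gaps in the stem, which this sketch does not.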

RNA Folding Homework

Homework 1

  • Computational biology is relevant to Synthetic Biology because it provides the computational framework to understand the diversity and complexity of life and to analyze how parts may behave under new circumstances. It can provide comprehensive databases and search tools for finding novel parts in other organisms for use in new genetic circuits and Synthetic Biology applications. It can also provide analysis tools and simulation frameworks to understand and predict the behavior of new synthetic genetic devices.

Transposons vs. Viruses

What they are
  • Transposons: A transposon can only move or copy itself within one cell. It cannot spread from one cell to another without the use of other cellular machinery (for example, through conjugative transposons) or through replication of the cell into future generations.
  • Viruses: In contrast, a virus can spread between cells, using its own coding sequences to produce the necessary viral proteins, and it generally takes over the host cell's production and replication machinery to achieve its goals.

Technological Application
  • Transposons: Transposons are useful in loss-of-function assays, in which a transposon is inserted randomly into the genome and the cells are assayed for the loss of a particular ability. This can provide important clues as to which genes and regulatory elements are necessary for a particular metabolic pathway or cellular function.
  • Viruses: Viruses are useful in many ways, for example to make libraries of mutants and to perform recombination analysis, which determines gene order from the frequency of recombination when a virus carrying genomic DNA infects a host cell.

Design Limitations
  • Transposons: A modified transposon needs a transposase to move or copy itself, has a maximum size that will be recognized, may cause DNA damage when it jumps, and cannot move between cells.
  • Viruses: Viruses have similar requirements but need many more proteins. They often package only a relatively strict size of DNA, often with only some replaceable regions, and they can infect other cells, which can make them difficult to control under some circumstances.

  • Protein Structure Prediction
    • Protein structure prediction tools are used to determine the structure of a protein whose structure is unknown, which is useful for many applications, from metabolic pathway analysis to synthetic enzyme design. They fall into two categories: ab initio and comparative. Ab initio prediction programs, such as Folding@home, attempt to determine the structure from fundamental physical constraints such as hydrophobicity and charge, without any other knowledge of the protein's structure. These methods can work on any protein but are very computationally intensive, and so far they have not been able to predict structures with much accuracy because of the complexity of the problem and the large number of possible configurations a protein could adopt. Comparative prediction programs, such as Swiss-Model, Robetta, or HHpred, instead attempt to find the structure by comparing the protein to homologous proteins with known structures (homology modelling), or by identifying known secondary structure motifs and folds through matching against a database of solved structures (protein threading). These methods are significantly less computationally intensive, but they only work on proteins that match the database of known structures or that share homology with other known proteins.
  • Gene Function
    • Gene function programs attempt to determine the function of an unknown gene and fall into two categories: those that use ontology databases to match annotations, and those that predict function using a biochemical pathway database. The gene ontology or EC number databases, such as The Gene Ontology project or ENZYME, contain a wide variety of annotated sequences and can be used to match an unknown protein (usually through sequence analysis) to an already annotated protein, which may have a similar or identical function. The biochemical pathway databases, such as KEGG or Gramene, can be used to try to fit the unknown protein into a known biochemical pathway to predict its function, and can also suggest a function if the unknown gene lies within a similar sequence of genes that are all involved in one pathway.
  • Structure Analysis
    • This kind of software attempts to predict the structure of protein-protein or protein-ligand complexes and usually uses more fundamental approaches based on energy, hydrophobicity, and distance. Protein-protein docking software, such as Rosetta, GRAMM-X, or ClusPro, uses a variety of methods to determine the optimal binding structure and usually also the strength of the complex, such as its binding constant. It is useful for the analysis of pathways and of the quaternary structures of protein complexes, but it tends to be computationally intensive. Protein-ligand binding software, such as Rosettaligand, Yucca, or AutoDockTools, instead seeks the best binding structure between a protein and a ligand in its active site, often along with its binding coefficient. This software is primarily used in the pharmaceutical industry for drug targeting and analysis, and also tends to be computationally intensive.
  • Sequence Analysis
    • Sequence analysis tools are useful in the analysis of a new genome or of metagenomic data retrieved from the environment, as they can assemble multiple reads and predict the genes present in the resulting sequences. Sequence assembly software, such as Celera, TIGR, or SeqAssem, pieces together multiple reads to form a contiguous sequence (contig) and is invaluable in genomic sequencing. It tends to be relatively accurate and fast, but can be confused by metagenomic sequences. Gene finding software, such as Glimmer, GenScan, or GrailEXP, attempts to identify and predict the genes present in an unknown sequence and can often find regulatory elements, such as enhancers, promoters, and terminators. This software also tends to be accurate to an extent, but a combined approach that uses several different programs together with database analysis is usually best, and is still usually not too computationally intensive.
  • RNA Structure
    • RNA structure analysis is performed through folding and design programs. In contrast to protein structure prediction, it is much more accurate and can predict most two-dimensional structures relatively well for short and medium-length sequences, although many algorithms are still quite computationally intensive. RNA folding software, such as mFold, UNA Fold, or Vienna, seeks to determine the structure an RNA molecule will adopt in a cell after transcription. It can be very useful in determining the structure of RNA regulatory elements such as terminators and in understanding the function of RNA switches, ribozymes, and RNAi. RNA design software, such as siDirect or RNAsoft, is useful in creating new RNA regulatory elements, ribozymes, and siRNA sequences. It is usually geared toward a particular application and may include an RNA folding algorithm as part of its function.
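
To give a feel for how RNA folding programs search for a two-dimensional structure, here is a minimal sketch of the classic Nussinov base-pair maximization algorithm. This is a deliberately simplified stand-in: the programs named above minimize free energy using thermodynamic parameters, whereas this sketch only maximizes the number of nested base pairs. The function names and the `min_loop` parameter are choices made for this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True for Watson-Crick and G-U wobble pairs."""
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Return (pair_count, dot_bracket) maximizing nested base pairs.

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] holds the maximum number of pairs within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)
```

For the short hairpin `GGGAAACCC`, `nussinov` returns three pairs with the structure `(((...)))`. The cubic running time of this dynamic program is one reason folding becomes expensive for long sequences.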
