Tags:
create new tag
,
view all tags
---++ Telegraph %TOC% ---+++ Brief overview Telegraph is an (unfinished) project to implement (in ANSI C and [[http://www.gnu.org/software/bison/][bison]]) a compiler to generate dynamic programming algorithms for multi-tape StochasticContextFreeGrammars (SCFGs), also implemented in ANSI C. ---+++ Introduction Phylogenetic alignment is a powerful tool used for inferring phlogenetic trees from multiple sequence alignments using probabilistic methods. In particular, this type of alignment can identify the parts of the sequences that have been evolutionarily conserved, and mark these as common between species. An important application of phylogenetic alignment is the alignment of viruses, HIV for example. AIDS is caused by two viruses HIV-1, the cause of the dangerous pandemic, and HIV-2, the less widespread and damaging of the two. Moreover, the Simian Immunode ciency Virus (SIV), a related virus has been detected in monkeys and apes. Phylogenetic align- ment can be used to identify the \important" parts of the virus that can then be used in vaccine developement. RNA secondary structure prediction is another tool that can be used to study biological function. DNA is stable enough to study, although it doesn't do much other than store information really well. In contrast, protein has the interesting quality of being directly related to function, but the added complexity of a twenty amino acid alphabet in proteins makes it di cult to work with on an information processing level. RNA, however, is both stable and interesting with its four bases and folding properties. In particular, the base-pairings defining the secondary structure are commonly more conserved between RNA than their sequences (Durbin & Eddy). <div align="center"> <br /> <img src="%ATTACHURLPATH%/path1340.png" alt="path1340.png" width="247" height="71" /> </div> <br> An ideal method of modelling RNA secondary structure makes use of context free grammars. RNA secondary structure is often palindromic with lengths of nested base pairings. It often comprises short base-paried stems as RNA is usually a single strand that folds onto itself. For the sequence above, we can infer this RNA stem loop structure, shown in two-dimensional form: <br /> <div align="center"> <img src="%ATTACHURLPATH%/path1449.png" alt="path1449.png" width="120" height="239" /> </div> <br> To generate this palindromic structure, we can create a grammar that assigns high probabilities to rules that emit these canonical A-U, G-C base pairings. This grammar captures the nesting tendencies of RNA secondary structure. An example: <br> <div align="center"> <br /> <img src="%ATTACHURLPATH%/text1291.png" alt="path1340.png" width="76" height="45" /> </div> <br> Telegraph is language designed to specify rules for a stochastic context-free grammar--grammars that allow us to probabilistically model RNA secondary structure. We are building a compiler that will ultimately be able to compile a file comprising a set of alphabets and a set of context free production rules associated with a probability. Using these production rules, the compiler generates a set of parse trees, each of which is given a cumulative probability determined by the order of production rules yielding the particular parse. While the rules may be written in any form, such as Chomsky normal form, the compiler will convert them into RNA Normal Form, which is intended to favor emissions over bifurcation. RNA Normal Form follows the format 1 L-> R 1 L-> RT 1 L-> x1 y1 R x2 y2 where (1) is a transition from nonterminal L to nonterminal R, (2) is a bifurcation from nonterminal L to nonterminals R and T, and (3) is a transition from nonterminal L to a nonterminal R flanked by at most one terminal from each alphabet, x and y in this example. In the parse tree, the root nodes are nonterminals, and the leaf node correspond to terminals of the alphabets. The tree with the highest score would thus represent the most probable parse of the sequence. ---+++ Source code Telegraph source code is bundled with the DART library. For the Telegraph compiler start e.g. here: DartRepository:tgcompiler.h (IanHolmes) ---+++ Project status * [[%ATTACHURL%/tele.pdf][tele.pdf]]: Telegraph: What has been done and what still needs to be finished ---+++ Authorship --- Last update: June 16, 2005 SharonChao
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r52
<
r51
<
r50
<
r49
<
r48
|
B
acklinks
|
V
iew topic
|
Ra
w
edit
|
M
ore topic actions
Topic revision: r52 - 2006-09-24 -
IanHolmes
Home
Site map
Blog web
Fall07 web
Fall08 web
Fall09 web
Fall10 web
Fall11 web
Fly web
JBrowse web
Main web
orig web
Research web
Sandbox web
TWiki web
orig web
Teaching web
ZaboArt web
Main Web
Users
Groups
Index
Search
Changes
Notifications
RSS Feed
Statistics
Preferences
P
View
Raw View
Print version
Find backlinks
History
More topic actions
Edit
Raw edit
Attach file or image
Edit topic preference settings
Set new parent
More topic actions
Account
Log In
Register
Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback
TWiki Appliance
- Powered by
TurnKey Linux