Bio E 131 Final Project 2009
Your task is to set up and deploy a website, including a genome
browser, for an RNA virus genome. The purpose of this website is to be
an educational and/or research resource that serves as a portal to
additional investigation of the virus, incorporating robust links to
external databases, review information, and/or novel analysis.
Each student is assigned one virus; you may, at your option, team up
with other students, to a maximum team size of 4. The teams are in
fact just one organizational mechanism for collaborating; since every
student is assigned a different virus, and since (when grading) we
will reward efforts that are seen to benefit everyone's project,
sharing of tools and data between teams is actively encouraged. The
assigned viruses fall into phylogenetic clades, and we suggest (but do
not require) that teams be formed from students whose assigned viruses
are phylogenetically similar, since such teams stand to benefit most.
There is a great deal of flexibility in how you execute your project,
since the basic project requirement (set up a JBrowse genome browser
for your virus) can and should be extended in a number of different
and complementary ways, outlined below.
Students will make presentations on their projects during RRR week,
following which there will be a period of peer review. During the peer
review period, each student will anonymously review and rank a minimum
of five projects outside their own team. These rankings will be used
extensively (albeit not exclusively) to grade the projects. The five
mandatory projects for each student to review will be
assigned later during the project period. Students may rank more than
five other projects; this is one of several ways to accrue extra
credit.
The presentations will not be graded directly but should be viewed as
an opportunity to advertise the strengths of the project to peer
reviewers, so as to maximize their rankings of your work.
To find out which virus you have been assigned, see the Project Assignments page.
The basic project
The project involves developing a website, or more specifically, a
collection of interlinked HTML files together with JavaScript files
allowing your viral genome annotations to be viewed in a web browser.
As noted, you will be using JBrowse for the genome browser component;
one reason for this is that JBrowse does not require any code to be
run dynamically on the server when a client requests a
page. Therefore, you can test your website by creating static files
and directories on the DECF computers and loading these files into
Firefox.
The minimum requirement for your project is to
- select a representative genome for your assigned species of virus
- obtain FASTA-format sequence and GFF-format gene annotations
- this may involve writing, finding, or running scripts to convert from other formats, e.g. GenBank
- run the JBrowse preparation scripts on these files
- test that the JBrowse browser displays in Firefox (or Safari/IE)
Completing these requirements will earn roughly a basic C grade.
This grade can be improved by implementing the extensions below, and/or by
contributing collaborative resources (tutorial pages, Perl scripts,
general pages collecting links/data about genes shared with other
viruses, etc.) that other class members can use to improve their projects.
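As a rough illustration of the format-conversion step in the minimum requirements, here is a minimal Python sketch (not a course-provided tool) that pulls simple gene and CDS features out of a GenBank flat file and writes them as GFF3 lines. The function name `genbank_features_to_gff3`, the `genbank` source label, and the generated `ID` attributes are hypothetical choices made for this example. It handles only plain `start..end` and `complement(start..end)` locations, silently skips `join(...)` and other compound locations, and assumes phase 0 for every CDS, so check the output against your GenBank record before relying on it.

```python
import re

# Matches simple locations only: "266..13483" or "complement(266..13483)".
# Partial-end markers ("<" and ">") are tolerated and dropped.
LOCATION = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")


def genbank_features_to_gff3(genbank_text, seqid, wanted=("gene", "CDS")):
    """Convert simple features from a GenBank flat file into GFF3 text.

    Hypothetical helper: compound locations such as join(...) are skipped.
    """
    lines = ["##gff-version 3"]
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            break  # the sequence section follows; no more features
        if not in_features:
            continue
        # Feature key lines are indented by exactly five spaces;
        # qualifier lines (e.g. /gene="...") are indented further.
        if len(line) > 5 and line[:5] == "     " and line[5] != " ":
            parts = line.split()
            if len(parts) != 2:
                continue
            key, location = parts
            if key not in wanted:
                continue
            match = LOCATION.match(location)
            if match is None:
                continue  # compound location; not handled in this sketch
            strand = "-" if match.group(1) else "+"
            start, end = match.group(2), match.group(3)
            # Assumption: every CDS starts in phase 0.
            phase = "0" if key == "CDS" else "."
            attributes = f"ID={key}_{start}_{end}"
            lines.append("\t".join(
                [seqid, "genbank", key, start, end, ".", strand, phase, attributes]
            ))
    return "\n".join(lines) + "\n"
```

To use it, read your GenBank file into a string, pass it in together with the sequence name used in your FASTA file, and write the returned text to a `.gff` file. For anything beyond simple locations, a library such as BioPerl is likely the more robust route.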
Mitch Skinner, developer of JBrowse, will be available in 381 Stanley on Tuesdays from 3-5pm to answer questions about JBrowse setup and configuration.
Extensions
Extensions are very open-ended, within the general framework of providing a web-based resource for browsing viral genome annotations.
Some ideas are as follows:
- Incorporating links to other databases and resources, either from your web pages or from JBrowse itself (JBrowse can attach outgoing links to gene features, either directly or via intermediate "link pages" that you create, e.g. when a gene feature has more than one outgoing link). For example:
- Pubmed citations
- PDB (Protein Data Bank)
- Wikipedia (page for your virus, or for genes in your virus)
- Genbank
- The RNA Virus Database in Oxford (this is an excellent source of data on many of the assigned viruses, though it will require some SQL experience in your team)
- Databases, websites or publications specific to your virus
- This is by no means a complete list! Credit will be given for useful and consistent cross-linking.
- Performing bioinformatics analyses and incorporating the results of these, or other annotation data, into your genome browser, e.g.
- Running homology searches against Swissprot (for protein matches), Interpro (for protein domain profile matches), Rfam (for RNA family profile matches), etc.
- Aligning your genome to closely-related genomes (e.g. other students' assignments...) and incorporating tracks/statistics that summarize these alignments into your JBrowse view (e.g. plotting a column-by-column conservation score as a WIGgle track in JBrowse, using whatever particular conservation scoring metric you deem appropriate)
- Annotating signals in the genome that have been discussed in the literature or predicted by your own analyses, e.g. RNA structures that are conserved or relevant to function, transcription factor or other binding sites, protein active sites, packaging or replication signals, etc.
- Identifying other features of interest that you are able to find (either by your own analysis or by examining the literature), e.g. mutational or recombinational hot-spots, drug resistance mutations, etc.
- Adding tracks showing the GC content, information content, or other sequence statistics
- Writing overview/summary pages discussing any of the following issues at a molecular level, with reference to your genome:
- Biology, morphology, pathology, evolution of your virus
- Engineering applications or modifications of the virus
- Clinical/therapeutic considerations (e.g. drug resistance)
- Comparisons to related or other viruses (e.g. shared genes, different genes, structural or other similarities)
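Two of the analysis ideas above (a GC-content track and a column-by-column conservation track) reduce to computing one number per genome position or window and writing the numbers out in wiggle format. The following Python sketch shows one way this might look. All function names are hypothetical, the conservation metric (fraction of other sequences that match the reference base) is only one simple choice among many, and the output is plain fixedStep wiggle text; how it is then loaded into JBrowse is not covered here and should be checked against the JBrowse documentation.

```python
def fixed_step_wiggle(values, seqid, step=1):
    """Format a list of numbers as fixedStep wiggle text, starting at base 1."""
    header = f"fixedStep chrom={seqid} start=1 step={step} span={step}"
    return "\n".join([header] + [f"{v:.3f}" for v in values]) + "\n"


def gc_fraction_windows(seq, window):
    """GC fraction in consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    seq = seq.upper()
    fractions = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


def column_identity(reference, others):
    """Per-base conservation relative to an aligned reference sequence.

    `reference` and every row in `others` are rows of the same alignment
    (equal length, '-' for gaps). Columns where the reference has a gap are
    skipped so that the scores line up with reference genome coordinates.
    """
    scores = []
    for col, ref_base in enumerate(reference):
        if ref_base == "-":
            continue
        matches = sum(1 for row in others if row[col] == ref_base)
        scores.append(matches / len(others))
    return scores
```

For example, `fixed_step_wiggle(gc_fraction_windows(genome, 100), "my_virus", step=100)` would give a GC track in 100-base windows, where `genome` and `"my_virus"` stand in for your own sequence string and sequence name.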
An A grade would likely require at least one extension in two and possibly three of the
above categories, with the top grades going to projects that also win consistently good peer rankings.
Consistency and coherence will be valued in these extensions. For
example, if your overview page discusses drug resistance mutations and
your genome browser includes a track for drug resistance mutations,
then this would be seen as a plus.
If you also incorporated links to a database of mutation phenotypes for this virus,
this would be a very strong plus since you would then be spanning all three categories.
Incentives for collaboration
The project is intended to emphasize several features of real science:
collaboration (and the development of collaborative tools), peer
review, and working with real data.
The collaborative aspect is particularly emphasized. The collective goal
of this project should be viewed as developing a web-browsable
database of RNA virus genome annotations. No two people are assigned
the exact same virus and so there will be considerable benefits to
collaboration, including collaboration between teams as well as
collaboration within teams.
While there is a ranking aspect to the grading scheme, it is by no
means a zero-sum game. If the collective output of the class exceeds
expectations, then more high grades will be awarded than would
otherwise be the case. This is designed to provide an incentive for
development of community resources, and contribution to such resources
may boost individual grades.
Of course, the model we are copying here is the scientific
reputation economy.
Scientific culture encourages you to release your secrets, rather than hoard them,
because in doing so you will accrue credit in the form of reputation and citations.
This culture has evolved and thrived because it creates an incentive structure that benefits humanity as a whole, while recognizing individual contributions.
Here we are explicitly incentivizing collaboration by making it a factor in your final grade,
but the principle is the same: we want to leverage the process to create a kick-ass collective project.
As an example, you are asked (as part of the basic project) to set up
the JBrowse genome browser. This is a new and experimental genome
browser and while some documentation and tutorial information does
exist for this browser, it is by no means comprehensive (yet), nor are
there any tutorials explicitly aimed at the level of this class. An
example of how collaborative contributions might improve your grade
would be if you wrote up your experiences with JBrowse as a tutorial
wiki page at an early stage during the project, making this tutorial
available to other class members, and (for the Win!) encouraging other
students to contribute and improve your tutorial page. This is not
meant to be a prescriptive example; there are many other similar
options (e.g. starting a mailing list for class discussions of
JBrowse, or helping answer questions on the existing gmod-ajax mailing
list for JBrowse). The point is that we (the graders) will be actively
looking for such examples of contributions that not only enhance one
particular project, but enhance the collective output of the class as
a whole. This is entirely compatible with individual grades;
essentially, it reflects the "reputation economy" found in large-scale
academic research consortia.
A few examples of good collaborative contributions:
- Early stages
- Setting up a central installation of JBrowse and its pre-reqs (e.g. Bioperl) that other teams can use
- Development or distribution of a useful Perl script for converting file formats
- Documentation or creation of tutorial pages
- Later stages
- Contributions to external user-curated resources such as Wikipedia (positive and successful contributions to Wikipedia will be viewed as VERY positive)
- Providing early & detailed feedback on other student projects, and/or ranking more than five other projects
- Overall
- Developing common components that facilitate consistency between different team entries (e.g. cascading style sheets, common data sets or gene pages, etc.)
- Taking a leadership role in organizing/co-ordinating collaborations between teams
- Setting yourself up as a "service provider" to perform a particular specialized analysis for everyone in the class (e.g. running Rfam searches, doing multiple alignments, configuring or even running JBrowse, etc.)
Formal deliverables
The following are the deliverables that you must submit for grading.
- At an early stage (see timeline) each team will be required to submit team names and member lists, which we will use to avoid conflicts of interest when assigning reviewers.
- The address of the "landing page" for your website. This will likely be a path to a file on the DECF accounts (although you are free to host your pages as a website on a webserver if you have access to one).
- The path MUST be accessible to other students, either locally (on the DECF computers) or globally (over the web), so that other students can review your site. Make sure you test this!
- Zipfile or tarball representing a snapshot of your website, submitted via bSpace
- This does not need to include portions that are posted on biowiki.org, wikipedia, etc., as long as those portions are directly linked to from your website
- You may keep updating/modifying your site during the review period, but we would like a snapshot for our records
- Statement of individual member contributions, including:
- contributions to external sites (Wikipedia, biowiki.org, etc.)
- collaborative tools or resources developed (tutorials, Perl scripts, etc.)
- Presentation by your team (5 minutes per team member)
- Rankings of assigned projects for peer review (optionally accompanied by more detailed critiques)
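Since the landing page must be readable by other students, it may help to check file permissions before submitting the path. The sketch below is one possible way to do this in Python on a POSIX system. The function name `find_unreadable` is hypothetical, and the check covers only the directory tree you pass in: directories above it (for example your home directory) also need to be traversable by others, and that is not checked here.

```python
import os
import stat


def find_unreadable(site_root):
    """List paths under site_root that other users cannot read.

    Files need the world-read bit. Directories need both world-read and
    world-execute, so that they can be listed and entered.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        mode = os.stat(dirpath).st_mode
        if not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.stat(path).st_mode & stat.S_IROTH:
                problems.append(path)
    return sorted(problems)
```

An empty list means every file and directory in the tree has the expected permission bits. It is still worth asking a classmate to open the landing page from their own account.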
Timeline
- 20 Nov - final project announced; virus Project Assignments made
- 25 Nov - final project team names & membership lists due
- 4 Dec - final project peer-review viruses assigned
- 7 Dec - final project presentations
- 9 Dec - final project presentations
- 10 Dec - all submitted final project materials due (paths, zipfiles, statements)
- 17 Dec - all final project peer rankings due