


Project description: Bacterial Genome Analysis, Bioe131/231 Fall 2005


In this project you will be asked to analyze an anonymous bacterial genome for clues about the biology of the organism.

The genome in question was sequenced by the Joint Genome Institute in Walnut Creek for one of their experimental collaborators. It has not yet been released to Genbank, EMBL or any other public database. In fact, we are not even going to tell you where the bacterium was sampled from! Therefore, this project accurately reflects the challenge often faced by computational biologists of parsing and interpreting a new, unknown genome sequence. With the rise of environmental sequencing (metagenomics), this task is likely to be increasingly relevant. (Check out a recent review article on metagenomics/environmental sequencing here.)

It is important to understand that the sequence data we will give you for this project represent a pre-publication genome and cannot be distributed. As a condition of accepting this project, we ask that you agree not to share the genome with anyone else, or to publish results on it yourself, except as part of a subsequent joint research project with JGI's biologist collaborators. This is both ethical and in the spirit of the Fort Lauderdale Agreement.

The first thing you need to do for this project is to email IanHolmes indicating that you and the rest of your team (if applicable) accept the above conditions. After we receive this email, we will send you a link to the genome data in the week beginning 11/14/2005. Once again, do not share this link with anyone except your team-members who have accepted the conditions!

You are free to analyze and comment on any aspect of this genome sequence. For assessment, we will be particularly interested in your ability to advance computationally-informed, testable hypotheses concerning the organism's biology, especially inferences concerning the bacterium's environment and ecology. Some aspects of prokaryote biology that are readily testable include metabolism and catabolism (e.g. does this bug need substance X in its diet, or is it capable of synthesizing X on its own?), transport and uptake of ambient molecules, stress response, and the expression of specific mRNAs and proteins. The details of how you analyze these things computationally, or suggest experimental tests, are up to you, as is the choice between investigating one or two areas in depth and attempting a broader holistic survey encompassing several of them. Here are a few specific areas you might investigate:

  • metabolism/catabolism/transport of nucleotides (purines and pyrimidines)
  • metabolism/catabolism/transport of amino acids (various different biochemical subgroups: e.g. aromatic, branched, sulfur-containing...) and other amines
  • metabolism/catabolism/transport of lipids, carbohydrates (e.g. sucrose, cellulose, starch...), antibiotics... any other environmental substances encountered by bacteria that you can think of...
  • oxidative stress, DNA repair, responses to other kinds of stress
  • aspects of the structure of the genome at the DNA and protein level (e.g. how many protein-coding genes? how long are they? how much low-complexity sequence? what kinds of repeats?)
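For the last item above, a first pass at the genome-structure questions can be sketched in a few lines of Python. This is a minimal illustration, not a recommended gene finder: it assumes ATG is the only start codon, treats any ATG-to-stop stretch above a hypothetical length cutoff (`min_codons`) as a candidate gene, and ignores overlapping ORFs. The function names are made up for this example, and dedicated gene-finding software will give far more reliable gene counts.

```python
from typing import Iterator, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases among all bases in seq."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (strand, frame, length_in_codons) for each ATG...stop ORF.

    In each frame, only the first ATG after the previous stop is used,
    so each stop codon contributes at most one (the longest) ORF.
    The min_codons cutoff is an arbitrary assumption, not a standard.
    """
    orfs = []
    for strand, s in ((+1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # excludes the stop codon
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, n_codons))
                    start = None
    return orfs
```

Looping over `read_fasta` on the genome file and calling `gc_content` and `find_orfs` on each sequence gives a rough candidate-gene count and length distribution, which you can then compare against the output of a proper gene finder.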

We encourage you to form teams of 2-5 people to accomplish the goals of this project. Since a successful project will require both computational analyses and biological intuition, we strongly recommend that you form interdisciplinary teams containing people with complementary abilities. Any team can contain both graduates and undergraduates.

For example, a "dream team" might include:

  • one "biologist";
  • one "programmer";
  • one "editor" (a person who is good at technical writing and production of the final written report); and/or
  • one "presenter" (a person who will take charge of the final presentation and ensure it proceeds smoothly and is well-rehearsed). Note that a "presenter" should also be involved in other parts of the project (indeed, all of these roles are merely suggestive, and there is plenty of room for a "Jack-of-all-Trades").

The precise make-up of your team is up to you, but the point is that we will be specifically looking for a mix of biological insight, computational skills and good writing/presentation (see the assessment criteria below). We will be looking even harder for this synergy in team projects (versus solo projects). Therefore, if you want to team up, it is in your interest to form a diverse team and not just stick with other CS majors (if that is indeed what you are), or other bioengineers (etc).


Assessment and deadlines


Four milestones will be assessed toward the final grade for this class:

  1. Prepare a brief description of your objectives and the approach you plan to take (one or two paragraphs is about right), with an outline/timeline of planned project tasks (approx 10% of grade);
  2. Mid-project progress report (approx 10% of grade);
  3. Ten-minute final presentation (approx 30% of grade);
  4. Written report, including division of labor (approx 50% of grade).

Dates and assessment criteria for these milestones follow below.

Brief description of your approach and timeline (due 11/18/2005)

This should be a short, one to two paragraph description containing the biological questions you plan to explore as well as a high-level plan of your approach to address these questions (e.g. what kind of software do you plan on using? What kind of custom code will you need to write?) The purpose of this document is mainly for us to make sure you have some well-defined questions before you start digging through sequence/code and that your approach sounds reasonable and doable within the timeframe of the project. We expect that your team will discover new methods along the way, so the approach you propose is not set in stone.

This should be submitted by email, in plain text format, to AngiChau or IanHolmes by November 18th, 2005 with the text BIOE131 outline in the subject line, including a list of team-members (if you are working in a team).

Mid-project progress report (due 12/2/2005)

The mid-project progress report is due December 2nd, 2005. It should be three to five paragraphs and should describe the progress that has been made so far on the project, plans for further work and a revised timeline outlining the remaining project tasks.

The mid-project progress report should be emailed to AngiChau or IanHolmes in plain text format with the text BIOE131 progress report in the subject line.

Ten-minute presentation (12/5/2005 - 12/9/2005)

Strictly time-limited presentations will take place in class from December 5th to 9th, 2005. The presentation should be a self-contained summary of your project, with a focus on your results and conclusions. You may also want to talk about challenges your team faced, suggestions for future work, experimental tests to confirm any hypotheses you made, etc. However, do not just show your results: make sure you briefly describe your objectives and approach, so that your classmates will be able to understand your project as a whole.

Written project report (due 12/15/2005)

The final project report is due December 15th, 2005 by 11:59pm in PDF format. The general format of the report should follow that of a journal article and should be roughly 8-10 pages, single-spaced 12pt font:

  • Introduction/Background - Give background information to enable a scientific reader to understand the rest of your paper
  • Objectives - What did you set out to do? What were your goals?
  • Method - Details of what you did
  • Results
  • Discussion - How do you interpret your results? Are there any hypotheses and how might you test them? Future work?
  • References - Make sure to cite any literature and other references you may have used
  • (For teams of three or more people) Up to two pages of supplementary material, to be presented as an appendix to the main report, describing additional work that you wish to be taken into consideration

Assessment criteria will include:

  • Evidence of aptitude and initiative in applying computational methods to genome analysis;
  • Interpretation of results guided by biological intuition;
  • Clarity in written and verbal presentation of arguments and analysis;
  • Suggestions for relevant experiments in order to validate or test predictions.

Teams are encouraged to submit a single report, but it is very important that you clearly state, on the first page of the report, the attributions for the various aspects of the work (i.e. who did what on the project).

The final report should be emailed to AngiChau or IanHolmes with the text BIOE131 final report in the subject line.


Some useful resources/hints


Running BLAST locally on DECF machines

You may find it helpful in your project to run one of the BLAST programs on a large number of sequences. Instead of manually using the web interface for BLAST on the NCBI website, you can run BLAST on the command line (or from a script!) on the DECF computers, since BLAST is installed locally there. There are many options (similar to the online BLAST), so spend a little time figuring out which options you need for what you're trying to do. At the UNIX prompt:

$ blast

Typing just blast will show you a rather large help message telling you all the options. Note that you also need to figure out which BLAST program you want to run - these tables on the NCBI website might be helpful.
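To drive BLAST from a script rather than by hand, something like the following Python sketch may be a useful starting point. It assumes the NCBI BLAST+ command-line programs (such as `blastp`) and their `-query`, `-db`, `-out`, `-evalue` and `-outfmt 6` options; the installation on the DECF machines may use a different command name and different options, so check the help message first. The function names, the E-value cutoff and any file or database names you pass in are placeholders for illustration.

```python
import subprocess
from typing import Dict, List

# Column names of BLAST+ tabular output (-outfmt 6), in order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = ("length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def build_blast_command(query: str, db: str, out: str,
                        program: str = "blastp",
                        evalue: float = 1e-5) -> List[str]:
    """Build an argument list for a BLAST+ program (assumed to be installed)."""
    return [program, "-query", query, "-db", db, "-out", out,
            "-evalue", str(evalue), "-outfmt", "6"]


def parse_tabular(text: str) -> List[Dict[str, object]]:
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit: Dict[str, object] = dict(zip(COLUMNS, line.split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def best_hit_per_query(hits: List[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Keep only the highest-bitscore hit for each query sequence."""
    best: Dict[str, Dict[str, object]] = {}
    for hit in hits:
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def run_blast(query: str, db: str, out: str) -> List[Dict[str, object]]:
    """Run BLAST on a FASTA file of queries and return the parsed hits."""
    subprocess.run(build_blast_command(query, db, out), check=True)
    with open(out) as handle:
        return parse_tabular(handle.read())
```

Calling `run_blast` with a multi-sequence FASTA file searches all of your predicted proteins in one go, and `best_hit_per_query` then reduces the output to one candidate annotation per gene. Treat a best hit as a hypothesis about function, not a confirmed annotation.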

Other bioinformatics software

We've introduced you to some bioinformatics applications throughout this class and you are, of course, free to use any of them in your project. To find other software for specific purposes, you may want to search on Google or look for publications on PubMed (bioinformatics groups usually publish papers when they finish a significant piece of software). (Angi: I also have a list of some bioinformatics links I've collected over the years, and you can take a look at this list if you're interested.)

If you find some interesting piece of software that you think might be useful to other students, please let Angi know and we'll try to keep a running list to benefit the whole class.
