Lactic acid bacteria
Background
Lactic acid bacteria are benign microbes involved in fermentation processes associated with the production of
wine, salami, cheese, sourdough bread, pickles, yogurt, cocoa, coffee and other foods.
In the 10/17/2006 issue of PNAS,
JGI scientists and collaborators reported the initial sequencing and analysis of nine new lactobacteria genomes.
The final projects for the 2006 BioE131 class build on several areas highlighted by the JGI analysis.
In addressing these projects, you can make use of the published annotations, or you can attempt to start from first principles and analyze the genomic DNA directly.
Teams that use the published annotations will, however, face higher expectations regarding the breadth of their analysis:
running gene predictors takes time, so if you skip this step, we expect you to achieve somewhat more in the downstream analyses.
Further background on the lactobacteria can be found in the list of references.
BioE131 Project Rules
In this project you will be asked to analyze a bacterial genome (or genomes) for clues about the biology of the organism.
The genome in question was sequenced by the Joint Genome Institute in Walnut Creek for one of their experimental collaborators.
This project reflects the challenge often faced by computational biologists of parsing and interpreting a new, unknown genome sequence.
With the rise of environmental sequencing (metagenomics), such tasks are likely to become increasingly relevant. (A recent review article on metagenomics/environmental sequencing provides further background.)
For assessment, we will be particularly interested in your ability to advance computationally informed, testable hypotheses concerning the organism's biology,
especially inferences concerning the bacterium's environment and ecology.
Some aspects of prokaryote biology that are readily testable include metabolism and catabolism
(e.g. does this bug need substance X in its diet, or can it synthesize X on its own?),
transport and uptake of ambient molecules, stress response,
and the expression of specific mRNAs and proteins.
The details of how you analyze these things computationally, or suggest experimental tests, are up to you,
but the list of suggested projects includes a few specific areas you might investigate.
Whether you pick one or two of these and investigate them in depth,
or attempt a broader holistic survey encompassing several of these points, is again up to you.
Examples of aspects of microbial biology that are often interesting include:
- metabolism/catabolism/transport of nucleotides (purines and pyrimidines)
- metabolism/catabolism/transport of amino acids (various different biochemical subgroups: e.g. aromatic, branched, sulfur-containing...) and other amines
- metabolism/catabolism/transport of lipids, carbohydrates (e.g. sucrose, cellulose, starch...), antibiotics... any other environmental substances encountered by bacteria that you can think of...
- oxidative stress, DNA repair, responses to other kinds of stress
- aspects of the structure of the genome at the DNA and protein level (e.g. how many protein-coding genes? how long are they? how much low-complexity sequence? what kinds of repeats?)
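For the genome-structure questions in the last bullet, a minimal Python sketch might compute GC content and a crude low-complexity measure: the Shannon entropy of base composition in non-overlapping windows. The window size (64 nt), the entropy threshold (1.5 bits) and the function names are illustrative assumptions rather than established settings; dedicated low-complexity masking tools are more rigorous.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases (other symbols such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of a window's base composition; at most 2.0 for pure ACGT."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seq: str, window: int = 64, threshold: float = 1.5) -> float:
    """Fraction of non-overlapping windows whose entropy falls below `threshold`.

    Both `window` and `threshold` are illustrative defaults, not standard values.
    A trailing partial window is ignored.
    """
    seq = seq.upper()
    windows = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    if not windows:
        return 0.0
    low = sum(1 for w in windows if shannon_entropy(w) < threshold)
    return low / len(windows)


if __name__ == "__main__":
    # Toy sequence: a mixed-composition stretch followed by a simple dinucleotide repeat.
    toy = "ACGT" * 16 + "AT" * 32
    print(f"GC content: {gc_content(toy):.3f}")
    print(f"Low-complexity windows: {low_complexity_fraction(toy):.2f}")
```

Non-overlapping windows keep the sketch simple; a sliding window would give finer resolution at the boundaries of repeats.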
We encourage you to form teams of 2-5 people to accomplish the goals of this project.
Since a successful project will require both computational analyses and biological intuition,
we strongly recommend that you form interdisciplinary teams containing people with complementary abilities.
Any team can contain both graduates and undergraduates.
For example, a "dream team" might include:
- one "biologist";
- one "programmer";
- one "editor" (a person who is good at technical writing and production of the final written report); and/or
- one "presenter" (a person who will take charge of the final presentation and ensure it proceeds smoothly and is well-rehearsed). Note that a "presenter" should also be involved in other parts of the project (indeed, all of these roles are merely suggestive, and there is plenty of room for a "Jack-of-all-Trades").
The precise make-up of your team is up to you, but the point is that we will be specifically looking for a mix of biological insight, computational skills and good writing/presentation
(see the assessment criteria below).
We will be looking even harder for this synergy in team projects (versus solo projects).
Therefore, if you want to team up, it is in your interest to form a diverse team rather than sticking with other CS majors (if that is indeed what you are), other bioengineers, etc.
A word on negative results
We want to emphasize that all of the final projects in this class are effectively research projects.
That is, the underlying hypotheses are untested; consequently, one cannot predict or guarantee that projects will give spectacularly positive results.
Please don't be discouraged if your experiments do not confirm the hypotheses laid out in the project descriptions,
or if they give inconclusive results.
That is often the nature of research.
It is OK to present negative results, as long as your reasoning and methods are sound!
We are mostly interested in seeing you
(i) use the tools and techniques that you've learned in the class sensibly, thoughtfully & creatively;
(ii) design and implement your experiments with an appreciation of the underlying biological, statistical, physical & computational principles; and
(iii) interpret your results carefully and insightfully.
Tool-sharing incentive
To encourage CS-minded students to share any cool Perl scripts or other tools that they create,
we offer a special credit for students/teams who post their tools publicly (e.g. on the bspace discussion site)
during the period of the project.
Assessment and deadlines
Four milestones will be assessed toward the final grade for this class:
- Brief description of your objectives and planned approach (one or two paragraphs is about right), with an outline/timeline of planned project tasks (approx 10% of grade);
- Mid-project progress report (approx 10% of grade);
- Ten-minute final presentation (approx 30% of grade);
- Written report, including division of labor (approx 50% of grade).
Dates and assessment criteria for these milestones are given below.
Brief description of your approach and timeline (due 11/22/2006)
This should be a very short (one to two paragraph) description of the biological questions you plan to explore, together with a high-level plan of how you will address them, fleshing out the general outline given on this page (e.g. What software do you plan to use? What custom code, if any, will you need to write?). The purpose of this report is mainly to ensure that (i) you have a team, (ii) you have some well-defined questions and (iii) your approach sounds reasonable within the timeframe of the project, before you start digging through data and/or code. We expect that your team will discover new methods along the way, so the approach you propose is not an irrevocable commitment; it is allowed to change.
It should be emailed in plain text format to IanHolmes by November 22nd, 2006, with the text "BIOE131 outline" in the subject line,
and should include a list of team members (if you are working in a team).
Mid-project progress report (due 11/29/2006)
The mid-project progress report is due November 29th, 2006.
It should be three to five paragraphs and should describe the progress that has been made so far on the project, plans for further work
and a revised timeline outlining the remaining project tasks.
The mid-project progress report should be emailed in plain text format to YuriBendana or IanHolmes with the text "BIOE131 progress report" in the subject line.
Ten-minute presentation (12/14/2006)
Presentations are strictly time-limited and will take place during the final exam period, 12:30-3:30pm, December 14th, 2006. (If there are more than 9 teams, some presentations may be held on the last day of class, 12/8/2006.) The presentation should be a self-contained summary of your project, with a focus on your results and conclusions. You may also want to talk about challenges your team faced, suggestions for future work, experimental tests to confirm any hypotheses you made, etc. However, do not just show your results: briefly describe your objectives and approach as well, so that your classmates can understand your project as a whole.
Written project report (due 12/14/2006)
The final project report is due December 14th, 2006 by 11:59pm in PDF format. It should follow the general format of a journal article, be at most 10 pages long (single-spaced, 12pt font), and contain the following sections:
- Introduction/Background - Give background information to enable a scientific reader to understand the rest of your paper
- Objectives - What did you set out to do? What were your goals?
- Method - Details of what you did
- Results
- Discussion - How do you interpret your results? What hypotheses do they suggest, and how might you test them? Future work?
- References - Make sure to cite any literature and other references you may have used
- (For teams of three or more people) Up to two pages of supplementary material, to be presented as an appendix to the main report, describing additional work that you wish to be taken into consideration
Assessment criteria will include:
- Evidence of aptitude and initiative in applying computational methods to genome analysis;
- Interpretation of results guided by biological intuition;
- Clarity in written and verbal presentation of arguments and analysis;
- Suggestions for relevant experiments in order to validate or test predictions.
Teams are encouraged to submit a single report,
but it is very important that you clearly state, on the first page of the report,
the attributions for the various aspects of the work
(i.e. who did what on the project).
The report should be emailed to IanHolmes with the text "BIOE131 final report" in the subject line.
BioE131 Project Topics
Suggested projects are listed below.
Survey of auxotrophic requirements as revealed by genomic/systems analysis
The 2006 PNAS paper by JGI and collaborators states that the lactic acid bacteria,
being adapted to life in a nutritionally rich medium,
are auxotrophic (as opposed to "prototrophic"): that is, they lack genes for the biosynthesis of some
factors (amino acids, nucleotides, vitamins...), a deficit compensated for by genes involved
in the uptake and metabolism of environmental nutrients (e.g. transporters, peptidases).
For this project, you should investigate auxotrophy in one or more of the JGI-sequenced bacteria.
Investigate the absence (or presence) of genes associated with biosynthetic pathways (you choose which pathways to investigate).
You should make predictions about which pathways will be functional in your bacterium, based on the presence or absence of genes in those pathways.
Compare to a prototrophic microbe such as E. coli or B. subtilis.
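The presence/absence reasoning above can be sketched as a pathway-completeness score: for each pathway, the fraction of its required enzymes for which a gene was detected. In this Python sketch the pathway names, enzyme IDs, gene sets and the "functional" threshold are all hypothetical placeholders; real definitions would come from a pathway database or your own annotation, and a missing enzyme may reflect an annotation gap or a non-orthologous replacement rather than a true loss.

```python
from typing import Dict, Set, Tuple

# Hypothetical pathway definitions: pathway name -> set of required enzyme IDs.
# Replace with real definitions (e.g. EC numbers taken from a pathway database).
PATHWAYS: Dict[str, Set[str]] = {
    "pathway_X": {"enz1", "enz2", "enz3", "enz4"},
    "pathway_Y": {"enz5", "enz6"},
}


def completeness(detected: Set[str], pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each pathway's required enzymes found among the detected enzyme IDs."""
    return {name: len(req & detected) / len(req) for name, req in pathways.items()}


def predict_auxotrophy(detected: Set[str], pathways: Dict[str, Set[str]],
                       threshold: float = 1.0) -> Dict[str, str]:
    """Label a pathway 'functional' if its completeness reaches `threshold`."""
    return {name: ("functional" if frac >= threshold else "possibly auxotrophic")
            for name, frac in completeness(detected, pathways).items()}


def compare(lab_genes: Set[str], reference_genes: Set[str],
            pathways: Dict[str, Set[str]]) -> Dict[str, Tuple[float, float]]:
    """Side-by-side completeness for a lactic acid bacterium and a prototrophic reference."""
    lab = completeness(lab_genes, pathways)
    ref = completeness(reference_genes, pathways)
    return {name: (lab[name], ref[name]) for name in pathways}


if __name__ == "__main__":
    lab = {"enz1", "enz2", "enz5", "enz6"}          # hypothetical hits in a lactic acid bacterium
    prototroph = {f"enz{i}" for i in range(1, 7)}  # hypothetical hits in a prototroph
    for name, (lab_frac, ref_frac) in compare(lab, prototroph, PATHWAYS).items():
        print(f"{name}: LAB {lab_frac:.2f} vs reference {ref_frac:.2f}")
    print(predict_auxotrophy(lab, PATHWAYS))
```

A partially complete pathway (like pathway_X here) is exactly the kind of case where a transporter for the end product, or for an intermediate, would be worth searching for.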
There are multiple ways you could go about this task.
For example, you could start from the raw genomic DNA sequence for your chosen bacterium, and predict genes (as we have done in class).
Alternatively, you could start from the set of protein-coding genes annotated to that genome in Genbank,
then use tools such as Pfam, on the web or from a Unix command line, to search for the presence of key domains.
As a third option, you could go directly to a web portal (such as JGI's IMG).
You may also find it helpful to use ontology tools, such as the Gene Ontology and/or cog2go.
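For the first option (starting from raw genomic DNA), a deliberately naive Python sketch of ORF finding scans all six reading frames for ATG-to-stop stretches above a minimum length. It is not a substitute for a real gene predictor: it ignores alternative start codons such as GTG/TTG, ribosome-binding sites and codon-usage statistics, and the 300 nt minimum length is an arbitrary assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an (uppercase) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, min_len: int):
    """Yield (start, end) of ATG-initiated ORFs on one strand, 0-based, end exclusive.

    For each stop codon, the most upstream ATG after the previous in-frame stop is
    used, so only the longest ORF per stop is reported. ORFs that run off the end
    of the sequence without a stop codon are not reported.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    yield start, i + 3  # end includes the stop codon
                start = None


def find_orfs(seq: str, min_len: int = 300):
    """Return (strand, start, end) tuples in forward-strand coordinates, sorted by start."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e) for s, e in orfs_on_strand(seq, min_len)]
    for s, e in orfs_on_strand(reverse_complement(seq), min_len):
        # Interval [s, e) on the reverse strand maps to [n - e, n - s) on the forward strand.
        hits.append(("-", n - e, n - s))
    return sorted(hits, key=lambda h: h[1])
```

Counting the ORFs returned and plotting their lengths is a quick way to compare against the number of protein-coding genes in the Genbank annotation.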
Evolution of the enolase protein
Another prediction of the 2006 PNAS paper is that an enolase protein in the lactobacteria has undergone accelerated evolution.
The purpose of this project is to characterize this accelerated evolution at the amino acid level, using sequence alignment and phylogenetic methods.
Time (and available data in PDB) permitting, you should also investigate the distribution of accelerated sites on the three-dimensional structure of the enolase protein,
and comment on any discernible patterns (e.g. clustering of accelerated amino acid positions near the active site).
The methods you might use include molecular evolutionary/phylogenetic analysis and "evolutionary trace" or related methods (e.g. JEvTrace).
If you wish, you may also investigate the evolution of other proteins in the fermentation pathways of these microbes.
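Before running a full evolutionary-trace analysis, a simple per-column variability score on a multiple alignment can flag candidate fast-evolving positions. The Python sketch below uses the Shannon entropy of the amino acids in each alignment column; this is a simpler, swapped-in stand-in for evolutionary trace (it ignores the phylogeny entirely), and the toy sequences in the example are invented. One possible use is to compare scores computed on the lactobacterial sequences with scores computed on enolases from other bacteria.

```python
import math
from collections import Counter
from typing import List


def column_entropies(alignment: List[str], gap: str = "-") -> List[float]:
    """Shannon entropy (bits) of the residues in each column, ignoring gap characters."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != gap]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        n = len(residues)
        counts = Counter(residues)
        scores.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return scores


def most_variable(alignment: List[str], top: int = 5) -> List[int]:
    """1-based alignment columns with the highest entropy, most variable first."""
    scores = column_entropies(alignment)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i + 1 for i in ranked[:top]]


if __name__ == "__main__":
    toy = ["MKVLA", "MKVIA", "MRVLG", "MKVFS"]  # invented sequences
    print(column_entropies(toy))
    print(most_variable(toy, top=2))
```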
Here are some starting points for data retrieval:
- Lactococcus lactis enolases
- Enolase structures in PDB
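Once candidate accelerated positions have been mapped onto residue numbers in a PDB structure, their proximity to the active site can be checked with simple geometry. This Python sketch reads C-alpha coordinates from the fixed-column ATOM records of a PDB-format file and reports each residue's distance to a chosen reference residue. The chain ID and reference residue are placeholders you would choose from the structure; HETATM records (e.g. bound metal ions) and insertion codes are ignored, and mapping alignment columns onto PDB residue numbers is a separate step not shown.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def read_ca_coords(lines: Iterable[str], chain: str) -> Dict[int, Coord]:
    """Map residue number -> C-alpha (x, y, z) for one chain, using PDB fixed columns."""
    coords: Dict[int, Coord] = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16: atom name; column 22: chain ID (0-based slices below).
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        resseq = int(line[22:26])  # columns 23-26: residue sequence number
        if resseq in coords:  # keep only the first occurrence (e.g. first alternate location)
            continue
        # Columns 31-38, 39-46, 47-54: x, y, z in angstroms.
        coords[resseq] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def distances_to(coords: Dict[int, Coord], reference: int) -> Dict[int, float]:
    """Euclidean C-alpha distance (angstroms) from every residue to the reference residue."""
    ref = coords[reference]
    return {res: math.dist(xyz, ref) for res, xyz in coords.items()}
```

For example, opening a downloaded structure file and passing the file handle to `read_ca_coords` with the chain of interest, then calling `distances_to` with an active-site residue number, gives a distance for every residue; you could then ask whether accelerated sites are closer to the active site than other sites. `math.dist` requires Python 3.8 or later.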
Gene phylogenies for information and fermentation pathways
According to the 2006 PNAS paper,
the phylogenetic classification of the lactobacteria "remains an unresolved issue in particular because phenotypic classification, which is traditionally based on the type of fermentation, does not match the rRNA-based phylogeny".
The paper goes on to describe a phylogeny that the authors constructed using protein sequences involved in the basic information-processing pathways
(transcription and translation).
These protein alignments were concatenated to maximize the amount of information available to the phylogenetic reconstruction tools.
The idea of this project is to consider a similar phylogenetic analysis of the core genes involved in the fermentation pathways of these microbes,
rather than the information-processing pathways.
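The concatenation step described above (joining per-gene protein alignments into a single supermatrix) can be sketched in Python as follows. It assumes each per-gene alignment is already available as a dict mapping a taxon name to an aligned sequence, and it pads taxa that are missing from a gene with gap characters; whether to pad or drop such taxa is your choice, and the taxon labels in the example are hypothetical.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all the same length)


def concatenate(alignments: List[Alignment], gap: str = "-") -> Alignment:
    """Join per-gene alignments column-wise; taxa missing from a gene get an all-gap block."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length sequences")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


def to_fasta(aln: Alignment) -> str:
    """Render an alignment as FASTA text, one sequence per record."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in aln.items())


if __name__ == "__main__":
    gene1 = {"Lmes": "MKV-", "Ldel": "MRVL"}  # hypothetical aligned fragments
    gene2 = {"Lmes": "GA"}                    # Ldel lacks this gene
    print(to_fasta(concatenate([gene1, gene2])), end="")
```

The FASTA output can then be converted to whatever input format your phylogeny program expects.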
You should first establish the extent to which this is feasible:
Is there a core set of fermentation genes in all the lactobacteria?
Is it possible to find plausible alignments of these protein sequences (i.e. do they appear to be related)?
To the extent afforded by the data, you should then attempt to reconstruct a phylogeny encompassing all sequenced lactobacteria,
or phylogenies for the subsets of lactobacteria containing common sets of genes involved in the fermentation pathways.
You can use common programs for alignment (e.g. CLUSTAL, MUSCLE) and phylogenetic reconstruction (e.g. weighbor, PHYLIP) to do this.
Compare the phylogenies so obtained to the phylogeny given in the PNAS paper.
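To decide whether a core set of fermentation genes exists, you might first assign each genome's proteins to gene families (for example by COG assignment or BLAST-based clustering) and then intersect the family sets. The Python sketch below covers only the set logic: families shared by every genome, and families shared within each subset of genomes. The genome labels and presence/absence sets in the example are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def core_families(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in every genome."""
    sets = list(genomes.values())
    return set.intersection(*sets) if sets else set()


def shared_by_subsets(genomes: Dict[str, Set[str]], min_size: int = 2,
                      min_shared: int = 1) -> Dict[FrozenSet[str], Set[str]]:
    """Families shared by each subset of at least `min_size` genomes.

    Exhaustive over all subsets, which is fine for a dozen or so genomes.
    Subsets sharing fewer than `min_shared` families are omitted.
    """
    result: Dict[FrozenSet[str], Set[str]] = {}
    names = sorted(genomes)
    for k in range(max(1, min_size), len(names) + 1):
        for subset in combinations(names, k):
            shared = set.intersection(*(genomes[n] for n in subset))
            if len(shared) >= min_shared:
                result[frozenset(subset)] = shared
    return result


if __name__ == "__main__":
    # Hypothetical presence/absence of fermentation gene families.
    genomes = {"A": {"ldh", "pfk", "eno"}, "B": {"ldh", "eno"}, "C": {"ldh", "pfk"}}
    print(core_families(genomes))
    for subset, fams in shared_by_subsets(genomes).items():
        print(sorted(subset), sorted(fams))
```

Large subsets that share many families are natural candidates for the subset phylogenies suggested above.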
References
For the general reader
College-level introductions
Research papers
Sequences
You will probably need only the whole-genome (chromosomal) sequences, not the plasmids; the plasmid sequences are included for completeness.
From the 2006 PNAS paper
- Genbank:NC_008531 Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293, complete genome
- Genbank:NC_008496 Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 plasmid pLEUM1, complete sequence
- Genbank:NC_008529 Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA-365, complete genome
Previously sequenced
- Genbank:NC_007929 Lactobacillus salivarius subsp. salivarius UCC118, complete genome
- Genbank:NC_006530 Lactobacillus salivarius subsp. salivarius UCC118 plasmid pSF118-44, complete sequence
- Genbank:NC_006529 Lactobacillus salivarius subsp. salivarius UCC118 plasmid pSF118-20, complete sequence
- Genbank:NC_007930 Lactobacillus salivarius subsp. salivarius UCC118 plasmid pMP118, complete sequence
- Genbank:NC_008054 Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842, complete genome
Other resources