Try to answer the following questions:
- Give values, citing and dating your sources (recent peer-reviewed papers, if you can find them), for the doubling time of (a) the size of GenBank, (b) the transistor density of an integrated circuit (Moore's Law), (c) network capacity (Butters' Law). Mention some implications of your results for the management & analysis of biological data.
- Using this chart, or otherwise, estimate the halving time for the cost of random access memory.
- List three desirable properties of a useful biological database, apart from just storing data. (Look at the posted lecture notes if you get stuck)
- What is XML? What is a flat file format? Contrast the two.
- A relational database is structured around a set of tables. How do such tables differ from text files?
- What language is typically used to query relational databases?
- What is an accession number? Give examples for GenBank and Swiss-Prot.
- Databases can be classified as primary (containing "raw" experimental data), secondary (containing analysis or annotation of primary data), or composite (containing both primary data and secondary annotations). Describe the contents of the following databases and classify them as primary, secondary or composite: (a) GenBank, (b) Pfam, (c) the ArrayExpress archive, (d) CATH, (e) PDB.
- What role do controlled vocabularies (ontologies, taxonomies, etc) play in the biological sciences? Give five examples of domains of biological knowledge that use ontologies, including (for each one) an example of an ontology in common usage.
- Name two databases that contain evolutionary information.
- Describe three databases of biological pathways. Compare & contrast them. (Try to choose examples that are as diverse as possible)
- Name a file format for (a) sequences, (b) sequence alignments, (c) co-ordinates of genomic features. What information is contained in each format? What visualization tool(s) are commonly used to view each type of data?
- What is a genome database and what sorts of data does it typically subsume? Illustrate your answer with an example.
- Find examples of databases for the following types of data: (a) transcription factor binding sites, (b) experimentally observed protein-protein interactions, (c) histone modifications, (d) transposons and repetitive DNA. If you can, try to find the most widely-cited example of each. What are the licensing/usage terms for the databases you identified? (e.g. unrestricted usage; free for academics only; anyone can download; anyone can download as long as they register; etc.)
- Where in the peer-reviewed literature might you look for a list of current/recent biological databases?
- A recent model for data storage (emergent in 2006-7) is cloud storage. What is this? One bulk cloud storage provider is Amazon S3; a consumer-oriented re-seller is Dropbox. What services do these two companies offer and how do they differ? Describe some current uses, potential uses, and possible limitations of cloud storage in the management of biological data.
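
For the doubling-time and halving-time questions above, it may help to see how such a figure can be estimated from two data points. The sketch below assumes purely exponential growth, n(t) = n0 * 2^((t - t0) / T), and solves for T. The function name `doubling_time` and all the numbers are hypothetical illustrations, not measured values for GenBank, transistor density, network capacity or memory cost; you still need to find and cite real figures.

```python
import math


def doubling_time(t0, n0, t1, n1):
    """Estimate the doubling time T from two (time, value) observations,
    assuming exponential growth n(t) = n0 * 2 ** ((t - t0) / T).

    A negative result means the quantity is shrinking; its magnitude
    is then the halving time.
    """
    if n0 <= 0 or n1 <= 0:
        raise ValueError("values must be positive")
    if t0 == t1 or n0 == n1:
        raise ValueError("need two distinct times and two distinct values")
    return (t1 - t0) * math.log(2) / math.log(n1 / n0)


# Hypothetical example: a quantity grows 16-fold in 6 years,
# i.e. 4 doublings, so each doubling takes about 1.5 years.
growth = doubling_time(2000, 1.0, 2006, 16.0)

# Hypothetical example: a cost falls 32-fold in 10 years,
# i.e. 5 halvings, so each halving takes about 2 years.
halving = -doubling_time(2000, 100.0, 2010, 3.125)

print(f"doubling time: {growth:.2f} years")
print(f"halving time:  {halving:.2f} years")
```

With more than two data points, a least-squares fit of log2(value) against time would give a more robust estimate than this two-point formula.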
Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.