- Name: Chris Mungall
- Login Name: cjm
- Email: email@example.com
I am interested in Genome Annotation, Evolution Of Genomes and relating knowledge of model organism biology (in particular Phenotype Data) to understanding human diseases. I believe that our understanding of these things can be advanced through the use of techniques from Computer Science, including advanced database and modeling techniques such as Formal Ontology and Declarative Languages. I also believe that Relational Database technology is not being used to its fullest potential in Bio Informatics. This wiki page is a way for me to informally riff on some of these things.
My previous work at Berkeley includes Gad Fly (Genome Annotation database of the fly) and the Gad Fly Pipeline. Gad Fly was the precursor of the Chado Database (part of the Generic Model Organism Database project). At the same time I developed the Gene Ontology Database and its associated Perl modules and API. Currently I am doing a lot of research on enhancing the Gene Ontology and Open Bio Ontologies, and working with Mark Yandell on intron evolution.
Decent Genome Annotation is essential for making sense of the flood of sequence data. This means both quality data and the correct representational formalisms for capturing that data. There is a lot more to a genome than protein coding genes, and there is a lot more to the standard protein coding gene than the central dogma (DNA makes RNA makes protein, end of story). Biology is full of surprises that confound our rigid data models.
I had the benefit of working with some extremely knowledgeable biologist curators at Berkeley during the annotation of the fruit fly genome. Amongst other things, this taught me that it will be a long time before automated methods can approximate the correctness and detail of human annotation, and that current automated methods can be improved tremendously to aid manual annotation.
There is a plethora of Data Models for representing Genome Annotations. Most lack formal rigour, and there is little in the way of declarative mappings between them. There are many black-box parsers that purport to convert between these, but their inner workings are known only to their programmers. Many formats such as GFF are lossy, which is not problematic for some of today's applications, but makes them unsuitable as a general-purpose solution. It's all a bit of a mess. Thankfully the Sequence Ontology will go some way to addressing issues of interoperability, although SO still allows multiple representations and differing conceptualisations of things such as locations.
I have contributed to this plethora both with the Chado Database and the semantically equivalent Chaos XML and Chado XML Data Models. I hope that the use of formal principles underlying these models will help manage the crisis of Genome Annotation representation.
- Bio Ontologies