|
|
Google Summer Of Code 2007: project submissions due March 26th
Our Entries: Ajax Phylo-Informatics, Automated Pipeline Tools
NESCENT has a page up about Google's 2007 summer of code.
Here's my contribution so far (excerpted from the NESCENT phyloinformatics wiki):
Ajax Phylo-Informatics
Rationale
Many powerful new tools for phylogenetic stochastic grammar analysis of multiple alignments, such as xrate or PHAST, and related packages such as PAML, are available only from the Unix command line. These tools need to become operable over the web, especially via JavaScript platforms such as the new Google Maps-like interface to GBrowse.
Approach
Use toolkits such as dojo to build asynchronous JavaScript wrappers for Unix tools (probabilistic modeling & phylogeny tools, format conversion utilities, sequence analysis & alignment software, genome annotation pipelines, grids & job queues, realtime parallelizable systems); other JavaScript/web components (alignment viewers, tree viewers & navigators, genome browsers); and bioinformatics "mashups". Interface with gmod-ajax, Amigo and other web-based bioinformatics platforms.
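To make the "asynchronous wrapper" idea concrete, here is a minimal server-side sketch in Python (chosen only for illustration; the project itself targets JavaScript toolkits such as dojo on the browser side). It runs command-line jobs without blocking and returns JSON that a web front end could poll for or receive. The names run_tool, run_many and demo are hypothetical, and the "tool" in the demo is a stand-in that uppercases its input, not a real phylogenetics program.

```python
import asyncio
import json
import sys


async def run_tool(argv, stdin_text=""):
    """Run one command-line tool without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate(stdin_text.encode())
    # A plain dict is easy to serialise as JSON for a browser client.
    return {
        "argv": list(argv),
        "status": proc.returncode,
        "stdout": out.decode(),
        "stderr": err.decode(),
    }


async def run_many(jobs):
    """Run several (argv, stdin_text) jobs concurrently, keeping input order."""
    return await asyncio.gather(*(run_tool(argv, text) for argv, text in jobs))


def demo():
    # Stand-in "tool": the Python interpreter uppercasing whatever it reads.
    # A real deployment would put the argv of xrate, PHAST, etc. here.
    tool = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    results = asyncio.run(run_many([(tool, "acgt"), (tool, "ttga")]))
    return json.dumps(results)


if __name__ == "__main__":
    print(demo())
```

A real wrapper would also need job queuing, time limits and input validation before exposing any tool over the web; none of that is shown here.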
Challenges
Adapting command-line tools for web use; creating an asynchronous user interface; developing infrastructure for mashable bioinformatics...
Involved toolkits or projects
BioPerl/Biopython/Bioruby; SWIG; dojo; Sun Grid Engine; Erlang
Mentors
IanHolmes, MitchSkinner, ChrisMungall, JasonStajich
Ideas pages
AjaxPhyloinformatics; see also WishList, RnaAlignmentViewer
Automated Pipeline Tools
Rationale
Annotating a genome, or performing other large-scale bioinformatics analyses, typically involves a series of operations with sequential dependencies but also strong parallelism. The GNU make program is one robust approach that is often used to build such analysis pipelines, but it suffers from serious drawbacks for bioinformatics (e.g. no built-in database access; extremely limited pattern-matching; a language that is not extensible; dependencies triggered only by file timestamps and not by, e.g., an MD5 hash indicating that file contents have changed).
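As a rough illustration of the last drawback, the following Python sketch decides whether a target is stale by comparing content hashes rather than timestamps. It is only one possible design: the function names and the JSON state file are hypothetical, and SHA-256 is used here in place of the MD5 mentioned above (any content hash would serve).

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """Hash a file's contents in chunks (SHA-256 as a stand-in for MD5)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_state(state_path):
    try:
        with open(state_path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def needs_rebuild(target, sources, state_path):
    """Stale if the target is missing or any source's content has changed."""
    recorded = load_state(state_path).get(target)
    current = {src: file_digest(src) for src in sources}
    return not os.path.exists(target) or recorded != current


def record_build(target, sources, state_path):
    """Remember the source digests that the target was built from."""
    state = load_state(state_path)
    state[target] = {src: file_digest(src) for src in sources}
    with open(state_path, "w") as fh:
        json.dump(state, fh)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "genome.fasta")
        tgt = os.path.join(tmp, "genome.gff")
        state = os.path.join(tmp, "state.json")
        with open(src, "w") as fh:
            fh.write(">chr1\nACGT\n")
        observed = [needs_rebuild(tgt, [src], state)]      # target missing
        with open(tgt, "w") as fh:
            fh.write("annotation\n")
        record_build(tgt, [src], state)
        observed.append(needs_rebuild(tgt, [src], state))  # up to date
        os.utime(src, None)                                # timestamp only
        observed.append(needs_rebuild(tgt, [src], state))  # still up to date
        with open(src, "a") as fh:
            fh.write("GGCC\n")                             # content changes
        observed.append(needs_rebuild(tgt, [src], state))
        return observed
```

In the demo, touching the source file does not trigger a rebuild, whereas "make" would rerun the stage; only the edit that changes the file contents marks the target as stale.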
Approach
The project will involve building a replacement or upgrade to "make". One possible approach will be to use a declarative language with (i) strong support for distributed processing, (ii) easy-to-use Unix "hooks" (cf. make), (iii) database and filesystem access. Examples of candidate languages include Erlang and Termite Scheme. Alternatively, C-inclined students may start with an existing parallel "make" clone, such as qmake or distmake.
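The core behaviour being asked for (declared dependencies, with independent stages run in parallel) can be sketched in a few lines. The version below uses Python's standard graphlib and a thread pool purely for illustration; it is not a proposal to use Python instead of Erlang or Termite Scheme, and it runs on one machine rather than a cluster. The rule table format and the file names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_pipeline(rules, max_workers=4):
    """Run actions in dependency order, in parallel where possible.

    rules maps target -> (list of dependencies, zero-argument action).
    Dependencies with no rule of their own (e.g. raw input files) are
    treated as already built.
    """
    sorter = TopologicalSorter({t: deps for t, (deps, _) in rules.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Everything returned here has all of its dependencies done.
            for target in sorter.get_ready():
                action = rules[target][1] if target in rules else (lambda: None)
                running[pool.submit(action)] = target
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any failure from the stage
                target = running.pop(future)
                finished.append(target)
                sorter.done(target)
    return finished


def demo():
    log = []
    rules = {
        "align.stk": (["seqs.fasta"], lambda: log.append("align")),
        "tree.nh": (["align.stk"], lambda: log.append("tree")),
        "genes.gff": (["seqs.fasta"], lambda: log.append("genes")),
        "report.html": (["tree.nh", "genes.gff"], lambda: log.append("report")),
    }
    return run_pipeline(rules)
```

Here "align.stk" and "genes.gff" may run at the same time, while "report.html" waits for both branches. In a real tool the actions would be shell commands or job submissions rather than Python callables.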
Challenges
The first challenge is to get something that is as convenient to use as "make" for migrating throwaway command-lines and analysis scripts into robust pipeline stages. Subsequent challenges will include database access, flexible pattern-matching and enhanced dependency triggers.
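For the pattern-matching challenge, one option is to let rules use regular expressions with named groups instead of the single "%" wildcard that "make" offers. The sketch below is a guess at what that could look like; expand_rule, the file naming scheme and the "annotate" command are all hypothetical.

```python
import re


def expand_rule(target, pattern, dep_templates, command_template):
    """Match a target against a regex rule and fill in its dependencies.

    Named groups captured from the target are substituted into the
    dependency and command templates. Returns None if the rule does not
    apply. Group names must not be "target" or "deps", which are reserved.
    """
    match = re.fullmatch(pattern, target)
    if match is None:
        return None
    groups = match.groupdict()
    deps = [template.format(**groups) for template in dep_templates]
    command = command_template.format(
        target=target, deps=" ".join(deps), **groups)
    return deps, command


# Hypothetical rule: a per-species, per-chromosome annotation stage.
RULE = (
    r"(?P<species>[a-z]+)\.chr(?P<chrom>\d+)\.gff",
    ["{species}.chr{chrom}.fasta"],
    "annotate {deps} > {target}",
)

print(expand_rule("fly.chr2.gff", *RULE))
print(expand_rule("fly.chr2.txt", *RULE))
```

A single rule of this kind can pick out two independent parts of a file name (species and chromosome), which is awkward to express with one "%" stem.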
Involved toolkits or projects
Erlang, Termite Scheme, distmake/qmake, or other.
Mentors
IanHolmes, ChrisMungall
Ideas pages
BioMake, ErlangLanguage
Some links from Google:
- Application instructions for mentorship organizations:
- Mentoring organization application checklist:
-- IanHolmes - 05 Mar 2007 |