
Google Summer Of Code 2007: project submissions due March 26th

Our Entries: Ajax Phylo-Informatics, Automated Pipeline Tools

NESCENT has a page up about Google's 2007 summer of code.

Here's my contribution so far (excerpted from the NESCENT phyloinformatics wiki):

Evolve Unix phyloinformatics tools into Ajax applications

Rationale

Many powerful new tools for phylogenetic stochastic grammar analysis of multiple alignments, such as xrate and PHAST, as well as PAML and others, are available only from the Unix command line. These tools need to become operable over the web, especially via JavaScript platforms such as the new Google Maps-like interface to GBrowse.

Approach

Use toolkits such as dojo to build asynchronous JavaScript wrappers for Unix tools (probabilistic modeling & phylogeny tools, format conversion utilities, sequence analysis & alignment software, genome annotation pipelines, grids & job queues, realtime parallelizable systems); other JavaScript/web components (alignment viewers, tree viewers & navigators, genome browsers); and bioinformatics "mashups". Interface with gmod-ajax, Amigo and other web-based bioinformatics platforms.
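
As a rough illustration of the server side of such a wrapper, here is a minimal Python sketch of a job queue that starts a command-line tool in the background and returns a status record that a browser client could poll. The `JobQueue` class and its method names are hypothetical, and the command in the usage note is a stand-in rather than a real invocation of xrate, PHAST or PAML.

```python
import subprocess
import sys
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Hypothetical wrapper: run Unix tools in the background, poll for results."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job id -> Future holding a CompletedProcess

    def submit(self, argv):
        """Start a command and return a job id immediately, without blocking."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(
            subprocess.run, argv, capture_output=True, text=True
        )
        return job_id

    def status(self, job_id):
        """Return a JSON-friendly dict describing the job's current state."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "running"}
        result = future.result()
        return {
            "state": "done",
            "returncode": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }

    def wait(self, job_id):
        """Block until the job finishes, then return its final status."""
        self._jobs[job_id].result()
        return self.status(job_id)
```

For example, `JobQueue().submit([sys.executable, "-c", "print('((A,B),C);')"])` returns a job id straight away; a web front end would then call `status` on a timer until the state is `done`. A real deployment would also need input validation, job expiry and a grid back end, none of which is attempted here.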

Challenges

Adapting command-line tools for web use; creating an asynchronous user interface; developing infrastructure for mashable bioinformatics...

Involved toolkits or projects

BioPerl/Biopython/Bioruby; SWIG; dojo; Sun Grid Engine; Erlang

Mentors

IanHolmes, MitchSkinner, ChrisMungall, JasonStajich

Ideas pages

AjaxPhyloinformatics; see also WishList, RnaAlignmentViewer

Extending the "make" paradigm for bioinformatics annotation pipelines

Rationale

Annotating a genome, or performing other large-scale bioinformatics analyses, typically involves a series of operations with sequential dependencies but also strong parallelism. The GNU make program is one robust approach that is often used to build such analysis pipelines, but it suffers serious drawbacks for bioinformatics (e.g. no built-in database access; extremely limited pattern-matching; a language that is not extensible; dependencies that are triggered only by file timestamps and not by, e.g., an MD5 hash indicating that file contents have changed).
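
To make the last drawback concrete, here is a minimal Python sketch of a content-based dependency trigger: a target counts as stale only if it is missing or if the MD5 hash of one of its sources differs from the hash recorded at the last build. The function names and the JSON state file are assumptions made for illustration, not part of any existing tool.

```python
import hashlib
import json
import os


def md5_of(path):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _load_state(state_file):
    """Load recorded hashes; an absent state file means nothing was built yet."""
    try:
        with open(state_file) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def needs_rebuild(target, sources, state_file):
    """True if the target is missing or any source's contents have changed."""
    if not os.path.exists(target):
        return True
    current = {src: md5_of(src) for src in sources}
    return _load_state(state_file).get(target) != current


def record_build(target, sources, state_file):
    """Remember the source hashes that produced the target."""
    state = _load_state(state_file)
    state[target] = {src: md5_of(src) for src in sources}
    with open(state_file, "w") as handle:
        json.dump(state, handle)
```

With this scheme, touching a source file without changing its contents does not trigger a rebuild, whereas a timestamp comparison would. The cost is that every source is re-read on each check, so a practical tool would probably cache hashes alongside modification times.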

Approach

The project will involve building a replacement for, or an upgrade to, "make". One possible approach will be to use a declarative language with (i) strong support for distributed processing, (ii) easy-to-use Unix "hooks" (cf. make), (iii) database and filesystem access. Examples of candidate languages include Erlang and Termite Scheme. Alternatively, C-inclined students may start with an existing parallel "make" clone, such as qmake or distmake.
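
The proposal names Erlang and Termite Scheme as candidates; purely to illustrate the idea of declaring dependencies and letting independent stages run in parallel, here is a small sketch in Python instead. The `run_pipeline` function and the rule format are hypothetical, and it uses threads on one machine rather than true distributed processing.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(rules, max_workers=4):
    """Run declared stages in dependency order, in parallel where possible.

    rules maps each target name to (list_of_dependencies, zero-argument action).
    Dependencies without a rule of their own (e.g. raw input files) are no-ops.
    Returns the targets in the order they finished.
    """
    graph = {target: deps for target, (deps, _action) in rules.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    pending = {}  # Future -> target name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch every stage whose dependencies are all complete.
            for target in sorter.get_ready():
                action = rules[target][1] if target in rules else (lambda: None)
                pending[pool.submit(action)] = target
            # Wait for at least one running stage, then unlock its dependents.
            done, _not_done = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the stage
                target = pending.pop(future)
                sorter.done(target)
                finished.append(target)
    return finished
```

Each action here is an ordinary Python callable, so a Unix "hook" could be as simple as a function that calls `subprocess.run` on a shell command. A grid-aware version would submit the action to a job queue instead of a local thread pool.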

Challenges

The first challenge is to get something that is as convenient to use as "make" for migrating throwaway command-lines and analysis scripts into robust pipeline stages. Subsequent challenges will include database access, flexible pattern-matching and enhanced dependency triggers.
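
One way the "flexible pattern-matching" challenge might be approached is to let a rule's target be a regular expression with named groups, which are then substituted into templates for its sources. The Python sketch below assumes a hypothetical `PatternRule` class, and the file naming scheme in the usage note is invented for the example.

```python
import re


class PatternRule:
    """Hypothetical rule: regex target with named groups, templated sources."""

    def __init__(self, target_regex, source_templates):
        self.target_regex = re.compile(target_regex)
        self.source_templates = source_templates

    def sources_for(self, target):
        """Return the sources needed to build target, or None if no match."""
        match = self.target_regex.fullmatch(target)
        if match is None:
            return None
        # Fill each template from the named groups captured in the target.
        return [tpl.format(**match.groupdict()) for tpl in self.source_templates]
```

For instance, a rule built with the regex `(?P<species>[a-z]+)\.(?P<chrom>chr\w+)\.gff` and the templates `{species}.{chrom}.fa` and `{species}.model` captures two independent fields from the target name, which a single `%` stem in a make pattern rule cannot express directly.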

Involved toolkits or projects

Erlang, Termite Scheme, distmake/qmake, or other.

Mentors

IanHolmes, ChrisMungall

Ideas pages

BioMake, ErlangLanguage

Some links from Google:

-- IanHolmes - 05 Mar 2007
