Xgram Format Wish List

From Biowiki

See also: Xgram Wish List (for feature suggestions not specific to the file format); Xgram Software.

Wishlist and suggestions for improvement of the xgram file format

  • Ticket:112 Syntax checking of the xgram file format should be more robust and verbose (moved to this page from xgram wish list):
    • It would be nice to have the line number where SExpr parsing goes wrong, such as due to missing parentheses.
    • It would be nice to have the line number for syntax errors in general.
    • The syntax of statements should be checked. I lost half an hour looking for an error that turned out to be "transform" misspelled as "transfrom"; xgram should catch that, instead of silently dropping the production rule and proceeding as if all is well.
    • IH replies: My standard (evasive) response to points like this is that syntax checking is not "essential" functionality and can easily be put in an external program, cf. lint. I think Andreas Heger has written something like this; see the dart/python modules. I do recognize that errors such as misspellings can be very annoying, but there are ways of checking whether xrate has correctly read things in (try writing the file back out and see if it contains everything you think it should). Unfortunately there are so many things that xrate simply can't do right now (and should be able to); syntax checking is just not my top priority right now, though I can see your point, and this is probably yet another reason why I should've gone for XML :-( -- Ian Holmes 22 Sep 2006
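
As a rough illustration of the external, lint-style checker discussed above, here is a minimal Python sketch. It is not the checker in the dart/python modules and does not reflect how xgram itself parses files. The function name lint_sexpr is hypothetical, and KNOWN_KEYWORDS contains only names mentioned on this page, whereas the real format defines its own set. The sketch reports the line number of unbalanced parentheses and warns when the first symbol of a list closely resembles a known keyword. It does not handle comments or quoted strings.

```python
import difflib
import re

# Hypothetical keyword list: only names that appear on this page.
# The real xgram format defines its own, larger set.
KNOWN_KEYWORDS = {"transform", "params", "const", "pgroup", "rate"}

# A token is a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def lint_sexpr(text, known=KNOWN_KEYWORDS):
    """Return a list of (line_number, message) pairs for suspicious input."""
    problems = []
    open_lines = []   # line numbers of '(' that have not been closed yet
    at_head = False   # True when the next symbol is the first one in a list
    for lineno, line in enumerate(text.splitlines(), start=1):
        for tok in TOKEN.findall(line):
            if tok == "(":
                open_lines.append(lineno)
                at_head = True
            elif tok == ")":
                if open_lines:
                    open_lines.pop()
                else:
                    problems.append((lineno, "unmatched ')'"))
                at_head = False
            else:
                if at_head and tok not in known:
                    # Warn only on near-misses, since list heads may also be
                    # user-chosen names rather than keywords.
                    close = difflib.get_close_matches(
                        tok, sorted(known), n=1, cutoff=0.8)
                    if close:
                        problems.append(
                            (lineno,
                             f"'{tok}' looks like a misspelling of '{close[0]}'"))
                at_head = False
    for lineno in open_lines:
        problems.append((lineno, "'(' is never closed"))
    return problems


if __name__ == "__main__":
    sample = "(grammar\n (transfrom (from (X)) (to (Y)))\n (params (alpha 1)\n"
    for lineno, message in lint_sexpr(sample):
        print(f"line {lineno}: {message}")
```

Note that for a missing closing parenthesis the reported line is where the unclosed list was opened, which may be some distance from the actual mistake; it narrows the search rather than pinpointing the error.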

Completed wishlist items

  • Wishes for write-up stuff: it would be nice if a couple of sentences were put into Xgram Format about identifier/label/parameter syntax... are they case-sensitive? What characters are allowed? (I see underscores are allowed, but what else?) Are there any reserved names? And so on... (Done, 5/18/2007 - IH)
  • More conspicuous parameter typing: (Done, 5/18/2007 - IH)
    • pgroup for a set of mutually exclusive probabilities
    • rate for a rate (and in general, syntax for differentiating between rate parameters and probability parameters should be explicit, instead of the current single-vs-double-parentheses syntax, which is not intuitive)
    • time for a branch length (this still isn't done, since there are currently no such parameters in xrate files... only rates and pgroups)
  • Parameter type modifiers:
    • const means don't update during EM ("const" can now be used as an alternative to "params"; 8/10/2006)
  • The double-parentheses syntax in params and const blocks should be completely dropped (see the "caveat" in the section that describes these blocks below). For example, what does (params ((alpha 1))) mean? It is either (1) a probability parameter list of length 1, which does not make sense because it would always be "co-normalized" to 1, or it is (2) a rate parameter defined using probability parameter syntax (one set of parentheses for the parameter, another set to enclose the single-member list, thus double parentheses), which is an exception to the general syntax and forces us to clarify that to people using the program. Neither interpretation is elegant. Done, 5/18/2007 - IH
  • Replace apostrophe with another character (e.g. asterisk) in xgram format files (apostrophe is reserved in Lisp & Scheme)
    • Done 4/30/2006; backward compatibility with apostrophes is retained -- Ian Holmes
    • Still need to update the example grammars in the dart/data directory... (now done)
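
To make the double-parentheses ambiguity described above concrete, the following Python sketch classifies the entries of an old-style params block. It assumes, based only on the discussion on this page, that a singly parenthesized (name value) entry is a rate and that a doubly parenthesized list is a probability group; the actual xrate parser may behave differently. The helper names parse_sexpr and classify_params are hypothetical.

```python
import re


def parse_sexpr(text):
    """Parse one balanced S-expression into nested Python lists of strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ')'
            return items
        return tok

    return read()


def classify_params(block):
    """Classify each entry of a parsed (params ...) or (const ...) block.

    Assumed old convention (an interpretation of this page, not a spec):
      (name value)                     -> rate
      ((name value) (name value) ...)  -> probability group (pgroup)
      ((name value))                   -> ambiguous: a one-member pgroup,
                                          or a rate written in pgroup syntax
    """
    result = []
    for entry in block[1:]:
        if entry and isinstance(entry[0], list):
            names = [param[0] for param in entry]
            kind = "pgroup" if len(names) > 1 else "ambiguous"
            result.append((kind, names))
        else:
            result.append(("rate", [entry[0]]))
    return result


if __name__ == "__main__":
    text = "(params (mu 0.1) ((p 0.3) (q 0.7)) ((alpha 1)))"
    for kind, names in classify_params(parse_sexpr(text)):
        print(kind, names)
```

The "ambiguous" case is exactly the (params ((alpha 1))) example: a reader cannot tell from the syntax alone which of the two interpretations is intended, which is the motivation for explicit rate and pgroup keywords.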


-- Created by: Andrew Uzilov on 22 Aug 2006