JBrowse > JsonFormatStabilization

Goal:

Fix current sub-optimal aspects of the JSON format, allowing us to:
  1. support it into the future, and
  2. start encouraging other people to generate it

Current Issues:

only 2 levels of nesting possible

There are currently separate header arrays for the first-level features and their immediate subfeatures. This doesn't (currently) allow for specifying the headers for deeper levels. There are several possible ways of addressing this:
  1. make all levels the same
    1. cost: extra empty fields in feature arrays - currently, top level usually doesn't have a feature type field, but subfeatures usually do; also, if top-level feature is rich, subfeatures can be much leaner
  2. make all subfeatures have the same structure, even if there are multiple levels of nesting. Keep feature structure different between top-level features and subfeatures.
    1. cost: potential empty fields in feature arrays, if different levels of subfeatures have different structure (how likely?)
  3. add a "class" tag to each feature: have each feature specify which set of fields it has, and separately have an array of "classes" that specify those sets of fields.
    1. cost: if <= 10 different classes (should almost always be the case), two bytes per feature for a single-digit number and a comma
    2. benefits:
      1. allows us to specify a "lazy feature" class with just 3 fields ["start", "end", "chunk"] and get rid of the {"chunk": "300"} business currently in the JSON for lazy features, which should slim down the trackData.json a fair amount (how much do we win after compression?).
      2. allows us to remove some of the "lazyIndex" special-case code in JsonGenerator and NCList.js.
      3. more generally, makes the JSON significantly more flexible
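
A minimal sketch of option 3, assuming the class index is stored at position 0 of each feature array. The names `fieldSets` and `decode` are made up for illustration and are not existing JBrowse code. It also checks the two-bytes-per-feature cost estimate on a serialized example.

```javascript
// Hypothetical class table: each entry lists the fields its features carry.
const fieldSets = [
  ["start", "end", "strand", "subfeatures"], // rich top-level feature
  ["start", "end", "type"],                  // lean subfeature
  ["start", "end", "chunk"]                  // lazy feature
];

// Turn a class-tagged feature array into a plain object.
// Assumption: element 0 is the class index; the remaining elements are
// the field values, in the order the class lists them.
function decode(feature) {
  const fields = fieldSets[feature[0]];
  const obj = {};
  fields.forEach((name, i) => { obj[name] = feature[i + 1]; });
  return obj;
}

// Serialized cost of the tag: one digit plus one comma.
const untagged = JSON.stringify([10000, 12000, "UTR"]);
const tagged = JSON.stringify([1, 10000, 12000, "UTR"]);
const tagCost = tagged.length - untagged.length;
```

With a single-digit class index, `tagCost` comes out as 2, matching the estimate above; how much of that survives compression is still the open question.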

likely-to-change client config stuff is mixed in with the less-likely-to-change feature JSON

Should move client config stuff up into a higher-level file, either trackInfo.js or a new, separate trackConfigs.js.

required/optional fields in features aren't documented

JBrowse needs a numeric "start" and "end"; JBrowse will do useful things with "strand" and "name" fields if they're in the JSON, and the default feature click callback uses the "id" field if it's there. We should make a list of required/optional fields, and make a rule for reserving field names for JBrowse use (like the GFF3 rule that names beginning with an uppercase letter are reserved; we should probably just adopt the same rule). I'd want to reserve "Sublist" and "Chunk" or "LazyChunk" or something for NCList operation; maybe "UniqueID" as well for something that is guaranteed to be unique in some context (e.g., the chado guarantee that a uniquename is unique across a type/organism).
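
As a rough sketch of what such rules could look like in client code: the check below assumes "reserved" means the name begins with an uppercase ASCII letter (the GFF3 convention) and treats only "start" and "end" as required, as stated above. The function names `isReservedName` and `checkFeature` are hypothetical.

```javascript
// Assumption: a field name is reserved for JBrowse when it begins with an
// uppercase ASCII letter, mirroring the GFF3 convention.
function isReservedName(name) {
  return /^[A-Z]/.test(name);
}

// Fields the text above says JBrowse cannot work without.
const REQUIRED_NUMERIC = ["start", "end"];

// Return a list of problems found in a decoded feature object
// (an empty list means the feature passes this minimal check).
function checkFeature(feature) {
  const problems = [];
  for (const field of REQUIRED_NUMERIC) {
    const value = feature[field];
    if (typeof value !== "number" || Number.isNaN(value)) {
      problems.push(`"${field}" must be numeric`);
    }
  }
  return problems;
}
```

A generator could run `checkFeature` on each feature before writing it out, and refuse user-supplied field names for which `isReservedName` returns true.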

trackData for FeatureTracks:

{
    formatVersion: 1,
    featureCount: <# of features>,
    intervals: {
        classes: array of "class" descriptors, e.g. 
                 [
                   {
                     "name" (should be optional IMO): unique name for this "class", intended for human consumption
                     "attributes" (required): e.g.: ["Start", "End", "Strand", "Subfeatures"],
                         like GFF3, we can reserve all upper-case attributes.  Attributes the client knows about:
                         "Start", "End", "Strand", "Name", "ID"?, "Subfeatures", "Sublist", "Type", "Chunk", "Phase", "Score"?
                         (should we have a controlled vocabulary for these?)
                     "proto" (optional): attribute-value mappings that are the same for all features tagged with this class, e.g.:
                         {"Chrom": "chr1", "Type": "match_pair"},
                     "isArrayAttr": an object with an {<attr name>: true} mapping
                         for each attribute that is meant to be an array, e.g.: {"Subfeatures": true}
                         This is needed because features are represented with arrays, so generic
                         client code needs a way to differentiate arrays that are meant to be features
                         from arrays that are meant to be actual arrays.
                         (optional; if omitted, none of this class' attributes will be treated as arrays)
                   },
                   {
                     "attributes": ["Start", "End"],
                     "proto": {"Chrom": "chr1", "Type": "match"}
                   },
                   {
                     "attributes": ["Start", "End", "Chunk"],
                     "proto": {"Chrom": "chr1"},
                     "isLazy": true
                   }
                 ],
        lazyClass: index of the class used to tag "lazy" features
        nclist (optional, either this or "chunkBases" is required): nclist of feature arrays;
            sublists are held in a "Sublist" attribute, e.g.:
            [[0, 10000, 20000, -1, [[1, 10000, 12000, "UTR"], [1, 13000, 15000, "CDS"]]], {"Sublist": [0, 17000, 18000, 1, []]}]
            The zeroth position in each feature array indicates the class for that feature
        urlTemplate: for "fake" features, the JSON file containing the subtree covered by that feature
            should be at this URL, once the "fake" feature attributes have been substituted in, e.g.:
            "lazyfeatures-{Chunk}.jsonz", or "/foo?chr={Chrom}&start={Start}&end={End}"
        uniqueIdTemplate (optional): for pre-generated JSON, this will be omitted, because the path
            through the NCList is used as the unique ID.  But for on-the-fly
            generated JSON, this will be a template for specifying a unique ID
            in terms of the feature attributes.  For example, for proxied BAM
            files, unique IDs can be generated from the read ID and the
            position (in case the read is multiply mapped). e.g.: "{ID}-{Start}".
            This is necessary because on-the-fly generated JSON may include
            a given feature multiple times (if the feature spans multiple chunks)
        chunkBases (optional, either this or "nclist" is required): for on-the-fly generated JSON tracks,
            the number of bases per chunk
    },
    histograms (optional; if there's no hist the client could display a "no summary for this track" message at low zooms): {
        meta: array with one object for each histogram zoom level, e.g.:
               [{"arrayParams":{"length":12462,"chunkSize":10000,"urlTemplate":"hist-20000-{Chunk}.jsonz"},"basesPerBin":20000}, ...]
        stats: array with one object for each histogram zoom level that gives mean, max, and bases-per-bin information, e.g.:
               [{"max":1040,"bases":20000,"mean":7.7826191622532495}, ...]
    }
}
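
The following is a sketch of how client code might read attributes from class-tagged feature arrays under this proposal: look up the attribute's position in the class's "attributes" list, fall back to the class's "proto", and use the result to fill in a urlTemplate for a lazy feature. The names `makeAttrs` and `fillTemplate` are hypothetical; the `get(feat, name)` shape is only meant to match the `attrs.get(feat, "Score")` calls used in the hook examples on this page.

```javascript
// Class descriptors in the shape proposed above (values are illustrative).
const classes = [
  { attributes: ["Start", "End", "Strand", "Subfeatures"] },
  { attributes: ["Start", "End"], proto: { Chrom: "chr1", Type: "match" } },
  { attributes: ["Start", "End", "Chunk"], proto: { Chrom: "chr1" }, isLazy: true }
];

// Build an accessor for class-tagged feature arrays.
// Assumption: feat[0] is the class index and attribute i sits at feat[i + 1].
function makeAttrs(classList) {
  const positions = classList.map(
    (cls) => new Map(cls.attributes.map((name, i) => [name, i + 1]))
  );
  return {
    get(feat, name) {
      const pos = positions[feat[0]].get(name);
      if (pos !== undefined) return feat[pos];
      const proto = classList[feat[0]].proto || {};
      // Only own properties count, so names like "constructor" stay undefined.
      return Object.prototype.hasOwnProperty.call(proto, name)
        ? proto[name]
        : undefined;
    },
    isLazy(feat) {
      return classList[feat[0]].isLazy === true;
    }
  };
}

// Substitute {Attr} placeholders with attribute values of the feature.
// Unknown placeholders are left as-is; URL escaping is left out of this sketch.
function fillTemplate(template, feat, attrs) {
  return template.replace(/\{(\w+)\}/g, (match, name) => {
    const value = attrs.get(feat, name);
    return value === undefined ? match : String(value);
  });
}

const attrs = makeAttrs(classes);
const lazy = [2, 17000, 18000, 300];
const chunkUrl = fillTemplate("lazyfeatures-{Chunk}.jsonz", lazy, attrs);
// chunkUrl is "lazyfeatures-300.jsonz"
```

The same `fillTemplate` would serve for a uniqueIdTemplate such as "{ID}-{Start}", since both are plain substitutions of feature attributes.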

trackList:

{
    formatVersion: 1,
    defaults (optional, is this necessary?): {
        If we do have defaults, they need to be per-track-type
        (i.e., separate defaults for FeatureTrack and ImageTrack)
        or possibly even a more fine-grained breakdown
    },
    tracks: array of track descriptors, e.g.:
        [
            {
                label: <track identifier>
                key: <human-readable track label>
                meta (optional): object with attribute/value mappings for searching/describing tracks, e.g.
                                 {"source": "Hugh Jass lab", "developmental stage": "0-2 hours", "tissue": "whatever", "feature type": "EST"}
                type: should be the name of a JS class implementing the Track interface, e.g.: "FeatureTrack"
                config: potentially different for each Track class; for a FeatureTrack, could be:
                    {
                        urlTemplate (here or as a first-level attribute of the track descriptor?):
                            URL template for finding the trackData file; will be substituted with a "refseq" value, e.g.:
                            "tracks/foo/{refseq}/trackData.jsonz"
                        linkTemplate:
                            template for feature links out, e.g.:
                            "http://flybase.org/{ID}"
                        style: {"className": "feature2", "subfeatureClasses": {"CDS": "foo", "UTR": "bar"},
                            "arrowheadClass": "foo", "featureCss": "height: 8px;", "histCss": "background-color: blue;"},
                        scaleThresh: {"hist": 0.5, "label": 5, "subfeature": 8},
                        hooks: {
                            "create": "function(track, feat, attrs) { var elem = document.createElement(\"div\"); if (attrs.get(feat, \"Score\") > 50) { elem.style.backgroundColor = \"blue\"}; return elem; }",
                            "modify": "function(track, feat, attrs, elem) { if (attrs.get(feat, \"Score\") > 50) { elem.style.backgroundColor = \"blue\" }; }"
                        },
                        "events": {
                            "click": "function(track, elem, feat, attrs, event) { ... }",
                            "mouseover": "function(track, elem, feat, attrs, event) { ... }"
                        }
                    }
                sourceUrl: will be populated by the client code
            },
            ...
        ]
}
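
Because JSON has no function type, the hooks and events above are carried as strings of function source. One way a client could turn them into callables is sketched below; `compileHook` and `compileHooks` are hypothetical names, and the approach assumes the track list comes from a trusted source, since it executes whatever code the strings contain.

```javascript
// Compile a hook stored as function source text into a callable function.
// Assumption: the config is trusted; this runs arbitrary code from the string.
function compileHook(source) {
  const fn = new Function("return (" + source + ");")();
  if (typeof fn !== "function") {
    throw new TypeError("hook source did not evaluate to a function");
  }
  return fn;
}

// Compile every entry of a {"name": "function(...) {...}"} mapping.
function compileHooks(hookSources) {
  const compiled = {};
  for (const [name, source] of Object.entries(hookSources || {})) {
    compiled[name] = compileHook(source);
  }
  return compiled;
}

// Example: a "modify" hook like the one above, run against stand-in objects
// (a real client would pass the track, the feature array, and a DOM element).
const hooks = compileHooks({
  modify: "function(track, feat, attrs, elem) { if (attrs.get(feat, \"Score\") > 50) { elem.style.backgroundColor = \"blue\"; } }"
});
const stubAttrs = { get: (feat, name) => feat[name] };
const stubElem = { style: {} };
hooks.modify(null, { Score: 80 }, stubAttrs, stubElem);
// stubElem.style.backgroundColor is now "blue"
```

Compiling once when the track list is loaded, rather than per feature, keeps the cost of this step out of the rendering loop.
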
-- Main.MitchSkinner - 17 Oct 2010