JsonFormatStabilization

Fix current sub-optimal aspects of the JSON format, allowing us to:
  1. support it into the future, and
  2. start encouraging other people to generate it

Current Issues:

only 2 levels of nesting possible

There are currently separate header arrays for the first-level features and their immediate subfeatures. This doesn't allow for specifying the headers for other levels. There are many possible ways of addressing this:
  1. make all levels the same
    1. cost: extra empty fields in feature arrays - currently, top level usually doesn't have a feature type field, but subfeatures usually do; also, if top-level feature is rich, subfeatures can be much leaner
  2. make all subfeatures have the same structure, even if there are multiple levels of nesting. Keep feature structure different between top-level features and subfeatures.
    1. cost: potential empty fields in feature arrays, if different levels of subfeatures have different structure (how likely?)
  3. add a "class" tag to each feature: have each feature specify which set of fields it has, and separately have an array of "classes" that specify those sets of fields.
    1. cost: if <= 10 different classes (should almost always be the case), two bytes per feature for a single-digit number and a comma
    2. benefits:
      1. allows us to specify a "lazy feature" class with just 3 fields ["start", "end", "chunk"] and get rid of the {"chunk": "300"} business currently in the JSON for lazy features, which should slim down the trackData.json a fair amount (how much do we win after compression?).
      2. allows us to remove some of the "lazyIndex" special-case code in JsonGenerator and NCList.js.
      3. more generally, makes the JSON significantly more flexible
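
As a rough illustration of option 3, the sketch below decodes class-tagged feature arrays. The class table, the field names, and the `decodeFeature` helper are assumptions made for this example; they are not taken from the existing JsonGenerator or NCList.js code.

```javascript
// Hypothetical class table: each entry lists the fields that a feature
// tagged with that class carries, in array order.
const classes = [
  ["start", "end", "strand", "subfeatures"], // class 0: top-level feature
  ["start", "end", "type"],                  // class 1: subfeature
  ["start", "end", "chunk"]                  // class 2: lazy feature
];

// Position 0 of a feature array is its class index; the remaining
// positions line up with that class's field list.
function decodeFeature(feat) {
  const fields = classes[feat[0]];
  if (!fields) throw new Error("unknown class " + feat[0]);
  const out = {};
  fields.forEach((name, i) => { out[name] = feat[i + 1]; });
  return out;
}

const gene = decodeFeature([0, 10000, 20000, -1, [[1, 10000, 12000, "UTR"]]]);
const lazy = decodeFeature([2, 0, 500000, 300]);
console.log(gene.strand, lazy.chunk); // prints: -1 300
```

With this layout a lazy feature is just another class, so it needs no special-case object such as {"chunk": "300"}.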

likely-to-change client config stuff is mixed in with the less-likely-to-change feature JSON

We should move the client config stuff up into a higher-level file, either trackInfo.js or a new, separate trackConfigs.js.

required/optional fields in features aren't documented

JBrowse needs a numeric "start" and "end"; JBrowse will do useful things with "strand" and "name" fields if they're in the JSON, and the default feature click callback uses the "id" field if it's there. We should make a list of required/optional fields, and make a rule for reserving field names for JBrowse use (like the GFF3 rule that uppercase names are reserved; we should probably just adopt the same rule). I'd want to reserve "Sublist" and "Chunk" or "LazyChunk" or something for NCList operation; maybe "UniqueID" as well for something that is guaranteed to be unique in some context (e.g., the chado guarantee that a uniquename is unique across a type/organism).
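
One possible shape for such a rule, sketched under assumptions: a name counts as reserved when it starts with an uppercase letter (mirroring GFF3), and the required fields are spelled "Start" and "End" as they would be after adopting that rule. The `knownReserved` list and both helper names are hypothetical.

```javascript
// Assumption: like GFF3, a name is reserved when it starts with an uppercase letter.
function isReservedName(name) {
  return /^[A-Z]/.test(name);
}

// Hypothetical list of reserved names the client would understand.
const knownReserved = new Set(["Start", "End", "Strand", "Name", "ID", "Sublist", "Chunk"]);

// Returns a list of problems for a decoded feature object (empty when it passes).
function checkFeature(feature) {
  const problems = [];
  // Required: numeric Start and End.
  for (const required of ["Start", "End"]) {
    if (typeof feature[required] !== "number") {
      problems.push(required + " must be a number");
    }
  }
  // Reserved namespace: uppercase names must be ones the client knows about.
  for (const name of Object.keys(feature)) {
    if (isReservedName(name) && !knownReserved.has(name)) {
      problems.push(name + " uses a reserved (uppercase) name");
    }
  }
  return problems;
}

console.log(checkFeature({ Start: 1, End: "5", Foo: 1, note: "x" }));
```

Lowercase names such as "note" pass untouched, which leaves them free for data producers to use.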

trackData for FeatureTracks:

    formatVersion: 1,
    featureCount: <# of features>,
    intervals: {
        classes: array of "class" descriptors, e.g. 
                     "name" (should be optional IMO): unique name for this "class", intended for human consumption
                     "attributes" (required): e.g.: ["Start", "End", "Strand", "Subfeatures"],
                         like GFF3, we can reserve all upper-case attributes.  Attributes the client knows about:
                         "Start", "End", "Strand", "Name", "ID"?, "Subfeatures", "Sublist", "Type", "Chunk", "Phase", "Score"?
                         (should we have a controlled vocabulary for these?)
                     "proto" (optional): attribute-value mappings that are the same for all features tagged with this class, e.g.:
                         {"Chrom": "chr1", "Type": "match_pair"},
                     "isArrayAttr": an object with an {<attr name>: true} mapping
                         for each attribute that is meant to be an array, e.g.: {"Subfeatures": true}
                         This is needed because features are represented with arrays, so generic
                         client code needs a way to differentiate arrays that are meant to be features
                         from arrays that are meant to be actual arrays.
                         (optional; if omitted, none of this class' attributes will be treated as arrays)
                     Two further example descriptors:
                     {"attributes": ["Start", "End"],
                      "proto": {"Chrom": "chr1", "Type": "match"}},
                     {"attributes": ["Start", "End", "Chunk"],
                      "proto": {"Chrom": "chr1"},
                      "isLazy": true}
        lazyClass: index of the class used to tag "lazy" features
        nclist (optional, either this or "chunkBases" is required): nclist of feature arrays;
            sublists are held in a "Sublist" attribute, e.g.:
            [[0, 10000, 20000, -1, [[1, 10000, 12000, "UTR"], [1, 13000, 15000, "CDS"]]], {"Sublist": [0, 17000, 18000, 1, []]}]
            The zeroth position in each feature array indicates the class for that feature
        urlTemplate: for lazy ("fake") features, the JSON file containing the subtree covered by that feature
            should be at this URL, once the "fake" feature attributes have been substituted in, e.g.:
            "lazyfeatures-{Chunk}.jsonz", or "/foo?chr={Chrom}&start={Start}&end={End}"
        uniqueIdTemplate (optional): for pre-generated JSON, this will be omitted, because the path
            through the NCList is used as the unique ID.  But for on-the-fly
            generated JSON, this will be a template for specifying a unique ID
            in terms of the feature attributes.  For example, for proxied BAM
            files, unique IDs can be generated from the read ID and the
            position (in case the read is multiply mapped). e.g.: "{ID}-{Start}".
            This is necessary because on-the-fly generated JSON may include
            a given feature multiple times (if the feature spans multiple chunks)
        chunkBases (optional, either this or "nclist" is required): for on-the-fly generated JSON tracks,
            the number of bases per chunk
    },
    histograms (optional; if there's no hist the client could display a "no summary for this track" message at low zooms): {
        meta: array with one object for each histogram zoom level, e.g.:
               [{"arrayParams":{"length":12462,"chunkSize":10000,"urlTemplate":"hist-20000-{Chunk}.jsonz"},"basesPerBin":20000}, ...]
        stats: array with one object for each histogram zoom level that gives mean, max, and bases per bin information, e.g.
               [{"max":1040,"bases":20000,"mean":7.7826191622532495}, ...]
    }
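
To make the "proto" and template ideas above concrete, here is a minimal sketch of how a client might resolve an attribute for a class-tagged feature and then fill in a urlTemplate or uniqueIdTemplate. The helper names `getAttr` and `fillTemplate` and the sample values are assumptions for illustration; leaving unknown placeholders untouched is also just one possible choice.

```javascript
// Class descriptors shaped as described above (values are illustrative).
const classes = [
  { attributes: ["Start", "End", "Strand"], proto: { Chrom: "chr1", Type: "match" } },
  { attributes: ["Start", "End", "Chunk"], proto: { Chrom: "chr1" }, isLazy: true }
];

// Look up an attribute: first in the feature array (offset by one for the
// class index at position 0), then in the class's "proto" defaults.
function getAttr(feat, name) {
  const cls = classes[feat[0]];
  const i = cls.attributes.indexOf(name);
  if (i >= 0) return feat[i + 1];
  return cls.proto ? cls.proto[name] : undefined;
}

// Replace each {Attr} placeholder with the feature's value for that attribute.
// Assumption: placeholders with no value are left as they are.
function fillTemplate(template, feat) {
  return template.replace(/\{([^}]+)\}/g, (match, name) => {
    const value = getAttr(feat, name);
    return value === undefined ? match : String(value);
  });
}

const lazyFeat = [1, 0, 500000, 300];
console.log(fillTemplate("lazyfeatures-{Chunk}.jsonz", lazyFeat));
// prints: lazyfeatures-300.jsonz
console.log(fillTemplate("/foo?chr={Chrom}&start={Start}&end={End}", lazyFeat));
// prints: /foo?chr=chr1&start=0&end=500000
```

Because "Chrom" comes from the class's proto, it is available to templates without being repeated in every feature array.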


    formatVersion: 1,
    defaults (optional, is this necessary?): {
        If we do have defaults, they need to be per-track-type
        (i.e., separate defaults for FeatureTrack and ImageTrack)
        or possibly even a more fine-grained breakdown
    },
    tracks: array of track descriptors, e.g.:
                label: <track identifier>
                key: <human-readable track label>
                meta (optional): object with attribute/value mappings for searching/describing tracks, e.g.
                                 {"source": "Hugh Jass lab", "developmental stage": "0-2 hours", "tissue": "whatever", "feature type": "EST"}
                type: should be the name of a JS class implementing the Track interface, e.g.: "FeatureTrack"
                config: potentially different for each Track class; for a FeatureTrack, could be:
                        urlTemplate (here or as a first-level attribute of the track descriptor?):
                            URL template for finding the trackData file; will be substituted with a "refseq" value, e.g.:
                            template for feature links out, e.g.:
                        style: {"className": "feature2", "subfeatureClasses": {"CDS": "foo", "UTR": "bar"},
                            "arrowheadClass": "foo", "featureCss": "height: 8px;", "histCss": "background-color: blue;"},
                        scaleThresh: {"hist": 0.5, "label": 5, "subfeature": 8},
                        hooks: {
                            "create": "function(track, feat, attrs) { var elem = document.createElement(\"div\"); if (attrs.get(feat, \"Score\") > 50) { = \"blue\"}; return elem; }",
                            "modify": "function(track, feat, attrs, elem) { if (attrs.get(feat, \"Score\") > 50) { = \"blue\" };"
                        "events": {
                            "click": "function(track, elem, feat, attrs, event) { ... }",
                            "mouseover": "function(track, elem, feat, attrs, event) { ... }"
                sourceUrl: will be populated by the client code
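
Since hooks and event handlers are stored as strings in the JSON, the client has to turn them into functions before calling them. The sketch below shows one way this could be done with the standard `Function` constructor. The `compileHook` name, the sample hook body, and the plain-object stand-ins for `attrs` and the DOM element are all assumptions for illustration, and the approach only makes sense if the config comes from a trusted source.

```javascript
// Turn a function stored as a JSON string into a callable function.
// Assumption: the config is trusted, since this evaluates arbitrary code.
function compileHook(source) {
  const fn = new Function("return (" + source + ");")();
  if (typeof fn !== "function") throw new Error("hook is not a function");
  return fn;
}

// Hypothetical config fragment; the hook body is illustrative only.
const config = {
  hooks: {
    modify: 'function(track, feat, attrs, elem) { if (attrs.get(feat, "Score") > 50) { elem.color = "blue"; } return elem; }'
  }
};

// Stand-ins: a trivial attribute accessor and a plain object in place of a DOM element.
const attrs = { get: (feat, name) => feat[name] };
const modify = compileHook(config.hooks.modify);
const elem = modify(null, { Score: 80 }, attrs, {});
console.log(elem.color); // prints: blue
```

Compiling each hook once when the track config is loaded, rather than on every feature, keeps the per-feature cost to a plain function call.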
-- Main.MitchSkinner - 17 Oct 2010