
Tree status in synthesis

Jim Allman edited this page Sep 30, 2016 · 11 revisions

In the curation application, we want to make it easier for curators to:

  • add trees to synthesis
  • see which trees are in synthesis / queued for synthesis
  • explicitly prevent trees from going into synthesis.

Making these changes may also affect the NexSON structure and / or the synthesis pipeline.

Current status

(As of September 2016)

Preferred trees

In the UI, users can set the status of any individual tree as "Preferred". This setting adds the treeId to the list of trees in the study-level ot:candidateTreeForSynthesis property, but this property is not used during synthesis.
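To make the data structure concrete, here is a minimal Python sketch of what toggling "Preferred" does to a study. It assumes the study-level property is stored under a `^ot:candidateTreeForSynthesis` key inside the study's `nexml` object and always holds a list of tree IDs; both the key spelling and the helper name `set_preferred` are illustrative assumptions, not the curation app's actual code.

```python
# Hypothetical sketch: the NexSON key and layout below are assumptions.
CANDIDATE_KEY = "^ot:candidateTreeForSynthesis"


def set_preferred(study: dict, tree_id: str, preferred: bool) -> None:
    """Add or remove tree_id in the study-level candidate list.

    Nothing in the synthesis pipeline reads this list; it only records
    the curator's choice in the study NexSON.
    """
    nexml = study.setdefault("nexml", {})
    candidates = nexml.setdefault(CANDIDATE_KEY, [])
    if preferred and tree_id not in candidates:
        candidates.append(tree_id)
    elif not preferred and tree_id in candidates:
        candidates.remove(tree_id)
```

Marking a tree as preferred twice leaves a single entry, and un-marking it removes the ID again; either way, no collection is touched.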

Do-not-include studies

Until recently, there was a 'This study should not contribute to synthesis' checkbox on the Metadata tab, which set the study-level ot:notIntendedForSynthesis property. The only effect of this property was to reduce the stringency of study quality validation in the curator app. Since we were not using the property, we recently removed the option from the UI.

Synthesis collections

To include a tree in synthesis, a user must add the tree to one of the opentreeoflife collections (or to one of the user-owned collections used in synthesis). Adding a tree to a collection does not change the study NexSON. When running propinquity, we specify a list of collections, and all trees in those collections are included in synthesis.
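The collection-driven selection can be sketched in a few lines of Python. This is not propinquity's code: it assumes each collection is a JSON-like dict with a `decisions` list whose entries carry `studyID` and `treeID`, and that the collections are passed in priority order so that a tree appearing in several collections keeps its first (highest) rank. `synthesis_inputs` is a hypothetical name.

```python
from collections.abc import Iterable


def synthesis_inputs(collections: Iterable[dict]) -> list[tuple[str, str]]:
    """Return (studyID, treeID) pairs from all collections, in rank order.

    Collections are assumed to be given highest priority first; a tree
    listed in more than one collection keeps its first occurrence.
    """
    seen: set[tuple[str, str]] = set()
    ranked: list[tuple[str, str]] = []
    for collection in collections:
        for entry in collection.get("decisions", []):
            key = (entry["studyID"], entry["treeID"])
            if key not in seen:
                seen.add(key)
                ranked.append(key)
    return ranked
```

Note that the study NexSON is never consulted here: membership in a listed collection is the only thing that decides whether a tree is used.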

Issues with current system

Note that a first pass at an update may not address all of these issues.

  • putting a tree into synthesis is too hard, i.e. adding a tree to a synthesis collection is non-intuitive and has too many steps
  • the 'Preferred' tree checkbox and the 'This study should not contribute to synthesis' study checkbox in the UI suggest an effect on the synthesis pipeline, but no effect exists
  • we store properties in the NexSON that suggest an effect on synthesis, but synthesis decisions are entirely dependent on which trees are in collections
  • there is no way for a user to explicitly say "do not include this tree / study in synthesis"
  • we do not validate the curation status of trees when a user takes an action that is (or appears to be) linked to synthesis, either (1) marking a tree as preferred or (2) adding a tree to one of the synthesis collections
  • we are not clear about which collections go into synthesis, or how to add a non-opentreeoflife collection to synthesis

Proposed new workflow(s)

(For some additional background, you may want to look at this opentree issue about synthesis status, which references the study homepage mockup, this issue about nexson properties, and this diagram about curator / collection interactions.)

For each tree in a study, a curator can select one of four statuses, which trigger the following listed actions:

  • Include
    • check curation status (or alternately, grey this option out if insufficiently curated)
    • if it passes validation, add it to the end of the default synthesis collection
    • provide some help text to the user about how to up-rank a tree (or, alternately, provide a list of synthesis collections to choose from)
    • change the tree property in the NexSON to ot:candidateForSynthesis : ot:include
  • Do not include
    • change the tree property in the NexSON to ot:candidateForSynthesis : ot:doNotInclude
    • check whether the tree is in any synthesis collection; if so, remove it from the collection(s)
  • Needs curation
    • change the tree property in the NexSON to ot:candidateForSynthesis : ot:needsCuration
  • Not reviewed (default)
    • change the tree property in the NexSON to ot:candidateForSynthesis : ot:notReviewed
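The four statuses and their actions could be sketched as follows. This is a hypothetical Python sketch, not the curation app's code: collections are modelled as plain ordered lists of (studyID, treeID) pairs, the default collection is assumed to be one of the synthesis collections, `passes_validation` stands in for whatever curation check is chosen, and the tree-level key `^ot:candidateForSynthesis` is an assumption about how the property would be serialized.

```python
from enum import Enum

STATUS_KEY = "^ot:candidateForSynthesis"  # hypothetical tree-level NexSON key


class SynthesisStatus(Enum):
    INCLUDE = "ot:include"
    DO_NOT_INCLUDE = "ot:doNotInclude"
    NEEDS_CURATION = "ot:needsCuration"
    NOT_REVIEWED = "ot:notReviewed"  # default


def set_tree_status(
    tree: dict,
    tree_ref: tuple[str, str],
    status: SynthesisStatus,
    default_collection: list[tuple[str, str]],
    synthesis_collections: list[list[tuple[str, str]]],
    passes_validation: bool,
) -> None:
    """Apply one status change: update collections, then the NexSON property."""
    if status is SynthesisStatus.INCLUDE:
        # Refuse the change if the tree is insufficiently curated.
        if not passes_validation:
            raise ValueError(f"{tree_ref} is not sufficiently curated for synthesis")
        # Append to the end (lowest rank) unless it is already queued somewhere.
        if not any(tree_ref in c for c in synthesis_collections):
            default_collection.append(tree_ref)
    elif status is SynthesisStatus.DO_NOT_INCLUDE:
        # Remove every occurrence from every synthesis collection.
        for collection in synthesis_collections:
            while tree_ref in collection:
                collection.remove(tree_ref)
    # NEEDS_CURATION and NOT_REVIEWED only change the NexSON property.
    tree[STATUS_KEY] = status.value
```

Raising an error on failed validation corresponds to the first option under Include; the alternative (greying the option out) would move that check into the UI. Because the property is written only after the collection step succeeds, a rejected Include leaves the NexSON unchanged.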

Concerns with new workflow

  • the connection between the listed status of a tree, the presence of a tree in a collection, and the use of a tree for synthesis might be confusing
    [jimA: Even here, we talk about trees queued for synthesis, but this queue concept is nowhere in the UI or help text. I think this simplification is worth the occasional mystery for users, but it's worth discussing.]
  • information about trees proposed for synthesis is stored twice: once in the NexSON with `ot:candidateForSynthesis : ot:include` and a second time by the presence of the tree in a synthesis collection
    [jimA: While we'd like these to stay in lock-step, any manual review or failed validation is likely to cause them to diverge (and contribute to the mysteries described above).]
  • there could be drift over time between the NexSON property and the collection status
  • the curator needs some way of knowing what collections are being used for synthesis in order to remove trees flagged as `ot:doNotInclude` (or, potentially, to provide users a list of collections when changing status to `ot:include`)
    [jimA: This is an interesting idea! I've assumed we'd add/remove trees using a separate web service, called each time a study is committed to GitHub. But the latter scenario (offering a list of target collections) sounds really cool. This would be easy if we support "collections of collections" (a standing request); we'd simply show the top-level synthesis collection and let them browse child collections until they find the best spot. They could even rank it in the list, but this starts to seem onerous again...]
  • what is the effect of the `ot:needsCuration` option?
    [jimA: In terms of tree status in collections, this behaves exactly like `ot:doNotInclude`. But it also serves as a hint to other curators and (potentially) other programs. Same with `ot:notReviewed`, actually.]
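One way to watch for the drift described above is a periodic consistency check. A minimal sketch, assuming the `ot:candidateForSynthesis` value of every tree has already been read from the NexSONs into a dict keyed by (studyID, treeID), and that synthesis collections are ordered lists of the same pairs; `find_drift` is a hypothetical helper.

```python
def find_drift(
    statuses: dict[tuple[str, str], str],
    synthesis_collections: list[list[tuple[str, str]]],
) -> dict[str, set[tuple[str, str]]]:
    """Compare NexSON ot:candidateForSynthesis values with collection membership.

    Trees with no recorded status are treated as not marked ot:include,
    matching the ot:notReviewed default.
    """
    in_collections = {ref for c in synthesis_collections for ref in c}
    marked_include = {ref for ref, s in statuses.items() if s == "ot:include"}
    return {
        # Marked ot:include in the NexSON but absent from every synthesis collection.
        "missing_from_collections": marked_include - in_collections,
        # Present in a synthesis collection without an ot:include status.
        "unexpected_in_collections": in_collections - marked_include,
    }
```

Both sets being empty means the two stores agree; anything else is exactly the kind of mismatch jimA expects after manual review or failed validation.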

Alternate approaches

(As of 30 September 2016, another proposal, similar to option 1 below, is described at https://docs.google.com/document/d/12K7uoQglJMAsUHhYndDuqyU54YNC2lGqKrN6oTv-gmg/edit)

  1. We could remove all synthesis status info from the NexSONs and rely entirely on the collections.
    • PRO: no redundancy or chance of drift
    • PRO: no modifications to current collections-based synthesis procedure
    • CON: can't capture ot:doNotInclude vs ot:notReviewed; only boolean In / Not In collection
  2. We could update only the NexSON from the curator; then, at some point later in the pipeline, update the collections based on changes in the ot:candidateForSynthesis property (removing trees without ot:include and adding trees that have ot:include).
    • PRO: curator logic simpler when user changes status
    • CON: unclear when and where to put this step; collections UI? propinquity?
    • CON: chance of drift if collection modified without updating NexSONs
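For option 2, the later pipeline step could look roughly like this hypothetical Python sketch. It assumes the same shapes as before (a dict of (studyID, treeID) to `ot:candidateForSynthesis` value, and synthesis collections as ordered lists of pairs) and that newly included trees are appended to the end of a default collection, as in the proposed Include action. `reconcile_collections` and `default_index` are illustrative names only.

```python
def reconcile_collections(
    statuses: dict[tuple[str, str], str],
    synthesis_collections: list[list[tuple[str, str]]],
    default_index: int = 0,
) -> list[list[tuple[str, str]]]:
    """Return new collections that agree with the NexSON statuses.

    The input lists are not modified.
    """
    included = {ref for ref, s in statuses.items() if s == "ot:include"}
    # Drop every tree not marked ot:include, preserving the rank order of the rest.
    updated = [[ref for ref in c if ref in included] for c in synthesis_collections]
    present = {ref for c in updated for ref in c}
    # Append newly included trees to the default collection (sorted for determinism).
    updated[default_index].extend(sorted(included - present))
    return updated
```

Because the NexSON wins every disagreement, a manual edit to a collection that is not mirrored in the NexSONs would be undone the next time this step runs, which is the drift CON listed for option 2.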

Note that the curator needs to know something about synthesis collections in either scenario, either to display the correct status in Option 1 or to know whether to add / remove trees to / from collections in Option 2.
