Contributions

Alexis Lucattini edited this page Jun 30, 2021 · 2 revisions

Contribution Guide

This guide sets a publishable standard for current and future tools and workflows in this repository.
Its aims are to improve the readability, reproducibility, sharing and collaboration of tools in this repository.

We have split this document roughly by the components of a CWL definition.

YAML standards

Key vs list format?

Key format

inputs:
  my_input:
    label: "An input I have"
    doc: |
      This is the long hand form description
      of this input
    inputBinding:
      position: 1

VS

List format

inputs:
  - id: my_input
    label: "An input I have"
    doc: |
      This is the long hand form description
      of this input
    inputBinding:
      position: 1

Personally I prefer key format. Indentation improves readability.

Hints/Requirements

ResourceRequirements

  • More readable in key format
  • Should be specified in hints instead of requirements.
  • ilmn-tes requirements should also be set in hints.
  • ICA TES then takes the ilmn-tes values; the standard requirements are used elsewhere.
  • Requirements can then be overridden at launch time.
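
As a rough sketch of the points above, a ResourceRequirement in key format under hints might look like the following. The ilmn-tes namespace URI and the tier/type/size keys are assumptions for illustration and are not confirmed by this guide; check the ICA documentation for the actual values.

```yaml
# Assumption: namespace URI and ilmn-tes resource keys are illustrative only.
$namespaces:
  ilmn-tes: http://platform.illumina.com/rdf/ica/

hints:
  ResourceRequirement:
    # Intended to be read by ICA TES (key names assumed)
    ilmn-tes:resources:
      tier: standard
      type: standard
      size: medium
    # Standard CWL fields, used by other runners
    coresMin: 2
    ramMin: 4000   # mebibytes
```

Because both sets of values sit under hints rather than requirements, a runner that does not understand them can ignore them, and they can be replaced at launch time.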

ShellCommandRequirement

  • This should be avoided.
  • Use an inline-entry script if possible instead.
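
A minimal sketch of the inline-entry script approach, assuming a tool that needs a pipe (the usual reason for reaching for ShellCommandRequirement). The script name, input id and output file name are hypothetical.

```yaml
cwlVersion: v1.1
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: run_sort_uniq.sh   # hypothetical script name
        entry: |
          #!/usr/bin/env bash
          set -euo pipefail
          # The pipe lives inside the script, so the tool itself
          # does not need ShellCommandRequirement
          sort "$1" | uniq > sorted.txt

baseCommand: [ "bash", "run_sort_uniq.sh" ]

inputs:
  input_file:
    label: "input file"
    doc: |
      Text file passed to the script as its first argument
    type: File
    inputBinding:
      position: 1

outputs:
  sorted_output:
    label: "sorted output"
    doc: |
      Sorted, de-duplicated copy of the input file
    type: File
    outputBinding:
      glob: "sorted.txt"
```

Note that "$1" is safe inside entry, whereas "${...}" and "$(...)" would be treated as CWL expressions and need escaping.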

Tool Specific Standards

  • Overall tool definition should have a doc, label and author
    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
    • Author should contain full-name, email address and github username. See this doc store example for more information
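
The linked example is not reproduced here, so the following is only a sketch of one way to carry this metadata, assuming schema.org terms are used for the author. The name, email address and github username are placeholders.

```yaml
cwlVersion: v1.1
class: CommandLineTool

$namespaces:
  s: https://schema.org/

# Assumption: author metadata expressed as a schema.org Person
s:author:
  class: s:Person
  s:name: Jane Doe                            # placeholder full name
  s:email: jane.doe@example.org               # placeholder email address
  s:identifier: https://github.com/janedoe    # placeholder github username

label: "my tool"
doc: |
  A multi-line description of what the tool does,
  written in long-hand form.
```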

Inputs

  • Should be set in key-format with the id set as the input key.

    • id should be lower case and match the kwarg parameter prefix if present.
      • Hyphens in the prefix should be replaced with underscores for the id.
      • --output-prefix becomes output_prefix
  • Each input should contain a doc and label.

    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
  • Each output should contain a doc and label.

    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
  • Prefer using inputBinding attributes over handling arguments in an "inline entry script".

  • inputBinding prefixes should be in long-hand format.

    • If a parameter accepts either -r or --reference, use --reference.
  • Position parameters in inputBinding should be non-negative, so that they are placed after any values in arguments (which default to position 0).

  • Position parameters that must be specified at the end of the command should start at 100 and count backwards.

    • i.e., a parameter that must precede the last parameter but come after all others should be set at 99.
  • Defaults should be stated in the doc component, and only set in ‘default’ if the argument is necessary for the program to function.

    • Otherwise we make an assumption that specifying an argument as its default has the same behaviour as leaving it as null
  • A list of successCodes should be set for each tool.

  • Any inline entry shell scripts should be run in strict mode

    set -euo pipefail

    -e causes the shell to exit immediately if any command has a non-zero exit status.

    -u raises an error if you reference a variable you haven't defined, which picks up typos.

    -o pipefail: if set, the return value of a pipeline is that of the last (rightmost) command to exit with a non-zero status, or zero if every command succeeds.

    -x may be used for debugging a workflow. Check out more set built-in options here

  • Format attribute should NOT be used as it lacks portability between workflows and requires specification on input.

    • However open to suggestions and a potential consideration for the future.
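
Putting the input standards above together, a tool definition might look roughly like this. The tool name my_tool, its flags and the output file name are hypothetical and only illustrate the conventions.

```yaml
cwlVersion: v1.1
class: CommandLineTool

label: "my tool"
doc: |
  Example tool definition following the input standards.

baseCommand: [ "my_tool" ]   # hypothetical executable

inputs:
  # id matches the long-hand prefix, with hyphens replaced by underscores
  output_prefix:
    label: "output prefix"
    doc: |
      Prefix used for all output files
    type: string
    inputBinding:
      prefix: "--output-prefix"
      position: 1
  # Long-hand --reference is used even if the tool also accepts -r
  reference:
    label: "reference"
    doc: |
      Reference file to align against
    type: File
    inputBinding:
      prefix: "--reference"
      position: 2
  # Optional; the tool's own default is documented, not set in 'default'
  threads:
    label: "threads"
    doc: |
      Number of threads to use.
      The tool itself defaults to 1 when this is not set.
    type: int?
    inputBinding:
      prefix: "--threads"
      position: 3
  # Must come just before the last parameter
  sample_name:
    label: "sample name"
    doc: |
      Positional sample name, given before the input file
    type: string
    inputBinding:
      position: 99
  # Must be the last parameter on the command line
  input_file:
    label: "input file"
    doc: |
      Positional input file, given at the end of the command
    type: File
    inputBinding:
      position: 100

outputs:
  output_file:
    label: "output file"
    doc: |
      The main output file of the tool
    type: File
    outputBinding:
      glob: "$(inputs.output_prefix).txt"

successCodes:
  - 0
```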

Workflow specific standards

  • Overall workflow definition should have a doc, label and author
    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
    • Author should contain full-name, email address and github username. See this doc store example for more information

Workflow inputs:

  • Should be set in key-format with the id set as the input key.

    • id should be lower case and match the id of the tool input it feeds, with the name of the step where it is used added as a suffix.
    • For example, let's say I want to expose the --output-prefix parameter of a tool in the workflow.
      If I have referenced the tool in the workflow under the step dragen_step, I should then set the id of this input as output_prefix_dragen.
  • Each input should contain a doc and label.

    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
  • Each output should contain a doc and label.

    • label to be a single line string. It may contain spaces.
    • doc to be in multi-line string format
  • Defaults should be stated in the doc component, and only set in ‘default’ if the argument is necessary for the program to function.

    • Otherwise we make an assumption that specifying an argument as its default has the same behaviour as leaving it as null
  • Input and output names must not match; identical names do not bode well when using packed JSONs.
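
A small sketch of the workflow input and output conventions above, assuming a step called dragen_step that wraps a tool with an output_prefix input. The output id and the tool's output_directory output are hypothetical.

```yaml
inputs:
  # Tool input id (output_prefix) plus the step name as a suffix
  output_prefix_dragen:
    label: "output prefix dragen"
    doc: |
      Prefix for the output files written by the dragen step
    type: string

outputs:
  # Deliberately different from every input id
  dragen_output_directory:
    label: "dragen output directory"
    doc: |
      Directory containing the outputs of the dragen step
    type: Directory
    outputSource: dragen_step/output_directory
```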

Workflow steps

  • Use _step suffix for the id of a step in the workflow.

    • This is preferably in key format
    • Step also has a label and a doc.
  • The run should link to a separate file rather than the tool being defined 'in-place'.

    • This improves modularisation.
    • A bug causes 'in-place' tools to be packed incorrectly.
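
For illustration, a step following these conventions could be written as below. The relative path to the tool and the input and output ids are assumptions; adjust them to the actual repository layout.

```yaml
steps:
  # Key format, with the _step suffix on the step id
  dragen_step:
    label: "dragen"
    doc: |
      Runs the dragen tool on the workflow inputs
    in:
      output_prefix:
        source: output_prefix_dragen
    out:
      - id: output_directory
    # Tool lives in its own file rather than being defined in-place
    run: ../../tools/dragen/dragen.cwl   # hypothetical path
```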