
ICAv2

A few tools were released in 1.3 for quickly deploying and launching a workflow to ICAv2. Some of these tools will be migrated to other packages in the future.

Some housekeeping

Tracking v2 workflows

Firstly, unlike ICAv1 workflows, we're not keeping track of the workflows here in the same way.
Workflows are not 'synced', they're just deployed. If you deployed that same workflow five minutes ago, you now have two copies, with the only difference being the timestamp in the name (code) of the workflow.
We still don't have the ability to delete workflows on ICAv2, so please don't go overboard.

Workflow deployment

The workflow id -> workflow version hierarchy has also been abolished. Workflows are deployed directly into a single project.
The workflow can be moved around with bundles, which is a discussion for another day.
For now, you can manually link a workflow from one project to another.

Launching workflow runs

Launching workflow runs is quite complicated in ICAv2, which is why we've invented some quick fixes to help with the transition from ICAv1.

  • Tasks are not isolated with TES; everything is run as if you were running cwltool locally.
    • This makes CWLTool expressions significantly faster
  • All input data is first downloaded into a scratch space before tasks are run.
  • You also need to consider how much space you need for your inputs and outputs (or use what the workflow provided as a default)

Nomenclature

All the talk above about workflows and workflow runs... forget it!
This is ICAv2, we now have 'pipelines' and 'analyses'.
All references to 'workflow' below will be only in the context of a CWL workflow file you have locally.
It's pipelines and analyses from here on in!

Run names (or analysis names, I should say) are now referred to as the 'user reference'.
We still have run ids, they're now just analysis ids.

Zipping up a workflow

A CWL workflow does not need to be packed into a massive JSON tarball before deployment. This makes things much more readable if a user wants to debug a workflow via the UI; however, it also makes collecting all the relevant files a non-trivial task.

Fortunately, we have a tool for 'zipping' up a workflow ready for deployment into ICAv2 as a pipeline.

In this basic example, we will just use a workflow that outputs an indexed tabix file.

# Create an output directory to dump our zip file
mkdir output-zip-dir

# Zip the tabix workflow to output-zip-dir
# This will create a file tabix-workflow__0.2.6.zip inside the output-zip-dir directory
cwl-ica icav2-zip-workflow \
  --workflow-path "${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl" \
  --output-path output-zip-dir/

Let's view the contents of the zip file


(
  cd output-zip-dir && \
  unzip -l tabix-workflow__0.2.6.zip
)

Yields the following

Archive:  tabix-workflow__0.2.6.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      204  2022-10-22 14:39   tabix-workflow__0.2.6/params.xml
        0  2022-10-22 14:39   tabix-workflow__0.2.6/tools/
     1082  2022-10-22 14:39   tabix-workflow__0.2.6/workflow.cwl
        0  2022-10-22 14:39   tabix-workflow__0.2.6/tools/tabix/
        0  2022-10-22 14:39   tabix-workflow__0.2.6/tools/tabix/0.2.6/
     1767  2022-10-22 14:39   tabix-workflow__0.2.6/tools/tabix/0.2.6/tabix__0.2.6.cwl
---------                     -------
     3053                     6 files

Note that tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl has been renamed to workflow.cwl. We also have a params.xml file for using this pipeline with the UI.

The params.xml file is currently non-functional. This will be updated once it is able to support schema-based inputs.

For the sake of experimentation, let's compare workflow.cwl to the original "${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl"

diff \
  "${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl" \
  <( \
    cd "output-zip-dir" && \
    unzip -qq -c tabix-workflow__0.2.6.zip tabix-workflow__0.2.6/workflow.cwl \
  )

Yields

<     run: ../../../tools/tabix/0.2.6/tabix__0.2.6.cwl
---
>     run: tools/tabix/0.2.6/tabix__0.2.6.cwl

If we look at the structure of the zip file, we see why cwl-ica has manually updated the relative path of the tool.
workflow.cwl is now at the top level of the directory, not nested under workflows/tabix-workflow/0.2.6, so the ../../../ is no longer necessary.
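If you want to sanity-check the rewritten paths before deploying, one option (a sketch, assuming you have cwltool installed locally) is to extract the zip and validate the relocated workflow:

# Extract the zip and validate the relocated workflow with cwltool
(
  cd output-zip-dir && \
  unzip -o tabix-workflow__0.2.6.zip && \
  cwltool --validate tabix-workflow__0.2.6/workflow.cwl
)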

Deploying a workflow

Working from the previous step, we can use the cwl-ica icav2-deploy-pipeline subcommand to take a zip file and deploy it to a project.

This subcommand takes in the path to the zip file along with the following options:

  • a project to deploy the pipeline to (required)
  • a default storage size for analyses run with this pipeline, one of "Small", "Medium" or "Large" (optional, defaults to Small)

That's it!

cwl-ica icav2-deploy-pipeline \
  --zipped-workflow-path output-zip-dir/tabix-workflow__0.2.6.zip \
  --project-name playground_v2 \
  --analysis-storage-size Small

This returns us two parameters.

  1. A pipeline 'code', which is essentially the name of the pipeline.
    • This is generated from the name of the CWL workflow
    • Using the current timestamp and the md5sum of the zip file as additional extensions, since no two pipeline codes can be the same (see the sketch below)
  2. A pipeline 'id', which is a UUID string for the pipeline.

Keep both the code and id handy as we will need at least one of them for the next step.
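As an illustration of the naming convention only (a sketch, not how cwl-ica computes the code internally), a pipeline code for our tabix example could be assembled like so:

# Sketch: the <name>--<timestamp>--<md5sum> pipeline code convention.
# cwl-ica generates this for you on deployment.
zip_path="output-zip-dir/tabix-workflow__0.2.6.zip"
zip_md5="$(md5sum "${zip_path}" | cut -d' ' -f1)"
timestamp="$(date +%Y%m%d%H%M%S)"
echo "tabix-workflow__0_2_6--${timestamp}--${zip_md5}"
# e.g. tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601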

Launching a workflow

Once upon a time

Launching a workflow through ICAv2 is particularly complicated, requiring:

  • A list of file ids / folder ids for a user to mount (and then relative mount paths for these files)
  • An input json where location attributes of files and directories match the relative mount paths of the files / directories above
  • The id of an existing output folder to determine where to place the output files
  • A project id (this one seems reasonable)
  • An activation id (which requires a user to ping the /api/activationCodes:findBestMatchForCwl endpoint, um what?)

The ICAv2 CLI does handle some of these things, but the mount path functionality doesn't exist, and you need to figure out all of your data ids from your file and directory paths.

This seems substandard, so we came up with a new way.

Introducing cwl-ica icav2-launch-pipeline-analysis

This takes all the goodies (well, some anyway) from ICAv1 and allows us to launch analyses similar to the way we once did!

The cwl-ica icav2-launch-pipeline-analysis subcommand takes in the following parameters:

  • An input-json (very similar to v1 and discussed below)
  • A pipeline-code / pipeline id (remember from the section above?)
  • A project-name / project id (where do you want to run this pipeline?)
  • An analysis storage size (optional, will take the pipeline default value otherwise)
  • An output folder path / id (if it doesn't exist, we'll create it for you)
  • An activation id (definitely optional)

Which looks similar to the command below

cwl-ica icav2-launch-pipeline-analysis \
  --launch-json launch.json \
  --pipeline-code tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601 \
  --project-name playground_v2

Simple! This will output the user reference and the analysis id.

What's in the launch.json?

The launch.json requires three main inputs

  • A 'user_reference' (the 'name' of the analysis)
  • An 'input' section, very similar to what we had with ICAv1 input jsons for WES
  • An 'engine_parameters' section, also inspired by ICAv1.

Specifying inputs

The input section should act like an input.json used for a local CWL analysis.
For the location attributes of files and directories, use the icav2:// URI scheme to reference ICAv2 locations, where the netloc is either the project id or project name where the data resides, and the path is the absolute path to the data.

One can also add the query parameter presign=true to turn the location from a relative mount path into a presigned URL; this removes the file from the list of data ids to download (a sketch of this follows the example below).

For example, the input for tabix workflow might look something like this:

{
  "input": {
    "vcf_file": {
      "class": "File",
      "location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz"
    }
  }
}
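And here is a sketch of the presigned variant mentioned above: the same input, with presign=true appended so the file is passed as a presigned URL rather than mounted (the output file name here is arbitrary):

# Write an input json that requests a presigned URL instead of a mount
cat <<'EOF' > input.presigned.json
{
  "input": {
    "vcf_file": {
      "class": "File",
      "location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz?presign=true"
    }
  }
}
EOF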

Specifying engine parameters

We can specify the following attributes in the engine parameters

  • Path to the output folder (output_parent_folder_path / output_parent_folder_id)
  • Analysis tags
    • Can be one or more of (technical_tags, user_tags or reference_tags)
  • The storage size of the analysis (analysis_storage_id / analysis_storage_size)
  • The activation id (again, just let cwl-ica set that for you)
  • cwltool_overrides (Override compute or container resources for a given step)
    • This can also be set in the input section under the key cwltool:overrides
    • Overrides in the input take precedence over those in the engine parameters

Many of the command line options overlap with the engine parameters; when both are set, the command line options take precedence.
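As a sketch of that precedence (the user reference and file names are arbitrary, and we're assuming the launch subcommand accepts --analysis-storage-size just as the deploy subcommand does), the "Medium" given on the command line below would win over the "Small" in the engine parameters:

# engine_parameters requests "Small" storage, but the command line
# flag overrides it, so the analysis would run with "Medium".
cat <<'EOF' > launch.precedence.json
{
  "user_reference": "tabix_precedence_demo",
  "input": {
    "vcf_file": {
      "class": "File",
      "location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz"
    }
  },
  "engine_parameters": {
    "analysis_storage_size": "Small"
  }
}
EOF

cwl-ica icav2-launch-pipeline-analysis \
  --launch-json launch.precedence.json \
  --pipeline-code tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601 \
  --project-name playground_v2 \
  --analysis-storage-size Medium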

A (not very) short note on cwl overrides

The key for a given step is different to what we may be used to in ICAv1 with packed CWL workflows.

Here, to specify an override:

  • Take the path to the workflow (or subworkflow) that calls a given step.
  • Add a '#'.
  • Add the id of the workflow (or the subworkflow), found under the id key of the file.
  • Add a '/'.
  • Add the name of the step.

For example, to override the container of the tabix step, our engine parameters may look something like this

{
  "engine_parameters": {
    "output_parent_folder_path": "/test_tabix/",
    "cwltool_overrides": {
      "workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
        "requirements": {
          "DockerRequirement": {
            "dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
          }
        }
      }
    }
  }
}

For more complex beasts, such as the bclconvert-with-qc-pipeline, the overrides key for the bclconvert run step would be workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bclconvert--4.0.3/bcl_convert_run_step.

However, as noted in this GitHub issue, there appears to be a discrepancy between the cwltool version currently used on ICAv2 (3.0.20201203173111) and the current cwltool release (3.1.20221018083734).

For 3.0.20201203173111, if the workflow that invokes the step is a subworkflow, then drop the ID and just place the name of the step after the #.

So workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bclconvert--4.0.3/bcl_convert_run_step becomes workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bcl_convert_run_step.

Putting it all together

Here is the launch json used to launch the tabix pipeline analysis.

{
  "user_reference": "tabix_workflow_test_overrides",
  "input": {
    "vcf_file": {
      "class": "File",
      "location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz"
    }
  },
  "engine_parameters": {
    "output_parent_folder_path": "/test_tabix/",
    "cwltool_overrides": {
      "workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
        "requirements": {
          "DockerRequirement": {
            "dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
          }
        }
      }
    }
  }
}
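Saving the above as launch.json, we can then kick off the analysis with the same command shown earlier (using the pipeline code returned by the deployment step):

cwl-ica icav2-launch-pipeline-analysis \
  --launch-json launch.json \
  --pipeline-code tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601 \
  --project-name playground_v2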

Viewing our analysis

We can then view our analysis in the UI.

The ID and user reference can be seen in the Pipeline pane

In the top right corner, the Input Json pane shows our input json as the following:

{
  "vcf_file": {
    "class": "File",
    "location": "9c20aeb0-f780-4e9a-ac42-739b30ed91f3/fil.1c2677181d0c4f474f6808d9fddc8a4f/L2000969.hard-filtered.vcf.gz"
  },
  "cwltool:overrides": {
    "workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
      "requirements": {
        "DockerRequirement": {
          "dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
        }
      }
    }
  }
}

We can see at the bottom that our input file L2000969.hard-filtered.vcf.gz was mounted at 9c20aeb0-f780-4e9a-ac42-739b30ed91f3/fil.1c2677181d0c4f474f6808d9fddc8a4f/L2000969.hard-filtered.vcf.gz.

We can also see the outputs under the Output Json and Output Files pane in the bottom right.

Finding our data outputs

Outputs can be found under <output_parent_folder_path>/<pipeline-code>-<analysis-id>
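For our tabix example, that pattern resolves to something like the following (a sketch; the analysis id here is hypothetical):

# Compose the expected output folder path from the values used above
output_parent_folder_path="/test_tabix"
pipeline_code="tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601"
analysis_id="4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d"  # hypothetical
echo "${output_parent_folder_path}/${pipeline_code}-${analysis_id}"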

Debugging our analyses

We can also use cwl-ica to debug our analyses, which saves us having to navigate the UI.

For example

cwl-ica icav2-list-analysis-steps \
  --project-name playground_v2 \
  --analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d

Gives us the list of steps for an analysis

[
  {
    "name": "get-file-from-directory--1.0.1",
    "status": "DONE",
    "queue_date": "2022-10-24T06:00:49Z",
    "start_date": "2022-10-24T06:00:50Z",
    "end_date": "2022-10-24T06:00:51Z"
  },
  {
    "name": "create_dummy_directory_step",
    "status": "DONE",
    "queue_date": "2022-10-24T06:00:55Z",
    "start_date": "2022-10-24T06:01:03Z",
    "end_date": "2022-10-24T06:01:03Z"
  },
  {
    "name": "create_dummy_file_step",
    "status": "DONE",
    "queue_date": "2022-10-24T06:00:55Z",
    "start_date": "2022-10-24T06:01:04Z",
    "end_date": "2022-10-24T06:01:04Z"
  },
  {
    "name": "parse-samplesheet--2.0.0--4.0.3",
    "status": "DONE",
    "queue_date": "2022-10-24T06:01:05Z",
    "start_date": "2022-10-24T06:01:06Z",
    "end_date": "2022-10-24T06:01:06Z"
  },
  {
    "name": "parse-bclconvert-run-configuration-object--2.0.0--4.0.3",
    "status": "DONE",
    "queue_date": "2022-10-24T06:01:07Z",
    "start_date": "2022-10-24T06:01:07Z",
    "end_date": "2022-10-24T06:01:07Z"
  },
  {
    "name": "bcl_convert_samplesheet_check_step",
    "status": "DONE",
    "queue_date": "2022-10-24T06:01:10Z",
    "start_date": "2022-10-24T06:06:41Z",
    "end_date": "2022-10-24T06:06:42Z"
  },
  {
    "name": "interop_qc_step",
    "status": "DONE",
    "queue_date": "2022-10-24T06:01:16Z",
    "start_date": "2022-10-24T06:07:04Z",
    "end_date": "2022-10-24T06:09:26Z"
  },
  {
    "name": "bcl_convert_run_step",
    "status": "FAILED",
    "queue_date": "2022-10-24T06:07:15Z",
    "start_date": "2022-10-24T06:09:34Z",
    "end_date": "2022-10-24T06:17:58Z"
  }
]

We can see that this analysis failed at the bcl_convert_run_step.

Let's see if we can get some logs from this step. We can run the icav2-get-analysis-step-logs subcommand to view the stderr of this step.

cwl-ica icav2-get-analysis-step-logs \
  --project-name playground_v2 \
  --analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d \
  --step-name bcl_convert_run_step \
  --stderr

Which yields

Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
LICENSE_MSG| Challenge timeout after 180 seconds
Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked
Dumping diagnostics....
DRAGEN replay file saved to 170612_A00181_0011_AH2JK7DMXX_converted/dragen_replay_1666591982482_17.json
DRAGEN registers saved to 170612_A00181_0011_AH2JK7DMXX_converted/dragen_info_1666591982482_17.log
Hang diagnostic saved to 170612_A00181_0011_AH2JK7DMXX_converted/hang_diag_1666591982482_17.txt
pstack saved to 170612_A00181_0011_AH2JK7DMXX_converted/pstack_1666591983076_17.log

Fatal exception: Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)

Fatal error: Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked

***************************************************************************************
Please run sosreport to collect diagnostic and configuration information:

   sudo sosreport --batch

This requires root privileges and may take several minutes to execute.  When completed,
sosreport generates a compressed file in /tmp or /var/tmp.  The location of this file
is given in the script output.  For example:

  Your sosreport has been generated and saved in:
    /tmp/sosreport-hostname.companyname.com-20160526151939.tar.xz

Please send this report to your Illumina support representative.
***************************************************************************************

Aborting the application - it may take several minutes to dump the core file
FATAL: Caught signal Aborted (6)

Resetting the DRAGEN Bio-IT processor and stopping all software threads
run_bclconvert.sh: line 34:    17 Aborted                 (core dumped) /opt/edico/bin/dragen -v --logging-to-output-dir=true --bcl-conversion-only true "${@}"

We can also collect the entire cwltool stderr output with the following command

cwl-ica icav2-get-analysis-step-logs \
  --project-name playground_v2 \
  --analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d \
  --step-name "cwltool" \
  --stderr > cwltool.stderr.txt