ICAv2
A few tools, released in 1.3, for quickly deploying and launching a workflow on ICAv2. Some of these tools will be migrated to other packages in the future.
- Some housekeeping
- Zipping up a workflow
- Deploying a workflow
- Launching a workflow
- Viewing our workflow
- Finding our data outputs
- Debugging our workflows
Firstly, unlike ICAv1 workflows, we're not keeping track of the workflows here in the same way.
Workflows are not 'synced', they're just deployed. If you deployed that same workflow five minutes ago, you now
have two copies, with the only difference being the timestamp in the name (code) of the workflow.
There is still no way to delete workflows on ICAv2, so please don't go overboard.
The workflow id -> workflow version hierarchy has also been abolished. Workflows are deployed directly into a single project.
The workflow can be moved around with bundles, which is a discussion for another day.
For now, you can manually link a workflow from one project to another.
Launching workflow runs is quite complicated in ICAv2, which is why we've invented some quick fixes to help with the transition from ICAv1.
- Tasks are not isolated with TES; everything runs as if you were running cwltool locally.
- This makes CWLTool expressions significantly faster
- All input data is first downloaded into a scratch space before tasks are run.
- You also need to consider how much space you need for your inputs and outputs (or use what the workflow provided as a default)
All the talk above about workflows and workflow runs... forget it!
This is ICAv2, we now have 'pipelines' and 'analyses'.
All references below to 'workflow' refer only to a CWL Workflow file you have locally.
It's pipelines and analyses from here on in!
Run names (or analysis names, I should say) are now referred to as the 'user reference'.
We still have run ids; they're now just analysis ids.
A CWL Workflow does not need to be packed into a massive json tarball before deployment. This makes the workflow much easier to read if a user wants to debug it via the UI. However, it also makes collecting all the relevant files a non-trivial task.
Fortunately, we have a tool for 'zipping' up a workflow ready for deployment into ICAv2 as a pipeline.
In this basic example, we will just use a workflow that outputs an indexed tabix file.
# Create an output directory to dump our zip file
mkdir output-zip-dir
# Zip the tabix workflow to output-zip-dir
# This will create a file tabix-workflow__0.2.6.zip inside the output-zip-dir directory
cwl-ica icav2-zip-workflow \
--workflow-path "${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl" \
--output-path output-zip-dir/
Let's view the contents of the zip file
(
cd output-zip-dir && \
unzip -l tabix-workflow__0.2.6.zip
)
Yields the following
Archive: tabix-workflow__0.2.6.zip
Length Date Time Name
--------- ---------- ----- ----
204 2022-10-22 14:39 tabix-workflow__0.2.6/params.xml
0 2022-10-22 14:39 tabix-workflow__0.2.6/tools/
1082 2022-10-22 14:39 tabix-workflow__0.2.6/workflow.cwl
0 2022-10-22 14:39 tabix-workflow__0.2.6/tools/tabix/
0 2022-10-22 14:39 tabix-workflow__0.2.6/tools/tabix/0.2.6/
1767 2022-10-22 14:39 tabix-workflow__0.2.6/tools/tabix/0.2.6/tabix__0.2.6.cwl
--------- -------
3053 6 files
Note that tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl
has been renamed to workflow.cwl.
We also have a params.xml
file for using this pipeline with the UI.
The params.xml file is currently non-functional. This will be updated once it is able to support schema based inputs.
For the sake of experimentation, let's compare workflow.cwl
to the original "${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl"
diff \
"${CWL_ICA_REPO_PATH}/workflows/tabix-workflow/0.2.6/tabix-workflow__0.2.6.cwl" \
<( \
cd "output-zip-dir" && \
unzip -qq -c tabix-workflow__0.2.6.zip tabix-workflow__0.2.6/workflow.cwl \
)
Yields
< run: ../../../tools/tabix/0.2.6/tabix__0.2.6.cwl
---
> run: tools/tabix/0.2.6/tabix__0.2.6.cwl
If we look at the structure of the zip file, we see why cwl-ica has manually updated the relative path of the tool.
workflow.cwl
is now at the top level of the directory, not nested under workflows/tabix-workflow/0.2.6,
so the ../../../ is no longer necessary.
Working from the previous step, we can use the cwl-ica icav2-deploy-pipeline
subcommand to take a zip file
and deploy it to a project.
This subcommand takes in the path to the zip file along with the following options:
- a project to deploy the pipeline to (required)
- a default storage size for analyses of the pipeline, one of "Small", "Medium" or "Large" (optional, defaults to "Small")
That's it!
cwl-ica icav2-deploy-pipeline \
--zipped-workflow-path output-zip-dir/tabix-workflow__0.2.6.zip \
--project-name playground_v2 \
--analysis-storage-size Small
This returns two parameters:
- A pipeline 'code', which is essentially the name of the pipeline.
- This is generated from the name of the CWL workflow
- The current timestamp and the md5sum of the zip file are appended as extensions, since pipeline codes must be unique
- A pipeline 'id', which is a UUID string for the pipeline.
Keep both the code and id handy as we will need at least one of them for the next step.
Launching an analysis through ICAv2 is particularly complicated, requiring:
- A list of file ids / folder ids for a user to mount (and then relative mount paths for these files)
- An input json where location attributes of files and directories match the relative mount paths of the files / directories above
- The id of an existing output folder to determine where to place the output files
- A project id (this one seems reasonable)
- An activation id (which requires a user to ping the /api/activationCodes:findBestMatchForCwl endpoint, um what?)
The ICAv2 CLI does handle some of these things, but the mount path functionality doesn't exist, and you need to figure out all of your data ids from your file and directory paths.
This seems substandard, so we came up with a new way.
This takes all the goodies (well some anyway) from ICAv1, and allows us to launch analyses similar to the way we once did!
The cwl-ica icav2-launch-pipeline-analysis
subcommand takes the following:
- A launch-json (very similar to the v1 input json, discussed below)
- A pipeline-code / pipeline-id (remember these from the section above?)
- A project-name / project-id (where do you want to run this pipeline?)
- An analysis storage size (optional, will take the pipeline default value otherwise)
- An output folder path / id (if it doesn't exist, we'll create it for you)
- An activation id (definitely optional)
Which looks similar to the command below
cwl-ica icav2-launch-pipeline-analysis \
--launch-json launch.json \
--pipeline-code tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601 \
--project-name playground_v2
Simple! This will output the user reference and analysis id.
The launch.json requires three main inputs
- A 'user_reference' (the 'name' of the analysis)
- An 'input' section, very similar to what we had with ICAv1 input jsons for WES
- An 'engine_parameters' section, also inspired by ICAv1.
The input section should act like an input.json used for a local CWL analysis.
For the location attributes of files and directories, use the icav2://
URI scheme to reference ICAv2 locations,
where the netloc is either the project id or project name in which the data resides, and the path is the absolute path to the data.
One can also use the query parameter presign=true
to turn the location from a relative mount path into a presigned URL.
This will remove the file from the list of data ids to download.
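For instance, a sketch reusing the same tabix input as below, but requesting a presigned URL instead of a mounted path:

```json
{
  "input": {
    "vcf_file": {
      "class": "File",
      "location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz?presign=true"
    }
  }
}
```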
For example, the input for tabix workflow might look something like this:
{
"input": {
"vcf_file": {
"class": "File",
"location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz"
}
}
}
We can specify the following attributes in the engine parameters
- Path to the output folder (output_parent_folder_path / output_parent_folder_id)
- Analysis tags
- Can be one or more of (technical_tags, user_tags or reference_tags)
- The storage size of the analysis (analysis_storage_id / analysis_storage_size)
- The activation id (again, just let cwl-ica set that for you)
- cwltool_overrides (Override compute or container resources for a given step)
- This can also be set in the input section under the key cwltool:overrides
- Overrides in the input take precedence over those in the engine parameters
Many of the command line options overlap with the engine parameters; when both are set, the command line options take precedence.
The key for a given step is different to what we may be used to in ICAv1 with packed CWL workflows.
Here, to specify an override,
- Take the path to the workflow (or subworkflow) that calls a given step.
- Add a '#'
- Add the id of the workflow (or the subworkflow) found under the id key of the file.
- Add a '/'
- Add the name of the step.
For example, to override the container of the tabix step, our engine parameters may look something like this
{
"engine_parameters": {
"output_parent_folder_path": "/test_tabix/",
"cwltool_overrides": {
"workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
"requirements": {
"DockerRequirement": {
"dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
}
}
}
}
}
}
For more complex beasts, such as the bclconvert-with-qc-pipeline, the overrides key for the bclconvert run step
would be workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bclconvert--4.0.3/bcl_convert_run_step.
However, as noted in this GitHub issue, there appears to be a discrepancy between the cwltool version currently used on ICAv2 (3.0.20201203173111) and the latest cwltool version (3.1.20221018083734).
For 3.0.20201203173111, if the workflow that invokes the step is a subworkflow, drop the workflow id and just place the name of the step after the '#'.
So workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bclconvert--4.0.3/bcl_convert_run_step
becomes workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bcl_convert_run_step.
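As a quick sketch using plain shell parameter expansion, the older-style key can be derived from the newer one by keeping the file path before the '#' and appending only the step name after the last '/':

```shell
key="workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bclconvert--4.0.3/bcl_convert_run_step"

# Keep everything before the first '#', then re-append only the final step name
echo "${key%%#*}#${key##*/}"
# → workflows/bclconvert/4.0.3/bclconvert__4.0.3.cwl#bcl_convert_run_step
```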
Here is the launch json used to launch the tabix pipeline analysis
{
"user_reference": "tabix_workflow_test_overrides",
"input": {
"vcf_file": {
"class": "File",
"location": "icav2://playground_v2/test_tabix/input_vcf/trio.2010_06.ychr.sites.vcf.gz"
}
},
"engine_parameters": {
"output_parent_folder_path": "/test_tabix/",
"cwltool_overrides": {
"workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
"requirements": {
"DockerRequirement": {
"dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
}
}
}
}
}
}
We can then view our analysis in the UI.
The ID and user reference can be seen in the Pipeline pane.
In the top right corner, we can see our input json in the Input Json pane as the following:
{
"vcf_file": {
"class": "File",
"location": "9c20aeb0-f780-4e9a-ac42-739b30ed91f3/fil.1c2677181d0c4f474f6808d9fddc8a4f/L2000969.hard-filtered.vcf.gz"
},
"cwltool:overrides": {
"workflow.cwl#tabix-workflow--0.2.6/run_tabix_step": {
"requirements": {
"DockerRequirement": {
"dockerPull": "public.ecr.aws/biocontainers/tabix:0.2.6--ha92aebf_0"
}
}
}
}
}
We can see at the bottom that our input file L2000969.hard-filtered.vcf.gz
was mounted at 9c20aeb0-f780-4e9a-ac42-739b30ed91f3/fil.1c2677181d0c4f474f6808d9fddc8a4f/L2000969.hard-filtered.vcf.gz.
We can also see the outputs under the Output Json and Output Files pane in the bottom right.
Outputs can be found under <output_parent_folder_path>/<pipeline-code>-<analysis-id>
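As a sketch using the values from the run above, the output location follows that pattern and can be assembled like so:

```shell
# Values taken from the example commands earlier in this walkthrough
output_parent_folder_path="/test_tabix"
pipeline_code="tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601"
analysis_id="4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d"

# Outputs land under <output_parent_folder_path>/<pipeline-code>-<analysis-id>
echo "${output_parent_folder_path}/${pipeline_code}-${analysis_id}"
# → /test_tabix/tabix-workflow__0_2_6--20221022144201--c4362c6840d00788d7fc14af955d3601-4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d
```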
We can also use cwl-ica to debug our analyses, which saves us from having to navigate the UI.
For example
cwl-ica icav2-list-analysis-steps \
--project-name playground_v2 \
--analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d
Gives us the list of steps for an analysis
Click to expand!
[
{
"name": "get-file-from-directory--1.0.1",
"status": "DONE",
"queue_date": "2022-10-24T06:00:49Z",
"start_date": "2022-10-24T06:00:50Z",
"end_date": "2022-10-24T06:00:51Z"
},
{
"name": "create_dummy_directory_step",
"status": "DONE",
"queue_date": "2022-10-24T06:00:55Z",
"start_date": "2022-10-24T06:01:03Z",
"end_date": "2022-10-24T06:01:03Z"
},
{
"name": "create_dummy_file_step",
"status": "DONE",
"queue_date": "2022-10-24T06:00:55Z",
"start_date": "2022-10-24T06:01:04Z",
"end_date": "2022-10-24T06:01:04Z"
},
{
"name": "parse-samplesheet--2.0.0--4.0.3",
"status": "DONE",
"queue_date": "2022-10-24T06:01:05Z",
"start_date": "2022-10-24T06:01:06Z",
"end_date": "2022-10-24T06:01:06Z"
},
{
"name": "parse-bclconvert-run-configuration-object--2.0.0--4.0.3",
"status": "DONE",
"queue_date": "2022-10-24T06:01:07Z",
"start_date": "2022-10-24T06:01:07Z",
"end_date": "2022-10-24T06:01:07Z"
},
{
"name": "bcl_convert_samplesheet_check_step",
"status": "DONE",
"queue_date": "2022-10-24T06:01:10Z",
"start_date": "2022-10-24T06:06:41Z",
"end_date": "2022-10-24T06:06:42Z"
},
{
"name": "interop_qc_step",
"status": "DONE",
"queue_date": "2022-10-24T06:01:16Z",
"start_date": "2022-10-24T06:07:04Z",
"end_date": "2022-10-24T06:09:26Z"
},
{
"name": "bcl_convert_run_step",
"status": "FAILED",
"queue_date": "2022-10-24T06:07:15Z",
"start_date": "2022-10-24T06:09:34Z",
"end_date": "2022-10-24T06:17:58Z"
}
]
We can see that this analysis failed at the bcl_convert_run_step.
Let's see if we can get some logs from this. We can run the icav2-get-analysis-step-logs subcommand to view the stderr of this step:
cwl-ica icav2-get-analysis-step-logs \
--project-name playground_v2 \
--analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d \
--step-name bcl_convert_run_step \
--stderr
Which yields
Click to expand!
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Challenge get token error: Get instance ID failed (Unable to retrieve AWS identity document)
LICENSE_MSG| Challenge timeout after 180 seconds
Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked
Dumping diagnostics....
DRAGEN replay file saved to 170612_A00181_0011_AH2JK7DMXX_converted/dragen_replay_1666591982482_17.json
DRAGEN registers saved to 170612_A00181_0011_AH2JK7DMXX_converted/dragen_info_1666591982482_17.log
Hang diagnostic saved to 170612_A00181_0011_AH2JK7DMXX_converted/hang_diag_1666591982482_17.txt
pstack saved to 170612_A00181_0011_AH2JK7DMXX_converted/pstack_1666591983076_17.log
Fatal exception: Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Timeout waiting on reply from server http://169.254.169.254/latest/dynamic/instance-identity/document
LICENSE_MSG| Close session get token error: Get instance ID failed (Unable to retrieve AWS identity document)
Fatal error: Assertion failed in /data/jenkins/workspace/dragen_release_4.0/src/host/dragen_api/license/license_manager.cpp line 5078 -- false -- Challenge expired, HW is now locked
***************************************************************************************
Please run sosreport to collect diagnostic and configuration information:
sudo sosreport --batch
This requires root privileges and may take several minutes to execute. When completed,
sosreport generates a compressed file in /tmp or /var/tmp. The location of this file
is given in the script output. For example:
Your sosreport has been generated and saved in:
/tmp/sosreport-hostname.companyname.com-20160526151939.tar.xz
Please send this report to your Illumina support representative.
***************************************************************************************
Aborting the application - it may take several minutes to dump the core file
FATAL: Caught signal Aborted (6)
Resetting the DRAGEN Bio-IT processor and stopping all software threads
run_bclconvert.sh: line 34: 17 Aborted (core dumped) /opt/edico/bin/dragen -v --logging-to-output-dir=true --bcl-conversion-only true "${@}"
We can also collect the entire cwltool stderr output with the following command:
cwl-ica icav2-get-analysis-step-logs \
--project-name playground_v2 \
--analysis-id 4caa991e-9d9a-44ee-8a8c-fe5f0e012d7d \
--step-name "cwltool" \
--stderr > cwltool.stderr.txt
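With the full log saved to a file, standard tools can sift it for the failure. A sketch below fabricates a tiny, purely illustrative log excerpt (the lines are not real cwltool output) and filters it for warning/error lines:

```shell
# Hypothetical excerpt of a saved cwltool stderr log (illustrative only)
cat > cwltool.stderr.txt <<'EOF'
INFO [workflow ] start
INFO [step bcl_convert_run_step] start
ERROR [step bcl_convert_run_step] Job error
FATAL: Caught signal Aborted (6)
EOF

# Pull out only the warning/error lines to locate the failure quickly
grep -E '^(WARNING|ERROR|FATAL)' cwltool.stderr.txt
# → ERROR [step bcl_convert_run_step] Job error
# → FATAL: Caught signal Aborted (6)
```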