Merge pull request #143 from umccr/feature/icav2-copy-batch-utility
Added feature icav2 copy batch state machine
Showing 23 changed files with 2,165 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
import { | ||
icav2AccessTokenSecretNameDev, | ||
icav2AccessTokenSecretNameStg, | ||
icav2AccessTokenSecretNameProd, | ||
AccountName, | ||
icav2CopyBatchSSMRoot, | ||
} from '../constants'; | ||
|
||
import { ICAv2CopyBatchUtilityConfig } from '../../lib/workload/stateless/stacks/icav2-copy-batch-utility/deploy/stack'; | ||
import path from 'path'; | ||
|
||
export const getICAv2CopyBatchUtilityStackProps = (n: AccountName): ICAv2CopyBatchUtilityConfig => { | ||
const baseConfig = { | ||
icav2_copy_batch_state_machine_name: 'icav2_copy_batch_utility_sfn', | ||
icav2_copy_batch_state_machine_arn_ssm_parameter_path: path.join( | ||
icav2CopyBatchSSMRoot, | ||
'batch_sfn_arn' | ||
), | ||
icav2_copy_batch_state_machine_name_ssm_parameter_path: path.join( | ||
icav2CopyBatchSSMRoot, | ||
'batch_sfn_name' | ||
), | ||
icav2_copy_single_state_machine_name: 'icav2_single_batch_utility_sfn', | ||
icav2_copy_single_state_machine_arn_ssm_parameter_path: path.join( | ||
icav2CopyBatchSSMRoot, | ||
'single_sfn_arn' | ||
), | ||
icav2_copy_single_state_machine_name_ssm_parameter_path: path.join( | ||
icav2CopyBatchSSMRoot, | ||
'single_sfn_name' | ||
), | ||
}; | ||
|
||
switch (n) { | ||
case 'beta': | ||
return { | ||
...baseConfig, | ||
icav2_token_secret_id: icav2AccessTokenSecretNameDev, | ||
}; | ||
case 'gamma': | ||
return { | ||
...baseConfig, | ||
icav2_token_secret_id: icav2AccessTokenSecretNameStg, | ||
}; | ||
case 'prod': | ||
return { | ||
...baseConfig, | ||
icav2_token_secret_id: icav2AccessTokenSecretNameProd, | ||
}; | ||
} | ||
}; |
8 changes: 8 additions & 0 deletions
lib/workload/stateless/stacks/icav2-copy-batch-utility/.gitignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
*.js | ||
!jest.config.js | ||
*.d.ts | ||
node_modules | ||
|
||
# CDK asset staging directory | ||
.cdk.staging | ||
cdk.out |
6 changes: 6 additions & 0 deletions
lib/workload/stateless/stacks/icav2-copy-batch-utility/.npmignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*.ts | ||
!*.d.ts | ||
|
||
# CDK asset staging directory | ||
.cdk.staging | ||
cdk.out |
158 changes: 158 additions & 0 deletions
lib/workload/stateless/stacks/icav2-copy-batch-utility/Readme.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# ICAv2 Copy Batch Utility | ||
|
||
<!-- TOC --> | ||
* [ICAv2 Copy Batch Utility](#icav2-copy-batch-utility) | ||
* [Overview](#overview) | ||
* [Inputs](#inputs) | ||
* [Outputs](#outputs) | ||
* [Lambdas in this directory](#lambdas-in-this-directory) | ||
* [Flip Manifest](#flip-manifest) | ||
* [Launch Copy Job](#launch-copy-job) | ||
* [Update job session](#update-job-session) | ||
* [SSM Parameters](#ssm-parameters-) | ||
* [External Parameters required by CDK](#external-parameters-required-by-cdk) | ||
<!-- TOC --> | ||
|
||
## Overview | ||
|
||
The ICAv2 copy batch utility CDK wraps an AWS Step Function around the ICAv2 CopyBatch API. | ||
This API is designed to copy a list of Illumina file IDs into a directory. | ||
|
||
We exploit this API by taking in a manifest (a mapping of each source URI to a list of destinations; a source may have | ||
multiple destinations if it needs to be copied into a few different places) and then monitoring a set of API jobs to completion. | ||
|
||
These CopyBatch API jobs have a failure rate of about 20%, so we sometimes need to resubmit them. Resubmission is built into the step function. | ||
|
||
A 20% failure rate seems quite high, but when these jobs succeed they can transfer 80 Gb of data in under 10 seconds, so they are worth | ||
persisting with. | ||
|
||
When working with the ICAv2 CopyBatch API, we need to provide a unique identifier for the run, and a unique location for the outputs. | ||
|
||
Once all jobs are deployed, each job is monitored by its own separate state machine. | ||
|
||
This process will transfer an entire BCLConvert process from one ICAv2 project to another in under 10 minutes! | ||
|
||
Note that a current limitation prevents using this API within the same project. | ||
|
||
|
||
### Step Functions Graphs | ||
|
||
The copy batch utility has the following steps: | ||
|
||
 | ||
|
||
The step function map will then spawn out a job for each folder that is monitored by the state machine below. | ||
|
||
Failed jobs are relaunched by the single state machine below and monitored until they pass. | ||
We accept 10 failed jobs before we give up. | ||
|
||
 | ||
|
||
## Inputs | ||
|
||
* The state machine expects the following inputs: | ||
* One of `manifest` or `manifest_b64gz`. | ||
|
||
|
||
An example is shown below: | ||
|
||
```json | ||
{ | ||
"manifest": { | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz": [ | ||
"icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/" | ||
], | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz": [ | ||
"icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/" | ||
] | ||
} | ||
} | ||
``` | ||
|
||
OR | ||
|
||
```json5 | ||
{ | ||
"manifest_b64gz": "H4sIAAAAAAAAA+2SzUoDMRhF932KoWvT/P9111ZFod0oqCASMklaBqYzY5IOFvHdHRXBlYtutND9x/3uOdzXUVGMK2d7MoWwJHRdciyAV5wAphUH1jsHFMUOk6CcJwRW9bYBtrH1PoUECcUYCzNDGHFiEJbEzK/ulit+fvsgjSdMcGyY08g7MHf1om36EHPRM0OMBKXE2imvAQ/CDw+DAFaVATiLZcmD9gyVsN3lbpfhst0keGlTfl60264OOUzySx5Pi8cB4QeE5JoHtSaAEk8Bc3LIpIQBJ6zi3gY1JH9CmC5WWxv3kCBCfweBVAgsuP44ZYgiaUvnpdLoq9V4aPB0dlwm721sqmaTJnW7OUk8UOJFjG08KTxQ4U3o2pgTnHnb5RDNKuRYuTRxqf9rnd/Vjszo5wqum3X7vyY5ehu9Ax+WAaFoBgAA" /* pragma: allowlist-secret */ | ||
} | ||
``` | ||
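The `manifest_b64gz` form can be produced and read back with the Python standard library. The sketch below assumes, based on the parameter name and on the `H4sI` prefix that gzip data takes when base64-encoded, that the value is the JSON manifest gzip-compressed and then base64-encoded. The function names are illustrative and are not taken from this commit.

```python
import base64
import gzip
import json


def encode_manifest_b64gz(manifest: dict) -> str:
    """JSON-serialise the manifest, gzip it, then base64-encode the bytes."""
    raw = json.dumps(manifest).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_manifest_b64gz(manifest_b64gz: str) -> dict:
    """Reverse of encode_manifest_b64gz: base64-decode, gunzip, parse JSON."""
    compressed = base64.b64decode(manifest_b64gz)
    return json.loads(gzip.decompress(compressed).decode("utf-8"))
```

The compressed form is mainly useful for large manifests, where the plain JSON might exceed the payload size a step function execution accepts.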
|
||
## Outputs | ||
|
||
```json | ||
{ | ||
"job_status_iterable_parameter": { | ||
"job_status_iterable": 1 | ||
}, | ||
"wait_parameter": { | ||
"wait": true | ||
}, | ||
"counters": { | ||
"jobs_failed": 0, | ||
"jobs_running": 0, | ||
"jobs_passed": 1 | ||
}, | ||
"job_list_with_attempt_counter": [ | ||
{ | ||
"job_attempt_counter": 1, | ||
"job_id": "6f3d6981-0dff-4413-8388-2bb445d03dd7", | ||
"failed_jobs_list": [], | ||
"dest_uri": "icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/", | ||
"source_uris": [ | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz", | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz" | ||
], | ||
"job_status": true | ||
} | ||
] | ||
} | ||
``` | ||
|
||
## Lambdas in this directory | ||
|
||
All lambdas run on Python 3.11 or higher. | ||
|
||
### Flip Manifest | ||
|
||
This lambda takes in a manifest and flips the keys and values. | ||
The manifest maps each source URI to a list of destination URIs. The output is a list of objects, one per destination, | ||
with the destination under `dest_uri` and every source bound for it under the list `source_uris`. | ||
|
||
In the example above, because both files are heading to the same directory we get the following output | ||
|
||
```json | ||
[ | ||
{ | ||
"dest_uri": "icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/", | ||
"source_uris": [ | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz", | ||
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz" | ||
] | ||
} | ||
] | ||
``` | ||
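A minimal sketch of this flip, assuming the manifest has already been parsed into a dictionary of source URI to destination list. The function name `flip_manifest` is illustrative and the lambda in this commit may be structured differently.

```python
from collections import defaultdict


def flip_manifest(manifest: dict[str, list[str]]) -> list[dict]:
    """Group source URIs by the destination they are being copied to."""
    sources_by_dest: dict[str, list[str]] = defaultdict(list)
    for source_uri, dest_uris in manifest.items():
        # A source may be listed against several destinations.
        for dest_uri in dest_uris:
            sources_by_dest[dest_uri].append(source_uri)
    return [
        {"dest_uri": dest_uri, "source_uris": source_uris}
        for dest_uri, source_uris in sources_by_dest.items()
    ]
```

Grouping by destination suits the copy step, since each job takes a list of files and a single target directory.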
|
||
### Launch Copy Job | ||
|
||
A simple function that takes in a dest URI and a list of source URIs, converts them into a folder ID and file IDs respectively, | ||
and launches the ICAv2 Copy Data Batch Job. | ||
|
||
This returns a job ID. We tie the job ID to the dest URI and source URIs (because we cannot collect these from the job itself) | ||
and monitor the job throughout the step function process. | ||
|
||
This lambda is called in a map state. | ||
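The first half of that conversion, splitting a URI into its parts, can be sketched with the standard library. This assumes the `icav2://` URIs follow the shape seen in the examples above, with the project ID in the host position and the data path after it. Resolving the path to a folder or file ID needs a call to the ICAv2 API, which is not shown here. `split_icav2_uri` is an illustrative name.

```python
from urllib.parse import urlsplit


def split_icav2_uri(uri: str) -> tuple[str, str]:
    """Split icav2://<project_id>/<path> into (project_id, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "icav2" or not parts.netloc:
        raise ValueError(f"Not an icav2:// URI: {uri!r}")
    # Assumption: the host component is the project ID, as in the examples.
    return parts.netloc, parts.path
```

A path ending in `/` denotes a folder in the examples, which is how the destination can be told apart from a file.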
|
||
|
||
## SSM Parameters | ||
|
||
``` | ||
"/icav2_copy_batch_utility/state_machine_arn_batch" | ||
"/icav2_copy_batch_utility/state_machine_arn_single" | ||
``` | ||
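A caller could use these parameters to find and start the batch state machine. The sketch below takes the two AWS clients as arguments so it can be exercised without credentials; with boto3 they would be `boto3.client("ssm")` and `boto3.client("stepfunctions")`. The default parameter name is copied from the list above and should be checked against the deployed stack, since the CDK config in this commit builds its paths from `icav2CopyBatchSSMRoot`. `start_copy_batch` is an illustrative name.

```python
import json

# Copied from the README list above; verify against the deployed stack.
BATCH_ARN_PARAMETER = "/icav2_copy_batch_utility/state_machine_arn_batch"


def start_copy_batch(
    ssm_client,
    sfn_client,
    manifest: dict,
    parameter_name: str = BATCH_ARN_PARAMETER,
) -> str:
    """Look up the state machine ARN in SSM and start one execution."""
    parameter = ssm_client.get_parameter(Name=parameter_name)
    state_machine_arn = parameter["Parameter"]["Value"]
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"manifest": manifest}),
    )
    return response["executionArn"]
```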
|
||
### External Parameters required by CDK | ||
|
||
``` | ||
"/icav2/umccr-prod/service-user-trial-jwt-token-secret-arn" | ||
``` |