Use existing build output #94

chrisammon3000 · 2023-09-04T18:27:42Z

Description

- Fixed errors preventing some releases from building (340, 3130)
- Upgraded the state machine pipeline with an option to use existing build artifacts from previous executions, if available (`use_existing_build=true`)
- Refactored the validation step to run inside the build stage on single releases, instead of running afterwards on multiple releases
- Added an option to skip the load process when only the build artifacts are needed (`skip_load=true`)
- Improved error handling during the build:
  - Exit code 1 indicates a critical failure and causes the build to fail
  - Exit code 2 indicates a non-critical failure, for example when some alleles fail to build; the build can still succeed
- Errors during the build are written to `<data_bucket>/data/<release>/errors/errors.ndjson` for later analysis
  - The Failed Alleles queue is removed for now, since it doesn't support debugging as well as the error output does
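The exit-code convention above can be sketched as a small POSIX shell wrapper. This is a sketch under assumptions, not the gfe-db build code: the function names (`classify_build_exit`, `run_and_check`, `log_error`) and the NDJSON field names (`allele`, `error`) are hypothetical, since the PR does not show the build script or the schema of `errors.ndjson`.

```shell
# Sketch only: the function names and NDJSON fields below are hypothetical.
# Exit-code meanings (0 ok, 1 critical, 2 non-critical) follow the PR description.

# Map a build exit status to an outcome label.
classify_build_exit() {
  case "$1" in
    0) echo "success" ;;   # all alleles built
    1) echo "failed" ;;    # critical failure: the build fails
    2) echo "partial" ;;   # non-critical failure: some alleles failed, build can still succeed
    *) echo "unknown" ;;   # any other code is treated as unexpected
  esac
}

# Run a build command; return 0 (continue) on exit codes 0 or 2, else 1 (stop).
run_and_check() {
  build_status=0
  "$@" || build_status=$?   # capture the code without tripping `set -e`
  case "$(classify_build_exit "$build_status")" in
    success|partial) return 0 ;;
    *) return 1 ;;
  esac
}

# Append one error record as a single JSON line (NDJSON) to the given file.
# Backslashes and double quotes are escaped; multi-line messages are not handled.
log_error() {
  err_file="$1"
  err_allele="$2"
  err_msg=$(printf '%s' "$3" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"allele": "%s", "error": "%s"}\n' "$err_allele" "$err_msg" >> "$err_file"
}
```

Treating exit code 2 as "continue" is what lets a release with a few failed alleles still produce its build output, while the per-allele records in the NDJSON file remain available for later analysis.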

Usage

`use_existing_build=true` looks for existing CSV files and loads them. If there are no CSVs for a release, they are created first.

skip_load=true will run only the build stage and will skip loading. This is useful when just the CSVs are needed.
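A minimal sketch of how the two flags could combine for each release, assuming the CSVs for a release sit in a local directory. The real pipeline runs as a state machine and stores build output under `<data_bucket>`; the directory layout and the `has_csvs`/`plan_release` helpers here are hypothetical.

```shell
# Sketch only: `has_csvs`, `plan_release` and the local directory layout are
# hypothetical stand-ins; the real pipeline keeps build output under <data_bucket>.

# Succeed if the directory contains at least one .csv file.
has_csvs() {
  for csv_file in "$1"/*.csv; do
    [ -e "$csv_file" ] && return 0   # an unmatched glob stays literal, so check it exists
  done
  return 1
}

# Print the planned steps for one release.
# Args: release, use_existing_build, skip_load, csv directory
plan_release() {
  if [ "$2" = "true" ] && has_csvs "$4"; then
    plan="use-existing"   # reuse CSVs from a previous execution
  else
    plan="build"          # no reusable CSVs (or flag off): build them
  fi
  if [ "$3" != "true" ]; then
    plan="$plan load"     # load unless skip_load=true
  fi
  echo "$1: $plan"
}

# Example mirroring the multi-release command: a release only reuses CSVs
# if ./data/<release>/csv already contains them.
for release in $(echo "3490,3500,3510" | tr ',' ' '); do
  plan_release "$release" true false "./data/$release/csv"
done
```

Checking for CSVs per release, rather than once for the whole run, is what allows a mixed list such as 3490,3500,3510 to build two releases and reuse the third.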

```shell
# Example for a single version
STAGE=<stage> make database.load.run releases="3510"

# Example for multiple versions where only 3510 has already been built:
# 3490 and 3500 will be built, 3510 will use existing CSVs
STAGE=<stage> make database.load.run \
    releases="3490,3500,3510" \
    use_existing_build=true

# Example of how to build all releases and skip loading
STAGE=dev make database.load.run \
    releases="300,310,320,330,340,350,360,370,380,390,3100,3110,3120,3130,3140,3150,3160,3170,3180,3190,3200,3210,3220,3230,3240,3250,3260,3270,3280,3290,3300,3310,3320,3330,3340,3350,3360,3370,3380,3390,3400,3410,3420,3430,3440,3450,3460,3470,3480,3490,3500,3510,3520,3530" \
    skip_load=true
```
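The release list in the last example follows a regular pattern (300 to 390, then 3100 to 3530, in steps of 10), so it could be generated rather than typed out. A sketch using `seq` and `paste`; the `all_releases` helper is hypothetical and not part of the repository's Makefile.

```shell
# Sketch only: `all_releases` is a hypothetical helper, not a Makefile target.
# Prints 300..390 and 3100..3530 in steps of 10 as one comma-separated line.
all_releases() {
  { seq 300 10 390; seq 3100 10 3530; } | paste -sd, -
}
```

It could then be passed as `releases="$(all_releases)"` in the `make database.load.run` command; the shell expands the substitution before `make` receives the argument.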

Next Steps

Merge with git commit state tracking branch (Issues for 1.0 #89)


pbashyal-nmdp

👍

chrisammon3000 added 24 commits August 22, 2023 13:39

- specify stage for make (80ffa36)
- correct default pipeline input (bb9cdb2)
- Merge branch 'incremental_load' of https://github.com/nmdp-bioinformatics/gfe-db into nmdp/incremental_load (3f6bc76)
- remove limit param validation for now (d215a98)
- increase timeout and memory on validation Lambda (c2050b8)
- Merge branch 'nmdp/incremental_load' into incremental_load/fix-validation-error (2ec9433)
- set release pattern regex (caf9e5c)
- fix premature exit on build errors causing container to stop (8acd785)
- capture error output to disk as ndjson (b336aec)
- consolidate error handling within process_allele function; remove sqs (474dd4c)
- remove failed alleles queue (379b719)
- script proceeds if some alleles fail (ff80986)
- validate build for single release version (5d3c76c)
- fix restore target (8e7f051)
- fix input paths in state machine (6749b3a)
- use existing build output if available (e869acb)
- wait for backup document to finish (29033d8)
- Merge branch 'fix-validation-error/use-existing-build' into incremental_load/fix-validation-error (67e012a)
- refactor input array for cleaner output (e8b4970)
- add use_existing_build to pipeline params (ab83c3a)
- evaluate use_existing_build after file validation (6b0dc56)
- refactor environment validation target (26b6654)
- add skip_load to pipeline params (45140a5)
- update README (2d08920)

chrisammon3000 requested a review from pbashyal-nmdp September 4, 2023 18:27

chrisammon3000 linked an issue Sep 4, 2023 that may be closed by this pull request: Use previous data for build pipeline if it already exists #92 (Open)

chrisammon3000 self-assigned this Sep 4, 2023

pbashyal-nmdp approved these changes Sep 7, 2023


pbashyal-nmdp merged commit 1344602 into nmdp-bioinformatics:incremental_load Sep 7, 2023
