[ETL-654] Clean up before integration test run (#121)
Clean up before integration test run
BryanFauble authored Jul 2, 2024
1 parent 22391b0 commit 2c3439f
Showing 6 changed files with 810 additions and 527 deletions.
32 changes: 26 additions & 6 deletions .github/workflows/README.md
@@ -7,12 +7,12 @@ Recover ETL has four github workflows:
- workflows/codeql-analysis.yml
- workflows/cleanup.yaml

| Workflow name | Scenario it's run |
| :-------------------------------- |:---------------- |
| upload-and-deploy | on-push from feature branch, feature branch merged into main |
| upload-and-deploy-to-prod-main | whenever a new tag is created |
| codeql-analysis | on-push from feature branch, feature branch merged into main
| cleanup | feature branch deleted |
| Workflow name | Scenario it's run |
|:-------------------------------|:-------------------------------------------------------------|
| upload-and-deploy | on-push from feature branch, feature branch merged into main |
| upload-and-deploy-to-prod-main | whenever a new tag is created |
| codeql-analysis | on-push from feature branch, feature branch merged into main |
| cleanup | feature branch deleted |

## upload-and-deploy

@@ -45,6 +45,16 @@ With the current way when the `test_json_to_parquet.py` run, sometimes the glue

### sceptre-deploy-develop

### integration-test-develop-cleanup

This job cleans up the data locations used by the integration tests. It runs
after `sceptre-deploy-develop` but before `integration-test-develop`, and cleans
these locations:

* `s3://recover-dev-input-data/$GITHUB_REF_NAME/`
* `s3://recover-dev-intermediate-data/$GITHUB_REF_NAME/json/`
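
The cleanup script itself (`clean_for_integration_test.py`) is not shown in this part of the diff. The following is only a minimal sketch of what such a script might do, assuming it takes just the `--bucket` and `--bucket_prefix` arguments used by the workflow and deletes every object under that prefix with boto3. The function names `iter_keys`, `delete_prefix`, and `main` are hypothetical, and the empty-prefix guard is an added assumption rather than documented behavior.

```python
import argparse
from typing import Any, Iterator, List, Optional


def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def delete_prefix(s3_client: Any, bucket: str, prefix: str) -> int:
    """Delete all objects under `prefix` and return how many were deleted."""
    if not prefix:
        # Assumption, not from the source: refuse to empty a whole bucket.
        raise ValueError("bucket_prefix must not be empty")
    deleted = 0
    batch: List[dict] = []
    for key in iter_keys(s3_client, bucket, prefix):
        batch.append({"Key": key})
        if len(batch) == 1000:  # delete_objects accepts at most 1000 keys per call
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
            deleted += len(batch)
            batch = []
    if batch:
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
        deleted += len(batch)
    return deleted


def main(argv: Optional[List[str]] = None) -> None:
    """Hypothetical CLI entry point mirroring the flags used in the workflow."""
    import boto3  # imported here so the helpers above can be exercised without AWS

    parser = argparse.ArgumentParser(description="Delete objects under an S3 prefix")
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--bucket_prefix", required=True)
    args = parser.parse_args(argv)
    count = delete_prefix(boto3.client("s3"), args.bucket, args.bucket_prefix)
    print(f"Deleted {count} objects from s3://{args.bucket}/{args.bucket_prefix}")
```

Passing the client in as a parameter keeps the deletion logic testable with a stub or with `moto`, which the Pipfile already lists.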


### integration-test-develop

This builds the S3 to JSON lambda and triggers it with the pilot data so that the Recover ETL Glue Workflow starts processing the pilot data. **Note** that this runs on every push to the feature branch, so wait until the current Glue workflow run has finished before pushing again; we cannot have more than one concurrent Glue Workflow run.
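
One way to check whether a Glue workflow run is still in progress before pushing again is to ask Glue for the workflow's runs and look for a `RUNNING` status. The sketch below assumes a boto3 Glue client is passed in; `has_running_workflow_run` is a hypothetical helper, and the workflow name must be supplied by the caller because it does not appear in this diff.

```python
from typing import Any, Dict


def has_running_workflow_run(glue_client: Any, workflow_name: str) -> bool:
    """Return True if any run of the named Glue workflow has status RUNNING."""
    kwargs: Dict[str, Any] = {"Name": workflow_name}
    while True:
        response = glue_client.get_workflow_runs(**kwargs)
        if any(run.get("Status") == "RUNNING" for run in response.get("Runs", [])):
            return True
        token = response.get("NextToken")
        if not token:
            # No more pages and nothing was running.
            return False
        kwargs["NextToken"] = token
```

With real credentials this could be called as `has_running_workflow_run(boto3.client("glue"), name)`, where `name` is the deployed workflow's name.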
@@ -57,6 +67,16 @@ This integration test means that you have to wait until the glue workflow has fi

Note that we are **NOT** configuring an S3 event notification for our `prod/staging` space because we plan to submit data to `staging` "manually" after merging a PR into main and triggering the GitHub workflow.

### integration-test-staging-cleanup

This job cleans up the data locations used by the integration tests during the
staging run. It runs after `sceptre-deploy-staging` but before
`integration-test-staging`, and cleans these locations:

* `s3://recover-input-data/staging/`
* `s3://recover-intermediate-data/staging/json/`
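
Taken together, the develop and staging cleanup jobs pick their targets from the branch name: feature branches clean their own namespace in the dev buckets, while `main` cleans the `staging` namespace in the prod buckets. That mapping can be sketched as below. `cleanup_targets` is a hypothetical helper for illustration and is not part of the repository; the bucket names are the ones from the `env` block of `upload-and-deploy.yaml`.

```python
from typing import List, Tuple

# Bucket names as declared in the workflow's `env` block.
DEV_INPUT_BUCKET = "recover-dev-input-data"
DEV_INTERMEDIATE_BUCKET = "recover-dev-intermediate-data"
PROD_INPUT_BUCKET = "recover-input-data"
PROD_INTERMEDIATE_BUCKET = "recover-intermediate-data"


def cleanup_targets(ref_name: str) -> List[Tuple[str, str]]:
    """Return the (bucket, prefix) pairs cleaned for a given branch name."""
    if ref_name == "main":
        # integration-test-staging-cleanup
        return [
            (PROD_INPUT_BUCKET, "staging/"),
            (PROD_INTERMEDIATE_BUCKET, "staging/json/"),
        ]
    # integration-test-develop-cleanup: the namespace is the branch name
    return [
        (DEV_INPUT_BUCKET, f"{ref_name}/"),
        (DEV_INTERMEDIATE_BUCKET, f"{ref_name}/json/"),
    ]
```

Only the `json/` prefix of the intermediate bucket is cleaned in both cases, matching the locations listed in the README.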


## upload-and-deploy-to-prod-main

This runs **ONLY** when we create a new tag.
68 changes: 66 additions & 2 deletions .github/workflows/upload-and-deploy.yaml
@@ -9,8 +9,10 @@ env:
  NAMESPACE: main
  PYTHON_VERSION: 3.9
  DEV_INPUT_BUCKET: recover-dev-input-data
  DEV_INTERMEDIATE_BUCKET: recover-dev-intermediate-data
  DEV_PROCESSED_BUCKET: recover-dev-processed-data
  PROD_INPUT_BUCKET: recover-input-data
  PROD_INTERMEDIATE_BUCKET: recover-intermediate-data
  INTEGRATION_TEST_NUM_EXPORTS: 28

jobs:
@@ -255,11 +257,44 @@ jobs:
Payload: '{"RequestType": "Create"}'
LogType: Tail

  integration-test-develop-cleanup:
    name: Cleanup non-main branch data before integration tests
    runs-on: ubuntu-latest
    needs: sceptre-deploy-develop
    if: github.ref_name != 'main'
    environment: develop
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Setup code, pipenv, aws
        uses: Sage-Bionetworks/action-pipenv-aws-setup@v3
        with:
          role_to_assume: ${{ vars.AWS_CREDENTIALS_IAM_ROLE }}
          role_session_name: GitHubActions-${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
          python_version: ${{ env.PYTHON_VERSION }}

      - name: Set namespace for non-default branch
        run: echo "NAMESPACE=$GITHUB_REF_NAME" >> $GITHUB_ENV

      - name: Clean input data bucket
        run: >
          pipenv run python src/scripts/manage_artifacts/clean_for_integration_test.py
          --bucket $DEV_INPUT_BUCKET
          --bucket_prefix "${{ env.NAMESPACE }}/"
      - name: Clean intermediate data bucket
        run: >
          pipenv run python src/scripts/manage_artifacts/clean_for_integration_test.py
          --bucket $DEV_INTERMEDIATE_BUCKET
          --bucket_prefix "${{ env.NAMESPACE }}/json/"

  integration-test-develop:
    name: Triggers ETL workflow with S3 test files
    runs-on: ubuntu-latest
-   needs: sceptre-deploy-develop
+   needs: [sceptre-deploy-develop, integration-test-develop-cleanup]
    environment: develop
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
@@ -335,11 +370,40 @@ jobs:
      - name: Deploy sceptre stacks to staging on prod
        run: pipenv run sceptre --var "namespace=staging" launch prod --yes

  integration-test-staging-cleanup:
    name: Cleanup main branch staging data before integration tests
    runs-on: ubuntu-latest
    needs: sceptre-deploy-staging
    if: github.ref_name == 'main'
    environment: prod
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Setup code, pipenv, aws
        uses: Sage-Bionetworks/action-pipenv-aws-setup@v3
        with:
          role_to_assume: ${{ vars.AWS_CREDENTIALS_IAM_ROLE }}
          role_session_name: GitHubActions-${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
          python_version: ${{ env.PYTHON_VERSION }}

      - name: Clean input data bucket
        run: >
          pipenv run python src/scripts/manage_artifacts/clean_for_integration_test.py
          --bucket $PROD_INPUT_BUCKET
          --bucket_prefix "staging/"
      - name: Clean intermediate data bucket
        run: >
          pipenv run python src/scripts/manage_artifacts/clean_for_integration_test.py
          --bucket $PROD_INTERMEDIATE_BUCKET
          --bucket_prefix "staging/json/"

  integration-test-staging:
    name: Triggers staging workflow with production data
    runs-on: ubuntu-latest
-   needs: sceptre-deploy-staging
+   needs: [sceptre-deploy-staging, integration-test-staging-cleanup]
    environment: prod
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
1 change: 1 addition & 0 deletions Pipfile
@@ -18,6 +18,7 @@ moto = "~=4.1"
datacompy = "~=0.8"
docker = "~=6.1"
ecs_logging = "~=2.0"
boto3 = "<2.0"
# flask libraries required for moto_server
flask = "~=2.0"
flask-cors = "~=3.0"