[ETL-654] Clean up before integration test run #121

Merged · 13 commits · Jul 2, 2024
32 changes: 26 additions & 6 deletions .github/workflows/README.md
@@ -7,12 +7,12 @@ Recover ETL has four github workflows:
- workflows/codeql-analysis.yml
- workflows/cleanup.yaml

| Workflow name | Scenario it's run |
| :-------------------------------- |:---------------- |
| upload-and-deploy | on-push from feature branch, feature branch merged into main |
| upload-and-deploy-to-prod-main | whenever a new tag is created |
| codeql-analysis | on-push from feature branch, feature branch merged into main
| cleanup | feature branch deleted |
| Workflow name | Scenario it's run |
|:-------------------------------|:-------------------------------------------------------------|
| upload-and-deploy | on-push from feature branch, feature branch merged into main |
| upload-and-deploy-to-prod-main | whenever a new tag is created |
| codeql-analysis | on-push from feature branch, feature branch merged into main |
| cleanup | feature branch deleted |

## upload-and-deploy

@@ -45,6 +45,16 @@ With the current way when the `test_json_to_parquet.py` run, sometimes the glue

### sceptre-deploy-develop

### integration-test-develop-cleanup

This job is responsible for cleaning up any data locations used by the integration
tests. It runs after `sceptre-deploy-develop` and before
`integration-test-develop`. It cleans the following locations (a minimal sketch of
the underlying cleanup follows the list):

* `s3://recover-dev-input-data/$GITHUB_REF_NAME/`
* `s3://recover-dev-intermediate-data/$GITHUB_REF_NAME/json/`
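
The deletion itself is handled by `src/scripts/manage_artifacts/clean_staging.py`,
which is invoked from `upload-and-deploy.yaml` (shown further down in this PR). The
script's implementation is not part of this diff; purely as an illustration, a prefix
cleanup of this kind amounts to deleting every object under the branch prefix, e.g.
with boto3 (bucket and branch names below are examples only):

```python
import boto3


def clean_prefix(bucket: str, prefix: str) -> None:
    """Delete every object under `prefix` in `bucket`."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            # Each page holds at most 1000 keys, the delete_objects limit.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})


# Example: clean the develop test data for a hypothetical feature branch.
clean_prefix("recover-dev-input-data", "my-feature-branch/")
clean_prefix("recover-dev-intermediate-data", "my-feature-branch/json/")
```

Because the develop prefix is the feature branch's `NAMESPACE`, only that branch's
leftover test data is removed.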


### integration-test-develop

This builds the S3 to JSON lambda and triggers it with the pilot data so that the Recover ETL Glue Workflow starts running and processing the pilot data. **Note** that this runs with every push to the feature branch, so it is best to wait until one run of the Glue workflow has finished, as we cannot have more than one concurrent Glue Workflow run.
@@ -57,6 +67,16 @@ This integration test means that you have to wait until the glue workflow has fi

Note that we are **NOT** configuring an S3 event notification for our `prod/staging` space because we plan to submit data to `staging` "manually" after merging a PR into main and triggering the GitHub workflow.

### integration-test-staging-cleanup

This job is responsible for cleaning up any data locations used by the integration
tests during the staging run. It runs after `sceptre-deploy-staging` and before
`integration-test-staging`. It cleans the following locations (a hypothetical sketch
of the cleanup script's CLI follows the list):

* `s3://recover-input-data/staging/`
* `s3://recover-intermediate-data/staging/json/`
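
The workflow below calls the cleanup script as
`python src/scripts/manage_artifacts/clean_staging.py --bucket <bucket> --bucket_prefix <prefix>`.
The script itself is not shown in this diff; a hypothetical command-line wrapper
consistent with that invocation might look like the following sketch:

```python
import argparse

import boto3


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Delete all objects under a prefix in an S3 bucket."
    )
    parser.add_argument("--bucket", required=True, help="S3 bucket name")
    parser.add_argument("--bucket_prefix", required=True, help="key prefix to clean")
    args = parser.parse_args()

    # ObjectsCollection.delete() batches the keys into delete_objects calls.
    s3 = boto3.resource("s3")
    s3.Bucket(args.bucket).objects.filter(Prefix=args.bucket_prefix).delete()


if __name__ == "__main__":
    main()
```

For the staging run the workflow passes the production buckets with the fixed
`staging/` and `staging/json/` prefixes, while the develop run passes the dev buckets
with the branch namespace as the prefix.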


## upload-and-deploy-to-prod-main

This runs **ONLY** when we create a new tag.
68 changes: 66 additions & 2 deletions .github/workflows/upload-and-deploy.yaml
@@ -9,8 +9,10 @@ env:
NAMESPACE: main
PYTHON_VERSION: 3.9
DEV_INPUT_BUCKET: recover-dev-input-data
DEV_INTERMEDIATE_BUCKET: recover-dev-intermediate-data
DEV_PROCESSED_BUCKET: recover-dev-processed-data
PROD_INPUT_BUCKET: recover-input-data
PROD_INTERMEDIATE_BUCKET: recover-intermediate-data
INTEGRATION_TEST_NUM_EXPORTS: 28

jobs:
@@ -255,11 +257,44 @@ jobs:
Payload: '{"RequestType": "Create"}'
LogType: Tail

integration-test-develop-cleanup:
name: Cleanup non-main branch data before integration tests
runs-on: ubuntu-latest
needs: sceptre-deploy-develop
if: github.ref_name != 'main'
environment: develop
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
id-token: write
contents: read
steps:
- name: Setup code, pipenv, aws
uses: Sage-Bionetworks/action-pipenv-aws-setup@v3
with:
role_to_assume: ${{ vars.AWS_CREDENTIALS_IAM_ROLE }}
role_session_name: GitHubActions-${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
python_version: ${{ env.PYTHON_VERSION }}

- name: Set namespace for non-default branch
run: echo "NAMESPACE=$GITHUB_REF_NAME" >> $GITHUB_ENV

- name: Clean input data bucket
run: >
python src/scripts/manage_artifacts/clean_staging.py
--bucket $DEV_INPUT_BUCKET
--bucket_prefix "${{ env.NAMESPACE }}/"

- name: Clean intermediate data bucket
run: >
python src/scripts/manage_artifacts/clean_staging.py
--bucket $DEV_INTERMEDIATE_BUCKET
--bucket_prefix "${{ env.NAMESPACE }}/json/"


integration-test-develop:
name: Triggers ETL workflow with S3 test files
runs-on: ubuntu-latest
needs: sceptre-deploy-develop
needs: [sceptre-deploy-develop, integration-test-develop-cleanup]
environment: develop
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
@@ -335,11 +370,40 @@ jobs:
- name: Deploy sceptre stacks to staging on prod
run: pipenv run sceptre --var "namespace=staging" launch prod --yes

integration-test-staging-cleanup:
name: Cleanup main branch staging data before integration tests
runs-on: ubuntu-latest
needs: sceptre-deploy-staging
if: github.ref_name == 'main'
environment: prod
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
id-token: write
contents: read
steps:
- name: Setup code, pipenv, aws
uses: Sage-Bionetworks/action-pipenv-aws-setup@v3
with:
role_to_assume: ${{ vars.AWS_CREDENTIALS_IAM_ROLE }}
role_session_name: GitHubActions-${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
python_version: ${{ env.PYTHON_VERSION }}

- name: Clean input data bucket
run: >
python src/scripts/manage_artifacts/clean_staging.py
--bucket $PROD_INPUT_BUCKET
--bucket_prefix "staging/"

- name: Clean intermediate data bucket
run: >
python src/scripts/manage_artifacts/clean_staging.py
--bucket $PROD_INTERMEDIATE_BUCKET
--bucket_prefix "staging/json/"

integration-test-staging:
name: Triggers staging workflow with production data
runs-on: ubuntu-latest
needs: sceptre-deploy-staging
needs: [sceptre-deploy-staging, integration-test-staging-cleanup]
environment: prod
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
1 change: 1 addition & 0 deletions Pipfile
@@ -18,6 +18,7 @@ moto = "~=4.1"
datacompy = "~=0.8"
docker = "~=6.1"
ecs_logging = "~=2.0"
boto3 = "<2.0"
# flask libraries required for moto_server
flask = "~=2.0"
flask-cors = "~=3.0"