[ETL-648] Modify comparison job to only consider records from recent exports (#119)

* initial commit of using integration test exports

* add linting and function docstring and hints

* add support for pyarrow filesystem testing, adjust for PR review comments, add support for deleted data types and subtypes

* add updated pipfile to include flask lib

* move moto_server dep to compare parquet tests

* add checks to helper func, remove unused imports

* fix cf line

* add missing source bucket param

* add missing input args, fix tests to test for expected calls

* add hive partitions to read in cohort as col and correct filters param

* add isort to precommit

* code review suggestion

* remove support for comparing deleted data types, comment out isort for now

* code review changes - refactor get_parquet_dataset, fix comments

* code review revisions, separate out exports filter function, reduce complexity in main

* add additional comments

* correct tests

* add check for parquet files in s3 location, remove redundant code

* fix parquet_datasets check, add more tests

* refactor is_valid_dataset to raise exceptions, separate out s3 bucket and client fixtures
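
For readers unfamiliar with the Hive partitioning mentioned in the commit message: in a Hive-style layout, directory segments of the form `key=value` carry column values, which is how a reader can expose `cohort` as a column and filter on it. The following is a minimal plain-Python sketch of that idea only. It is not the code from this commit; the function names, object keys, and cohort values are all hypothetical.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(object_key: str) -> dict[str, str]:
    """Return the key=value directory segments of a path as a dict."""
    partitions = {}
    # Skip the final segment: it is the file name, not a partition directory.
    for segment in PurePosixPath(object_key).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


def filter_keys_by_partition(object_keys, column, allowed_values):
    """Keep only the keys whose partition `column` is in `allowed_values`."""
    allowed = set(allowed_values)
    return [
        key
        for key in object_keys
        if parse_hive_partitions(key).get(column) in allowed
    ]


# Hypothetical object keys and cohort names, for illustration only.
keys = [
    "main/parquet/dataset_a/cohort=adults_v1/part-0.parquet",
    "main/parquet/dataset_a/cohort=pediatric_v1/part-0.parquet",
    "main/parquet/dataset_a/part-0.parquet",
]
adult_keys = filter_keys_by_partition(keys, "cohort", ["adults_v1"])
```

A key with no `cohort=` segment (the third one above) is dropped by the filter, because it has no value for that partition column.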
rxu17 authored Jun 25, 2024
1 parent cadb0b4 commit 1deebfc
Showing 9 changed files with 1,394 additions and 547 deletions.
8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
@@ -17,3 +17,11 @@ repos:
rev: v1.4.1
hooks:
- id: remove-tabs
#- repo: https://github.com/pre-commit/mirrors-isort
# rev: v5.10.1
# hooks:
# - id: isort
# name: isort (python)
# entry: isort
# language: python
# types: [python]
3 changes: 3 additions & 0 deletions Pipfile
@@ -18,3 +18,6 @@ moto = "~=4.1"
datacompy = "~=0.8"
docker = "~=6.1"
ecs_logging = "~=2.0"
# flask libraries required for moto_server
flask = "~=2.0"
flask-cors = "~=3.0"
3 changes: 3 additions & 0 deletions config/develop/namespaced/glue-workflow.yaml
@@ -7,6 +7,7 @@ dependencies:
- develop/namespaced/glue-job-JSONToParquet.yaml
- develop/namespaced/glue-job-compare-parquet.yaml
- develop/glue-job-role.yaml
- develop/s3-cloudformation-bucket.yaml
parameters:
Namespace: {{ stack_group_config.namespace }}
JsonBucketName: {{ stack_group_config.intermediate_bucket_name }}
@@ -16,6 +17,8 @@ parameters:
S3ToJsonJobName: !stack_output_external "{{ stack_group_config.namespace }}-glue-job-S3ToJsonS3::JobName"
CompareParquetStagingNamespace: {{ stack_group_config.namespace }}
CompareParquetMainNamespace: "main"
S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
stack_tags:
{{ stack_group_config.default_stack_tags }}
sceptre_user_data:
3 changes: 3 additions & 0 deletions config/prod/namespaced/glue-workflow.yaml
@@ -7,6 +7,7 @@ dependencies:
- prod/namespaced/glue-job-JSONToParquet.yaml
- prod/namespaced/glue-job-compare-parquet.yaml
- prod/glue-job-role.yaml
- prod/s3-cloudformation-bucket.yaml
parameters:
Namespace: {{ stack_group_config.namespace }}
JsonBucketName: {{ stack_group_config.intermediate_bucket_name }}
@@ -16,6 +17,8 @@ parameters:
S3ToJsonJobName: !stack_output_external "{{ stack_group_config.namespace }}-glue-job-S3ToJsonS3::JobName"
CompareParquetStagingNamespace: "staging"
CompareParquetMainNamespace: "main"
S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
stack_tags:
{{ stack_group_config.default_stack_tags }}
sceptre_user_data: