[ETL-674] Add script to compress intermediate JSON #134

Merged 1 commit on Aug 12, 2024

Conversation

philerooski
Contributor

Recombines the NDJSON part sets that we wrote to our intermediate bucket and uploads each one as a single gzipped NDJSON file to an adjacent location.
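
For readers unfamiliar with the recombine step, here is a minimal in-memory sketch of the idea, not the script from this PR. The function name `combine_and_gzip` is hypothetical, and the sketch assumes each part has already been downloaded as bytes and is supplied in part order.

```python
import gzip
import io


def combine_and_gzip(parts):
    """Concatenate NDJSON parts (bytes, already in order) and gzip the result.

    Hypothetical helper for illustration only.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for part in parts:
            gz.write(part)
            # Assumption: a part may lack a trailing newline, which would
            # otherwise fuse its last record with the next part's first record.
            if part and not part.endswith(b"\n"):
                gz.write(b"\n")
    return buffer.getvalue()


if __name__ == "__main__":
    blob = combine_and_gzip([b'{"a": 1}\n', b'{"a": 2}'])
    print(gzip.decompress(blob).decode())
```

For large part sets, streaming to a temporary file instead of an in-memory buffer would likely be the safer choice.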

Data Input

Each part set is located at:

s3://<bucket-name>/<namespace>/json/dataset={dataset}/cohort={cohort}/

where {dataset} is the dataset identifier (datatype), and {cohort} is either 'adults_v1' or 'pediatric_v1'. Each file in the part set is named:

{file_identifier}.part{part_number}.ndjson
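
As an illustration of the naming scheme above, the following sketch groups S3 keys into part sets. The regular expression and the function name `group_part_sets` are assumptions made for this example; in particular, it assumes `{part_number}` is a non-negative integer and that parts should be ordered numerically.

```python
import re
from collections import defaultdict

# Hypothetical pattern derived from the key layout described above.
PART_KEY = re.compile(
    r"(?:^|/)json/dataset=(?P<dataset>[^/]+)/cohort=(?P<cohort>[^/]+)/"
    r"(?P<file_identifier>[^/]+)\.part(?P<part_number>\d+)\.ndjson$"
)


def group_part_sets(keys):
    """Group S3 keys into part sets, each sorted by numeric part number."""
    part_sets = defaultdict(list)
    for key in keys:
        match = PART_KEY.search(key)
        if match is None:
            continue  # not a part file, skip it
        group = (match["dataset"], match["cohort"], match["file_identifier"])
        part_sets[group].append((int(match["part_number"]), key))
    # Sort numerically so that part10 comes after part2.
    return {
        group: [key for _, key in sorted(parts)]
        for group, parts in part_sets.items()
    }
```

Sorting on the integer part number rather than the key string avoids the lexicographic trap where `part10` would sort before `part2`.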

Data Output

Each combined and gzipped part set is written to:

s3://<bucket-name>/<namespace>/compressed_json/dataset={dataset}/cohort={cohort}/{file_identifier}.ndjson.gz
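
The mapping from an input part key to its output key can be sketched as follows. The helper name `compressed_key` is hypothetical, and the sketch assumes the first `/json/` path segment in the key is the one to swap, so a namespace that itself contains `/json/` would need different handling.

```python
def compressed_key(part_key):
    """Map a part file key to the key of its combined, gzipped output.

    Hypothetical helper for illustration only.
    """
    prefix, _, file_name = part_key.rpartition("/")
    # Drop the ".part{part_number}.ndjson" suffix to recover {file_identifier}.
    file_identifier = file_name.rsplit(".part", 1)[0]
    # Assumption: the first "/json/" segment is the one under the namespace.
    output_prefix = prefix.replace("/json/", "/compressed_json/", 1)
    return f"{output_prefix}/{file_identifier}.ndjson.gz"


if __name__ == "__main__":
    print(compressed_key("ns/json/dataset=d1/cohort=adults_v1/f.part2.ndjson"))
```

Every part in a set maps to the same output key, which is what lets the whole set collapse into one object.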

@philerooski
Contributor Author

@BryanFauble I amended my commit but it seems like SonarCloud ran its analysis on the previous version of the code again.

@BryanFauble
Contributor

> @BryanFauble I amended my commit but it seems like SonarCloud ran its analysis on the previous version of the code again.

I am not sure whether SonarCloud works properly with amended commits. I suspect that it needs a new commit hash to run.

Member

@thomasyu888 thomasyu888 left a comment

🔥 LGTM!

@philerooski philerooski merged commit 1f47dfc into main Aug 12, 2024
17 checks passed
@philerooski philerooski deleted the etl-674 branch August 12, 2024 18:52