-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-653] Add corresponding deleted data type table for all HealthKit data types #116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 LGTM! Ill defer to rixing to give a final review.
"HealthKitV2Samples", | ||
"HealthKitV2Statistics", | ||
"HealthKitV2Samples_Deleted", | ||
"HealthKitV2Statistics_Deleted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add all the other HealthKit data columns here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these are the only data types with a subtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just had a couple Qs/comments
Quality Gate passedIssues Measures |
We didn't have
*_Deleted
tables for some HealthKit data types, which caused L255 ofsrc/glue/jobs/json_to_parquet.py
to raise a caught exception rather than removing deleted records for that data type. This only affected theHealthKitV2Heartbeat
data type since we haven't received deleted records for any other HealthKit data type besidesHealthKitV2Samples
, which we already had a corresponding*_Deleted
table for.In addition to adding these new tables, I updated a few lines in
src/glue/jobs/s3_to_json.py
. In particular, the addition of theHealthKitV2Samples_Deleted
andHealthKitV2Statistics_Deleted
data types to theDATA_TYPES_WITH_SUBTYPE
has the effect of adding the subtype included in the JSON file name into the JSON itself as aType
field. This field is not relevant when dropping deleted records from the data type, but it is file metadata which we were previously ignoring and not including in the resulting JSON datasets.