Currently we use error values such as 999999 for numeric values and "9999-09-09" for date values to mark entries that our second script could not convert.
We need this information to signal where the conversion of the original data failed, which may point to a data quality issue that needs further investigation.
However, when working with the data in Looker Studio or when exporting it, we don't want these values, because they have to be filtered out before any further analysis or aggregation.
My current SQL code looks something like this, and it has to be repeated for nearly every column:
CASE hba1c_updated_date
WHEN "9999-09-09" THEN NULL
ELSE hba1c_updated_date
END
AS hba1c_updated_date,
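As a stopgap while the concept is open, the repeated CASE blocks could be generated rather than written by hand. The sketch below is only an illustration: it assumes the standard SQL function NULLIF behaves like the CASE expression above for these columns, and the helper name `nullif_select_list`, the column `hba1c_value`, and the column-to-type mapping are hypothetical.

```python
# Sentinel error values as named in this issue, written as SQL literals.
NUMERIC_ERROR = "999999"
DATE_ERROR = '"9999-09-09"'


def nullif_select_list(columns):
    """Build one NULLIF expression per column.

    columns maps a column name to "numeric" or "date" (hypothetical mapping).
    """
    sentinels = {"numeric": NUMERIC_ERROR, "date": DATE_ERROR}
    lines = []
    for name, kind in columns.items():
        # NULLIF(x, y) yields NULL when x equals y, otherwise x.
        lines.append(f"NULLIF({name}, {sentinels[kind]}) AS {name}")
    return ",\n".join(lines)


# hba1c_value is an invented column name used only for illustration.
example = nullif_select_list(
    {"hba1c_updated_date": "date", "hba1c_value": "numeric"}
)
print(example)
```

This only shortens the SQL; it does not answer the question of where the error values should be removed.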
We need a better approach for this. I think we have several options here, so we first need a sound concept.
- Do it in R after the final pipeline step, maybe splitting the tables into something like _raw tables with error values and final tables without them
- Do it in R and create a relational representation where we have the data without errors plus one big error table that records which rows and which columns had error values
- Do it in SQL/Google?
Just some ideas
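To make the second option more concrete, here is a rough sketch of the split into a clean table and a long-format error table. It is written in Python purely for illustration (the pipeline itself is in R), and the function name `split_errors`, the `row_id` key, and the example columns are assumptions, not existing code.

```python
# Sentinel error values as named in this issue.
NUMERIC_ERROR = 999999
DATE_ERROR = "9999-09-09"


def split_errors(rows, id_column):
    """Return (clean_rows, error_rows) for a list of dict records.

    clean_rows has every error value replaced by None.
    error_rows has one entry per (row, column) that held an error value.
    """
    clean_rows = []
    error_rows = []
    for row in rows:
        clean = {}
        for column, value in row.items():
            if column != id_column and value in (NUMERIC_ERROR, DATE_ERROR):
                clean[column] = None
                error_rows.append(
                    {id_column: row[id_column], "column": column, "error_value": value}
                )
            else:
                clean[column] = value
        clean_rows.append(clean)
    return clean_rows, error_rows


# Invented example records; column names are hypothetical.
rows = [
    {"row_id": 1, "hba1c_value": 7.2, "hba1c_updated_date": "2021-03-01"},
    {"row_id": 2, "hba1c_value": 999999, "hba1c_updated_date": "9999-09-09"},
]
clean, errors = split_errors(rows, "row_id")
```

With this layout the clean table can go straight into Looker Studio or an export, while the error table keeps the data quality signal and can be joined back on the row key when needed.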