[BUG] query rule is skipped #44
Comments
I will work on it |
@holdenk When I looked into the rules you used, I see that there is no The sequence of execution is
{"product_id": "pay", "table_name": "local.fake_table_name", "rule_type": "row_dq", "rule": "bonus_checker", "column_name": "MaleBonusPercent", "expectation": "MaleBonusPercent > FemaleBonusPercent", "action_if_failed": "drop", "tag": "", "description": "Sample rule that the male bonuses should be higher. Thankfully this fails (but could be lower base pay etc.)", "enable_for_source_dq_validation": true, "enable_for_target_dq_validation": false, "is_active": true, "enable_error_drop_alert": true, "error_drop_threshold": 1} |
Ah, interesting. So the aggregate data quality checks are only run if there are row data quality checks present? |
Both agg and query dq rule types can be run on the source and on the target. Here the source is the input dataframe, and the target is the data that remains after rules of type row_dq have filtered or dropped rows. In the above example, enable_for_source_dq_validation is set to false and enable_for_target_dq_validation is set to true, so source validation is skipped because of the flag. |
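To make the flag behaviour described above concrete, here is a minimal sketch of how the two flags could map to validation stages. It is not spark-expectations code: `planned_stages` is a hypothetical helper, the rule-type names `agg_dq` and `query_dq` are assumed by analogy with `row_dq`, and the rule name `bonus_gap_query` is made up for illustration. Only the flag names come from the rule shown in this thread.

```python
from typing import Dict, List

# Assumed rule-type names for aggregate and query rules (by analogy with row_dq).
STAGE_AWARE_TYPES = ("agg_dq", "query_dq")


def planned_stages(rule: Dict) -> List[str]:
    """Return the stages ("source", "target") an agg/query rule is flagged for.

    Hypothetical helper: it only reads the flags on the rule dict and does not
    reproduce how spark-expectations schedules rules internally.
    """
    if rule.get("rule_type") not in STAGE_AWARE_TYPES:
        raise ValueError("only agg/query rules are handled by this sketch")
    if not rule.get("is_active", False):
        return []  # inactive rules are never planned
    stages = []
    if rule.get("enable_for_source_dq_validation", False):
        stages.append("source")  # runs against the input dataframe
    if rule.get("enable_for_target_dq_validation", False):
        stages.append("target")  # runs against the data left after row_dq
    return stages


# Illustrative rule with source validation off and target validation on.
example_rule = {
    "rule": "bonus_gap_query",  # hypothetical rule name
    "rule_type": "query_dq",
    "is_active": True,
    "enable_for_source_dq_validation": False,
    "enable_for_target_dq_validation": True,
}
print(planned_stages(example_rule))  # ['target']
```

With these flags the rule is planned for the target stage only, which matches the explanation that source validation is skipped.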
@holdenk Please let us know if this is good to be closed. |
I mean, you can, although aggregation rules only running if there is a row dq of the same type is something that I did not understand from the docs. So I think it would be good to either always run them, or update the docs and produce a warning (or error) when encountering a rule which is enabled but will not run. |
Late to the party here. It feels like a dry-run option could be an interesting pre-integration test to say "no rules table or x for table|stats|x while this will not prevent spark-expectations from running. The benefits of x means y." I am all for docs where they are applicable, but getting warnings or short-circuiting in the case of missing definitions would be good for the project too, so that a user can do something like:
I always like being able to fumble and have the package identify my mistakes. Spark does this nicely with AnalysisExceptions. Delta does something similar, also with AnalysisExceptions. Maybe there is a nice pattern for minimal viable rules with warnings (...) |
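As a rough illustration of the dry-run idea discussed above, the sketch below scans a list of rule dicts and reports rules that are enabled but would not run. It is not an existing spark-expectations feature: `dry_run` is a hypothetical function, the rule-type names `agg_dq` and `query_dq` and the rule names are assumptions, and the second check encodes the understanding voiced in this thread (target validation depends on row_dq rules being present) rather than confirmed library behaviour.

```python
from typing import Dict, List


def dry_run(rules: List[Dict]) -> List[str]:
    """Return human-readable findings for active agg/query rules that look
    like they will be skipped. Hypothetical sketch; it runs no Spark code."""
    # Assumption from the thread: the target stage only exists after row_dq rules ran.
    has_row_dq = any(
        r.get("is_active") and r.get("rule_type") == "row_dq" for r in rules
    )
    findings = []
    for r in rules:
        if not r.get("is_active") or r.get("rule_type") not in ("agg_dq", "query_dq"):
            continue
        name = r.get("rule", "<unnamed>")
        src = r.get("enable_for_source_dq_validation", False)
        tgt = r.get("enable_for_target_dq_validation", False)
        if not src and not tgt:
            findings.append(f"{name}: active, but neither source nor target validation is enabled")
        elif tgt and not src and not has_row_dq:
            findings.append(f"{name}: target-only, but no active row_dq rule is defined")
    return findings


# Illustrative rule set: one target-only query rule and no row_dq rules.
sample_rules = [
    {
        "rule": "bonus_gap_query",  # hypothetical rule name
        "rule_type": "query_dq",
        "is_active": True,
        "enable_for_source_dq_validation": False,
        "enable_for_target_dq_validation": True,
    }
]
for finding in dry_run(sample_rules):
    print("WARNING:", finding)
```

A caller could print the findings as warnings, as here, or raise an error to short-circuit the run when a definition is missing.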
Agree with both the options. |
Dry-run is an awesome thought. Let's implement it and also update the documentation where necessary! |
@asingamaneni Can you close this issue? I will open a new one for the above features. |
Closing this issue as we will create a new feature request from the above discussion. @phanikumarvemuri please tag this issue when you create a new feature, so that we will know the source for the feature creation. |
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Run the same code as #43
Expected behavior
Expect rule to run and fail
Screenshots
See 1:47 of https://www.youtube.com/watch?v=bNvvPKv-dmQ
Desktop (please complete the following information):
1.0
Additional context
N/A