[BUG] query rule is skipped #44

Closed

holdenk opened this issue Oct 3, 2023 · 11 comments

Labels: bug (Something isn't working)

holdenk commented Oct 3, 2023

Describe the bug

The query_dq rule is skipped instead of being run (see reproduction below).

To Reproduce

Run the same code as #43

Expected behavior

Expect the rule to run and fail.

Screenshots

See 1:47 of https://www.youtube.com/watch?v=bNvvPKv-dmQ

Desktop (please complete the following information):
1.0

Additional context
N/A

holdenk added the bug label on Oct 3, 2023
@phanikumarvemuri

I will work on it

@phanikumarvemuri

@holdenk When I looked into the rules you used, I see that there is no row_dq rule for the table local.3rd_fake, and the query_dq rule is enabled only for target_dq_validation. For final_query_dq to run, there has to be a row_dq rule, or enable_for_source_dq_validation should be set to true. The stats also confirm that both row_dq and final_query_dq are skipped.

The sequence of execution is:

1. source_agg_dq
2. source_query_dq
3. row_dq
4. final_agg_dq (only if row_dq is enabled)
5. final_query_dq (only if row_dq is enabled)

[Screenshot (2023-10-11): dq_stats output showing row_dq and final_query_dq marked as Skipped]

The final_query_dq step is gated by this check:

    if _row_dq is True and _target_query_dq is True:

The two rules involved are:

{"product_id": "pay", "table_name": "local.fake_table_name", "rule_type": "row_dq", "rule": "bonus_checker", "column_name": "MaleBonusPercent", "expectation": "MaleBonusPercent > FemaleBonusPercent", "action_if_failed": "drop", "tag": "", "description": "Sample rule that the male bonuses should be higher. Thankfully this fails (but could be lower base pay etc.)", "enable_for_source_dq_validation": true, "enable_for_target_dq_validation": false, "is_active": true, "enable_error_drop_alert": true, "error_drop_threshold": 1}
{"product_id": "pay", "table_name": "local.3rd_fake", "rule_type": "query_dq", "rule": "history", "column_name": "Bloop", "expectation": "(select count(*) from 3rd_fake_view) < (select input_count from local.dq_stats WHERE table_name='local.3rd_fake')", "action_if_failed": "fail", "tag": "", "description": "We should always have more records than before", "enable_for_source_dq_validation": false, "enable_for_target_dq_validation": true, "is_active": true, "enable_error_drop_alert": true, "error_drop_threshold": 1}


holdenk commented Oct 11, 2023

Ah, interesting, so the aggregate data quality checks are only run if there are row data quality checks present?


phanikumarvemuri commented Oct 11, 2023

Both agg_dq and query_dq rule types can be run on the source and the target. Here the source is the input dataframe, and the target is the data left after rows are filtered/dropped by the row_dq rules.

In the above example, enable_for_source_dq_validation is set to false and enable_for_target_dq_validation is set to true: source validation is skipped because of the enable_for_source_dq_validation flag, and target validation is skipped because there is no row_dq rule.
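
For example (an illustrative tweak, not a change anyone applied here), flipping enable_for_source_dq_validation to true on the existing rule row would let it run as source_query_dq against the input dataframe; the alternative is to add a row_dq rule for local.3rd_fake so that final_query_dq has a target to validate:

    {"product_id": "pay", "table_name": "local.3rd_fake", "rule_type": "query_dq", "rule": "history", "column_name": "Bloop", "expectation": "(select count(*) from 3rd_fake_view) < (select input_count from local.dq_stats WHERE table_name='local.3rd_fake')", "action_if_failed": "fail", "tag": "", "description": "We should always have more records than before", "enable_for_source_dq_validation": true, "enable_for_target_dq_validation": true, "is_active": true, "enable_error_drop_alert": true, "error_drop_threshold": 1}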

@phanikumarvemuri

@holdenk Please let us know if this is good to be closed?
cc @asingamaneni


holdenk commented Oct 14, 2023

I mean, you can, although the fact that aggregation rules only run when a row_dq rule is present is something I did not understand from the docs. So I think it would be good to either always run them, or update the docs and produce a warning (or error) when encountering a rule which is enabled but will not run.
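
A rough sketch of the kind of pre-flight warning being suggested (a hypothetical helper, not an existing spark-expectations API):

    import warnings

    # Hypothetical helper, not part of spark-expectations: warn when a rule is
    # active but can never execute under the gating described earlier in this issue.
    def warn_unreachable_rules(rules):
        for table in {r["table_name"] for r in rules}:
            table_rules = [r for r in rules
                           if r["table_name"] == table and r.get("is_active", True)]
            has_row_dq = any(r["rule_type"] == "row_dq" for r in table_rules)
            for r in table_rules:
                if r["rule_type"] in ("agg_dq", "query_dq"):
                    source_on = r.get("enable_for_source_dq_validation", False)
                    target_on = r.get("enable_for_target_dq_validation", False)
                    if target_on and not source_on and not has_row_dq:
                        warnings.warn(
                            f"Rule '{r['rule']}' on {table} is enabled but will be "
                            f"skipped: target validation requires a row_dq rule and "
                            f"source validation is disabled.")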

@newfront

Late to the party here. It feels like a dry-run option could be an interesting pre-integration test, e.g. saying "no rules table or rules for table|stats|x; while this will not prevent spark-expectations from running, the benefits of x mean y."

I am all for docs where they are applicable, but getting warnings or short-circuiting in the case of missing definitions would be good for the project too, so that a user can do something like:

  1. initialize spark-expectations
  2. bootstrap the dq tables - empty if nothing else
  3. run pre-flight. "scanning. zero rules for x (table|view)". Suggestions: (scanned (delta|iceberg|x) table - structure - (table.schema....) - recommended starting rule (x) -> do you want to apply non_null to the columns on table (x)...

I always like being able to fumble and get the package to identify my mistakes. Spark does this nicely with AnalysisExceptions. Delta does something similar, also with AnalysisExceptions. Maybe there is a nice pattern for minimal viable rules with warnings (...)

@phanikumarvemuri

Agree with both options.
I will close this issue and open a feature request to support a dryRun option, and also to update the documentation and add a flow chart wherever required.
Thanks

@asingamaneni

Dryrun is an awesome thought. Let’s implement it and also update the documentation where necessary!

@phanikumarvemuri

@asingamaneni Can you close this issue? I will open a new one for the above features.

@asingamaneni

Closing this issue as we will create a new feature request from the above discussion. @phanikumarvemuri please tag this issue when you create a new feature, so that we will know the source for the feature creation.
