WIP: Proposed interface for physical plan invariant checking. #13986
@@ -110,6 +110,16 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
    /// trait, which is implemented for all `ExecutionPlan`s.
    fn properties(&self) -> &PlanProperties;

    /// Returns an error if this individual node does not conform to its invariants.
Perhaps, to take into account the different types of "executableness", we can use a similar enum as we did for LogicalPlans:
Then the signature might look like fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<()>
Ok(())
}
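A minimal sketch of how such a level-aware check could look. The `InvariantLevel` variants (`Always`, `Executable`) are assumed to mirror the logical-plan enum; the trait, the `Result` alias, and the `ToyLimitExec` node with its invariant are simplified, hypothetical stand-ins rather than DataFusion's actual types.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for DataFusion's `Result` type.
pub type Result<T> = std::result::Result<T, String>;

/// Assumed to mirror the level enum used for logical plans.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InvariantLevel {
    /// Invariants that must hold at every point during planning.
    Always,
    /// Stricter invariants that must hold before the plan is executed.
    Executable,
}

/// Simplified stand-in for the `ExecutionPlan` trait.
pub trait ExecutionPlan: Debug {
    fn name(&self) -> &str;

    /// Default implementation: no node-specific invariants.
    fn check_node_invariants(&self, _invariant_level: InvariantLevel) -> Result<()> {
        Ok(())
    }
}

/// Toy extension node (hypothetical): `fetch` must be non-zero
/// before the plan is considered executable.
#[derive(Debug)]
pub struct ToyLimitExec {
    pub fetch: usize,
}

impl ExecutionPlan for ToyLimitExec {
    fn name(&self) -> &str {
        "ToyLimitExec"
    }

    fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<()> {
        // Only the stricter level enforces this toy invariant.
        if invariant_level == InvariantLevel::Executable && self.fetch == 0 {
            return Err(format!("{}: fetch must be greater than zero", self.name()));
        }
        Ok(())
    }
}
```

With this shape, a node can tolerate a temporarily invalid state while the optimizer is still rewriting the plan and only fail once the caller asks for the `Executable` level.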
    /// These invariants are typically only checked in debug mode.
    ///
    /// A default set of invariants is provided in the default implementation.
    /// Extension nodes can provide their own invariants.
    fn check_node_invariants(&self) -> Result<()> {
        // TODO
I wasn't sure what the default set should be.

The SanityCheckPlan does exactly what I had been thinking: datafusion/datafusion/core/src/physical_optimizer/sanity_checker.rs, lines 41 to 47 in 38ccb00
Also, I think this optimizer pass does not mutate anything and instead validates?

If we change the SanityCheckPlan to be an invariant checker instead, and then run it (a) after the other optimizer rules are applied (current behavior) as well as (b) after each optimizer rule in debug mode, would this be useful? The added debug-mode check could help isolate when a user-defined optimizer rule extension, or a user-defined ExecutionPlan node, does not work well with a DF upgrade (e.g. changes in DF plan nodes or optimizer rules).

Conceptually, sanity checking is a "more general" process: it verifies that any two operators that exchange data (i.e. one's output feeds the other's input) are compatible. So I don't think we can "change" it to be an invariant checker, but we can extend it to also check "invariants" of each individual operator (however they are defined by an

However, we cannot blindly run sanity checking after every rule. Why? Because rules have the following types regarding their input/output plan validity:
As of this writing, we don't have a formal cut-off point in our list of rules whereafter plans remain valid, but I suspect they do after

In the logical planner we have a split between
It seems like maybe we could make the same separation for physical optimizer rules as well ("not yet executable") and ("ready to execute"),
This was surprising to me (I am not doubting it). I looked at the other passes, and it seems there are a few others: datafusion/datafusion/core/src/physical_optimizer/optimizer.rs, lines 56 to 72 in 264f4c5
🤔
I agree with this sentiment. It seems to me that the "SanityChecker" is verifying invariants that should be true for all nodes regardless of what they do (for example, that the declared required input sort is the same as the produced output sort). Thus, focusing on ExecutionPlan-specific invariants might be a good first step. Some simple invariants to start with I could imagine are:
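As an illustration only of the distinction drawn in this thread, the sketch below separates a pairwise sanity check (a parent's required input ordering against its child's output ordering) from a per-node invariant. Every type and function here is a hypothetical simplification, not DataFusion's API, and the per-node invariant is an invented example, not one proposed in the discussion.

```rust
/// Hypothetical, simplified ordering: sort column names, most significant first.
pub type SortColumns = Vec<String>;

#[derive(Debug)]
pub struct Node {
    pub name: String,
    /// Ordering this node requires from its single input, if any.
    pub required_input_ordering: Option<SortColumns>,
    /// Ordering this node claims to produce.
    pub output_ordering: SortColumns,
    /// Number of output columns (used by the toy per-node invariant).
    pub num_columns: usize,
    pub child: Option<Box<Node>>,
}

/// Pairwise sanity check: the child's output ordering must begin with
/// the ordering the parent requires.
pub fn check_pair(parent: &Node, child: &Node) -> Result<(), String> {
    match &parent.required_input_ordering {
        Some(required) if !child.output_ordering.starts_with(required) => Err(format!(
            "{} requires input sorted by {:?}, but {} produces {:?}",
            parent.name, required, child.name, child.output_ordering
        )),
        _ => Ok(()),
    }
}

/// Per-node invariant (invented for illustration): a node cannot claim
/// more sort columns than it has output columns.
pub fn check_node(node: &Node) -> Result<(), String> {
    if node.output_ordering.len() > node.num_columns {
        return Err(format!(
            "{} declares {} sort columns but only {} output columns",
            node.name,
            node.output_ordering.len(),
            node.num_columns
        ));
    }
    Ok(())
}

/// Walks the plan top-down, running both kinds of checks.
pub fn check_plan(root: &Node) -> Result<(), String> {
    check_node(root)?;
    if let Some(child) = &root.child {
        check_pair(root, child)?;
        check_plan(child)?;
    }
    Ok(())
}
```

The pairwise check needs two nodes and is independent of what either node does, while the per-node check only looks at one node's own declared properties.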

        Ok(())
    }

    /// Specifies the data distribution requirements for all the
    /// children for this `ExecutionPlan`, By default it's [[Distribution::UnspecifiedDistribution]] for each child,
    fn required_input_distribution(&self) -> Vec<Distribution> {

I think for API design we should not pass the rule to the invariant checker (as the checker shouldn't logically depend on the rule). Perhaps just the rule name could be passed in to help with debug messages.
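A minimal sketch of that shape, assuming "debug mode" means builds with `debug_assertions` enabled. The `Plan`, `OptimizerRule`, `check_invariants`, and `optimize_all` names are hypothetical simplifications for illustration and are not DataFusion's API; the checker receives only the rule's name, never the rule itself.

```rust
pub type Result<T> = std::result::Result<T, String>;

/// Hypothetical, simplified plan: just a list of operator names.
#[derive(Debug, Clone, PartialEq)]
pub struct Plan {
    pub operators: Vec<String>,
}

/// Simplified stand-in for a physical optimizer rule.
pub trait OptimizerRule {
    fn name(&self) -> &str;
    fn optimize(&self, plan: Plan) -> Result<Plan>;
}

/// The checker depends only on the plan; `rule_name` is used purely
/// to make the error message easier to trace.
pub fn check_invariants(plan: &Plan, rule_name: &str) -> Result<()> {
    if plan.operators.is_empty() {
        return Err(format!(
            "invariant violated after rule '{}': plan has no operators",
            rule_name
        ));
    }
    Ok(())
}

/// Applies each rule in order. In debug builds the invariants are
/// checked after every rule; a final check always runs at the end.
pub fn optimize_all(mut plan: Plan, rules: &[Box<dyn OptimizerRule>]) -> Result<Plan> {
    for rule in rules {
        plan = rule.optimize(plan)?;
        if cfg!(debug_assertions) {
            check_invariants(&plan, rule.name())?;
        }
    }
    check_invariants(&plan, "final")?;
    Ok(plan)
}

/// Toy rule that leaves the plan unchanged.
pub struct KeepRule;
impl OptimizerRule for KeepRule {
    fn name(&self) -> &str {
        "KeepRule"
    }
    fn optimize(&self, plan: Plan) -> Result<Plan> {
        Ok(plan)
    }
}

/// Toy rule that (incorrectly) removes every operator.
pub struct DropAllRule;
impl OptimizerRule for DropAllRule {
    fn name(&self) -> &str {
        "DropAllRule"
    }
    fn optimize(&self, _plan: Plan) -> Result<Plan> {
        Ok(Plan { operators: vec![] })
    }
}
```

In a debug build, running `optimize_all` with `DropAllRule` in the list fails with a message naming that rule, which is the kind of isolation the per-rule check is meant to provide; in a release build only the final check fires.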