-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6640] RelMdUniqueKeys grows exponentially when key columns are repeated in projections #4013
base: main
Are you sure you want to change the base?
Conversation
…re repeated in projections Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
+1 on the request to improve documentation. Otherwise, this LGTM! |
@soumyakanti3578 Can you please elaborate a bit more about the doc improvements that you would like to see? Thanks. |
@zabetak It felt to me that it is a bit difficult to understand the documentation explaining the unique key combinations. But I agree with your comment above that this is not the right place to add more detailed documentation. So please ignore my comments regarding doc above. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've added a test for minimality but I would go further - make sure every value returned by any RelMdUniqueKeys provider is minimal. I think the execution cost will be (pardon the pun) minimal.
Consider adding a method static ImmutableBitSet areMinimal(Iterable<ImmutableBitSet>)
(or List<ImmutableBitSet>
if it's more efficient). I have a feeling that the current implementation makes N^2
tests but an implementation could make N * (N - 1) / 2
tests (less than half as many).
@@ -978,6 +978,18 @@ public static boolean allContain(Collection<ImmutableBitSet> bitSets, int bit) { | |||
return true; | |||
} | |||
|
|||
/** | |||
* Returns whether this is a minimal set with respect to the specified collection of bitSets. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example in the javadoc?
@julianhyde Putting minimality logic on every return of the Moreover, if the handler goes rogue and starts to generate not minimal keys at some place then chances are that we are going to fail before even arriving to the minimality check/filter. I don't mind adding the checks/fiters if you feel strongly about it but I see more cons than pros in this approach. For the record, we already have RelMdUniqueKeys#filterSupersets that is currently used by the |
…inimality in tests" The codebase has already "minimization" code in various other places (e.g., RelMdUniqueKeys#filterSupersets) and there is overlapping functionality with the new isMinimal API. Based on these findings, I don't think a new API is necessary thus I reverted the respective code changes from here. If we want to add a new public API its better to refactor/use the existing "minimization" logic. However, the refactoring will require some discussion on where to move the code and how we want the API to look like so it is out of the scope of this PR.
The codebase has already "minimization" code in various other places (e.g., If we want to add a new public API its better to refactor/use the existing "minimization" logic. If there are no objections I plan to merge this PR as is in ~24hrs. If people feel that we need more work on the minimization checks/logic let me know and we can discuss this under a dedicated JIRA. |
Quality Gate passedIssues Measures |
Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.