Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6640] RelMdUniqueKeys grows exponentially when key columns are repeated in projections #4013

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

zabetak
Copy link
Contributor

@zabetak zabetak commented Oct 23, 2024

Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.

…re repeated in projections

Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.
Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

@soumyakanti3578
Copy link
Contributor

+1 on the request to improve documentation. Otherwise, this LGTM!

@zabetak
Copy link
Contributor Author

zabetak commented Oct 29, 2024

@soumyakanti3578 Can you please elaborate a bit more about the doc improvements that you would like to see? Thanks.

@soumyakanti3578
Copy link
Contributor

@zabetak It felt to me that it is a bit difficult to understand the documentation explaining the unique key combinations. But I agree with your comment above that this is not the right place to add more detailed documentation. So please ignore my comments regarding doc above. Thanks!

Copy link
Contributor

@julianhyde julianhyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've added a test for minimality but I would go further - make sure every value returned by any RelMdUniqueKeys provider is minimal. I think the execution cost will be (pardon the pun) minimal.

Consider adding a method static ImmutableBitSet areMinimal(Iterable<ImmutableBitSet>) (or List<ImmutableBitSet> if it's more efficient). I have a feeling that the current implementation makes N^2 tests but an implementation could make N * (N - 1) / 2 tests (less than half as many).

@@ -978,6 +978,18 @@ public static boolean allContain(Collection<ImmutableBitSet> bitSets, int bit) {
return true;
}

/**
* Returns whether this is a minimal set with respect to the specified collection of bitSets.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example in the javadoc?

@zabetak
Copy link
Contributor Author

zabetak commented Oct 30, 2024

@julianhyde Putting minimality logic on every return of the RelMdUniqueKeys handler doesn't feel right to me. Even if the overhead is minimal why adding seemingly redundant code?

Moreover, if the handler goes rogue and starts to generate not minimal keys at some place then chances are that we are going to fail before even arriving to the minimality check/filter.

I don't mind adding the checks/fiters if you feel strongly about it but I see more cons than pros in this approach.

For the record, we already have RelMdUniqueKeys#filterSupersets that is currently used by the Aggregate handler to ensure that keys are minimal. If decide to apply the filter in every other handler then I guess we don't need another method in ImmutableBitSet and probably don't need the minimality check in the tests either.

…inimality in tests"

The codebase has already "minimization" code in various other places (e.g., RelMdUniqueKeys#filterSupersets) and there is overlapping functionality with the new isMinimal API.
Based on these findings, I don't think a new API is necessary thus I reverted the respective code changes from here.

If we want to add a new public API its better to refactor/use the existing "minimization" logic.
However, the refactoring will require some discussion on where to move the code and how we want the API to look like so it is out of the scope of this PR.
@zabetak
Copy link
Contributor Author

zabetak commented Nov 21, 2024

The codebase has already "minimization" code in various other places (e.g., RelMdUniqueKeys#filterSupersets) and there is overlapping functionality with the new ImmutableBitSet#isMinimal API.
Based on these findings, I don't think a new API is necessary thus I reverted the respective code changes from.

If we want to add a new public API its better to refactor/use the existing "minimization" logic.
However, the refactoring will require some discussion on where to move the code and how we want the API to look like so it is out of the scope of this PR.

If there are no objections I plan to merge this PR as is in ~24hrs. If people feel that we need more work on the minimization checks/logic let me know and we can discuss this under a dedicated JIRA.

Copy link

sonarcloud bot commented Nov 21, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants