Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-5852] [CH] fix mismatch result columns size exception #5853

Merged
merged 9 commits into from
May 29, 2024

Conversation

shuai-xu
Copy link
Contributor

What changes were proposed in this pull request?

For sql whose agg columns contains two same const value, they will be transfromed to same name, In #5619 , it called distinct on agg's keys and outputs, so the const columns may be removed.

(Fixes: #5852)

How was this patch tested?

This patch was tested by unit tests.

Copy link

#5852

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@zzcclp
Copy link
Contributor

zzcclp commented May 24, 2024

@ulysses-you @liujiayi771 please help to review, thanks.

@liujiayi771
Copy link
Contributor

liujiayi771 commented May 24, 2024

@shuai-xu I currently don't have the env for CH, so I used Velox to test the case you provided. The result is consistent with vanilla Spark. Does this issue only occur in CH? Could you tell me what CH returns before the fix? I also tried debugging and did not enter into the code you modified.

In addition, the original logic here is to reuse some duplicated pulled-out Attributes. If the issue you encountered is only related to Literal, could you only add a special handling for Literal here?

Copy link

Run Gluten Clickhouse CI

@shuai-xu
Copy link
Contributor Author

@shuai-xu I currently don't have the env for CH, so I used Velox to test the case you provided. The result is consistent with vanilla Spark. Does this issue only occur in CH? Could you tell me what CH returns before the fix? I also tried debugging and did not enter into the code you modified.

In addition, the original logic here is to reuse some duplicated pulled-out Attributes. If the issue you encountered is only related to Literal, could you only add a special handling for Literal here?

This case only happens in CH. It will throw an exception as shown in the issue before fix. The problem here is that the resue logic consider two different attibutes with same name as same one, so later in CH, it can't know whether they are really same.

@zhanglistar zhanglistar requested a review from zzcclp May 27, 2024 06:38
@ulysses-you
Copy link
Contributor

ulysses-you commented May 27, 2024

The literal should not happen in group keys. Can we add RemoveLiteralFromGroupExpressions into extended optimzier rule ? It seems Spark only optimize the group by case but miss to optimize the distinct case.

It seems we can not remove it since there is an alias wrap literal..

ulysses-you
ulysses-you previously approved these changes May 27, 2024
zzcclp
zzcclp previously approved these changes May 27, 2024
Copy link
Contributor

@zzcclp zzcclp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

liujiayi771
liujiayi771 previously approved these changes May 27, 2024
@shuai-xu shuai-xu dismissed stale reviews from liujiayi771, zzcclp, and ulysses-you via 07c1af3 May 28, 2024 01:56
Copy link

Run Gluten Clickhouse CI

2 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

val preProject = ProjectExec(
eliminateProjectList(agg.child.outputSet, expressionMap.values.toSeq),
agg.child)
// ISSUE-5852: literals with same names are lost in expressionMap, need addback to Project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the modification of the logic in this code unnecessary in your earliest version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the literals is not in the project outputs, it will case fallback.

Copy link

Run Gluten Clickhouse CI

liujiayi771
liujiayi771 previously approved these changes May 29, 2024
@liujiayi771
Copy link
Contributor

LGTM. cc @ulysses-you Do you have any other suggestions?

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@ulysses-you ulysses-you merged commit f10d3b7 into apache:main May 29, 2024
40 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5853_time.csv log/native_master_05_28_2024_d7374bd8f_time.csv difference percentage
q1 34.22 34.04 -0.180 99.47%
q2 23.49 23.62 0.138 100.59%
q3 36.89 36.93 0.044 100.12%
q4 32.94 32.10 -0.842 97.45%
q5 70.85 69.91 -0.945 98.67%
q6 7.15 5.97 -1.188 83.40%
q7 83.94 80.25 -3.682 95.61%
q8 84.35 84.79 0.438 100.52%
q9 117.06 121.55 4.489 103.84%
q10 43.06 43.64 0.581 101.35%
q11 20.58 22.27 1.693 108.23%
q12 28.02 26.90 -1.116 96.02%
q13 51.13 52.40 1.277 102.50%
q14 19.24 21.59 2.348 112.20%
q15 30.74 31.46 0.716 102.33%
q16 14.05 13.97 -0.081 99.42%
q17 105.26 102.83 -2.433 97.69%
q18 143.59 145.92 2.329 101.62%
q19 13.55 14.82 1.265 109.33%
q20 26.64 31.06 4.421 116.60%
q21 260.72 261.69 0.965 100.37%
q22 14.92 12.27 -2.649 82.24%
total 1262.39 1269.97 7.589 100.60%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CH] throw GlutenException: Missmatch result columns size
5 participants