[GLUTEN-3654][VL]Fix the duplicated key exception in TopN #3655

Merged: 1 commit, merged Nov 10, 2023

JkSelf (Contributor) commented Nov 9, 2023

What changes were proposed in this pull request?

TakeOrderedAndProjectExec is transformed either into a sort + limit or directly into a limit. In the sort + limit case, Velox converts the pair into a TopNNode, so this PR adds validation for the TopNNode to avoid the duplicated key exception reported in #3654.
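The PR text does not show the actual change, so the following is only a rough Python model of the decision described above, not Gluten or Velox code. It assumes two things not stated in the PR: that the "duplicated key" in the title is a sort column appearing more than once, and that a failed validation makes the operator fall back instead of running natively. `SortKey`, `plan_take_ordered`, `duplicated_keys`, `row_key`, `top_n` and the `child_already_sorted` flag are all hypothetical names; the flag stands in for whatever condition actually lets the plan use a plain limit.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class SortKey:
    """Hypothetical sort key: a column name plus a direction."""
    column: str
    ascending: bool = True


def duplicated_keys(keys: Sequence[SortKey]) -> List[str]:
    """Return each column name that appears more than once, in first-seen order."""
    seen: set = set()
    dups: List[str] = []
    for k in keys:
        if k.column in seen and k.column not in dups:
            dups.append(k.column)
        seen.add(k.column)
    return dups


def plan_take_ordered(
    keys: Sequence[SortKey], child_already_sorted: bool = False
) -> Tuple[str, Optional[str]]:
    """Pick a native operator for TakeOrderedAndProject (illustrative only).

    child_already_sorted is an assumed stand-in for the condition that allows
    a plain limit. Otherwise sort + limit is fused into a TopN, which is
    validated before being offloaded.
    """
    if child_already_sorted:
        return "Limit", None
    dups = duplicated_keys(keys)
    if dups:
        # Assumption: reject the TopN during validation (fall back to Spark)
        # rather than let the native operator throw at execution time.
        return "Fallback", "duplicated sort keys: " + ", ".join(dups)
    return "TopN", None


def row_key(keys: Sequence[SortKey]) -> Callable[[Row], Tuple[float, ...]]:
    """Build a tuple sort key; descending order is emulated by negation,
    so this sketch only supports numeric columns."""
    def key(row: Row) -> Tuple[float, ...]:
        return tuple(row[k.column] if k.ascending else -row[k.column] for k in keys)
    return key


def top_n(rows: Sequence[Row], keys: Sequence[SortKey], limit: int) -> List[Row]:
    """TopN: heapq.nsmallest is documented as equivalent to
    sorted(rows, key=...)[:limit], but keeps only `limit` rows in a heap."""
    return heapq.nsmallest(limit, rows, key=row_key(keys))
```

For example, `plan_take_ordered([SortKey("a"), SortKey("a", ascending=False)])` returns `("Fallback", "duplicated sort keys: a")`, while two distinct keys yield `("TopN", None)`. Rejecting at validation time is just one option; since a repeated column can never break a tie that its first occurrence left open, dropping the repeats would also preserve the ordering. The PR does not say which approach it takes.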

How was this patch tested?

Passed Jenkins tests.

github-actions bot commented Nov 9, 2023

#3654

github-actions bot commented Nov 9, 2023

Run Gluten Clickhouse CI

JkSelf requested a review from rui-mo November 9, 2023 09:22
JkSelf commented Nov 9, 2023

@rui-mo Passed Jenkins TPC-DS test. http://sr168:8080/view/Gluten/job/Gluten_TPCDS_test/874/

zhouyuan commented Nov 9, 2023

@JkSelf it looks like the TPC-DS test in GHA passes. Is this due to a difference in scale factor?

-yuan

JkSelf commented Nov 9, 2023

@zhouyuan This issue seems to depend on how the Spark application is submitted: I can reproduce it with spark-shell on a 1 GB dataset.

JkSelf merged commit 3ecf596 into apache:main Nov 10, 2023
16 checks passed
GlutenPerfBot (Contributor):

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | PR #3655 (log/native_3655_time.csv) | master 745ecb383, 11/09/2023 (log/native_master_11_09_2023_745ecb383_time.csv) | difference (master - PR) | percentage (master / PR) |
|---|---:|---:|---:|---:|
| q1 | 33.93 | 33.80 | -0.128 | 99.62% |
| q2 | 24.71 | 24.95 | 0.236 | 100.95% |
| q3 | 40.03 | 38.87 | -1.162 | 97.10% |
| q4 | 35.38 | 36.96 | 1.579 | 104.46% |
| q5 | 69.43 | 69.94 | 0.514 | 100.74% |
| q6 | 7.91 | 7.87 | -0.036 | 99.54% |
| q7 | 84.98 | 85.40 | 0.421 | 100.49% |
| q8 | 88.07 | 88.61 | 0.543 | 100.62% |
| q9 | 122.43 | 121.14 | -1.294 | 98.94% |
| q10 | 47.55 | 54.35 | 6.800 | 114.30% |
| q11 | 19.99 | 19.52 | -0.476 | 97.62% |
| q12 | 24.81 | 24.93 | 0.121 | 100.49% |
| q13 | 48.91 | 50.92 | 2.010 | 104.11% |
| q14 | 16.78 | 18.17 | 1.387 | 108.27% |
| q15 | 30.26 | 30.79 | 0.534 | 101.77% |
| q16 | 15.97 | 16.06 | 0.093 | 100.59% |
| q17 | 104.07 | 101.51 | -2.557 | 97.54% |
| q18 | 149.32 | 148.23 | -1.086 | 99.27% |
| q19 | 14.85 | 14.95 | 0.102 | 100.69% |
| q20 | 30.61 | 31.24 | 0.623 | 102.03% |
| q21 | 223.74 | 223.30 | -0.440 | 99.80% |
| q22 | 13.57 | 14.24 | 0.670 | 104.94% |
| total | 1247.28 | 1255.74 | 8.454 | 100.68% |
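The report does not say how its last two columns are computed. Checking a few rows suggests difference = master time - PR time and percentage = master / PR, so a value above 100% means the PR run was faster. The short Python check below recomputes both from the displayed numbers; `compare` is a hypothetical helper, not part of the perf bot. Because the inputs are rounded to two decimals, the recomputed difference can differ in the third decimal (q1 gives -0.130 rather than -0.128), which suggests the bot worked from unrounded timings.

```python
def compare(pr_time: float, master_time: float) -> tuple:
    """Return (difference, percentage) as the report's columns appear to be defined:
    difference = master - PR, percentage = master / PR * 100.
    A percentage above 100% means the PR run was faster than master."""
    return master_time - pr_time, master_time / pr_time * 100.0


# (PR #3655 time, master time) copied from the report above.
timings = {"q1": (33.93, 33.80), "q10": (47.55, 54.35), "total": (1247.28, 1255.74)}
for query, (pr, master) in timings.items():
    diff, pct = compare(pr, master)
    print(f"{query}: difference {diff:+.3f}, percentage {pct:.2f}%")
```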
