[VL] Validate unsupported rewrite string in Re2 #6312

Closed
wants to merge 3 commits into from

kecookier
Contributor

What changes were proposed in this pull request?

Currently we only validate the regex pattern, but the rewrite string also needs to be validated. The regexp_replace function calls RE2::GlobalReplace(), which swallows the errors reported by RE2::Rewrite().

When RE2::CheckRewriteString() fails, Gluten falls back to vanilla Spark and prints a log like:

24/07/02 22:21:38 INFO GlutenFallbackReporter: Validation failed for plan: Project, due to: native check failure:native validation failed for function: regexp_replace due to: Rewrite \[check failed in RE2. Reason: Rewrite schema error: '\' must be followed by a digit or '\'..
24/07/02 22:21:38 INFO GlutenFallbackReporter: appId=local-c0csj_b8MAcjJdWfT2zfiQ, containerId=null, jsonStr={"plan":"Project","reason":"native check failure:native validation failed for function: regexp_replace due to: Rewrite \\[check failed in RE2. Reason: Rewrite schema error: '\\' must be followed by a digit or '\\'."}
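For illustration, the rule quoted in the log can be restated as a small standalone check. This is only a sketch under assumptions: `checkRewriteSketch`, its parameters, and its error strings are hypothetical names written for this example, not RE2's or Gluten's code. The real validation is expected to go through RE2::CheckRewriteString().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not RE2's implementation): re-states the rewrite-string
// rule from the log above. numGroups is the number of capturing groups in the
// pattern; *error receives a short reason when the check fails.
bool checkRewriteSketch(const std::string& rewrite, int numGroups, std::string* error) {
  assert(error != nullptr);
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    if (rewrite[i] != '\\') {
      continue;  // ordinary character, copied through as-is
    }
    if (++i == rewrite.size()) {
      *error = "'\\' not allowed at end";
      return false;
    }
    const char c = rewrite[i];
    if (c == '\\') {
      continue;  // "\\" is an escaped backslash
    }
    if (!std::isdigit(static_cast<unsigned char>(c))) {
      *error = "'\\' must be followed by a digit or '\\'";
      return false;
    }
    if (c - '0' > numGroups) {
      *error = "group reference exceeds the number of capturing groups";
      return false;
    }
  }
  return true;
}
```

With this sketch, the rewrite string from the linked issue (a backslash followed by `[`) is rejected, while an escaped backslash or a reference such as `\1` against a pattern with one capturing group is accepted.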

(Fixes: #6224)

How was this patch tested?

Existing CI.

github-actions bot commented Jul 2, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

@kecookier kecookier changed the title [VL] Validate unsupported rewrite string in Re2 WIP-[VL] Validate unsupported rewrite string in Re2 Jul 3, 2024
@FelixYBW FelixYBW changed the title WIP-[VL] Validate unsupported rewrite string in Re2 [VL][WIP] Validate unsupported rewrite string in Re2 Jul 3, 2024
@FelixYBW FelixYBW requested a review from PHILO-HE July 3, 2024 02:08
@kecookier kecookier changed the title [VL][WIP] Validate unsupported rewrite string in Re2 [VL] Validate unsupported rewrite string in Re2 Jul 3, 2024
@kecookier kecookier force-pushed the wip-fix-regexp_replace branch from 9128a2a to 642ed1d Compare July 3, 2024 07:54
@zhouyuan
Contributor

zhouyuan commented Jul 3, 2024

@GlutenPerfBot benchmark

@GlutenPerfBot
Contributor

ACK, will benchmark TPCH/DS on this pull request

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57e_time.csv difference percentage
q1 36.08 34.26 -1.817 94.96%
q2 24.09 22.32 -1.765 92.67%
q3 41.14 40.70 -0.433 98.95%
q4 33.63 33.11 -0.519 98.46%
q5 70.87 69.66 -1.216 98.28%
q6 6.58 7.95 1.373 120.88%
q7 81.31 83.02 1.710 102.10%
q8 82.29 84.00 1.714 102.08%
q9 124.70 122.20 -2.503 97.99%
q10 45.16 47.32 2.166 104.80%
q11 19.96 20.46 0.497 102.49%
q12 24.98 27.16 2.180 108.73%
q13 39.42 39.74 0.320 100.81%
q14 18.45 19.78 1.328 107.20%
q15 32.81 30.60 -2.208 93.27%
q16 14.01 14.01 0.004 100.03%
q17 104.91 102.42 -2.490 97.63%
q18 150.24 151.18 0.939 100.63%
q19 13.71 14.80 1.085 107.91%
q20 26.73 31.05 4.322 116.17%
q21 260.91 264.02 3.107 101.19%
q22 13.32 12.38 -0.939 92.95%
total 1265.30 1272.16 6.857 100.54%
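
Reading the rows, the last two columns appear to be derived from the two timings: difference is the master time minus the PR time, and percentage is the master time divided by the PR time (for q1, 34.26 - 36.08 is about -1.82 and 34.26 / 36.08 is about 94.96%). Under that reading, a percentage above 100% means the PR run was faster than master. A minimal sketch of the arithmetic, where `Delta` and `compareTimes` are hypothetical names used only for this example:

```cpp
#include <cassert>

// Hypothetical struct, named for illustration only.
struct Delta {
  double difference;  // master time minus PR time (same unit as the report)
  double percentage;  // master time as a percentage of the PR time
};

// Assumed derivation of the report's last two columns, inferred from its rows.
Delta compareTimes(double prTime, double masterTime) {
  assert(prTime > 0.0);
  return {masterTime - prTime, masterTime / prTime * 100.0};
}
```

The reported differences carry more digits than the rounded timings shown (for example -1.817 for q1), so recomputing from the table only matches to about two decimal places.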

@kecookier kecookier force-pushed the wip-fix-regexp_replace branch from 642ed1d to e447192 Compare July 3, 2024 13:55
@GlutenPerfBot
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57_time.csv difference percentage
q1 14.50 14.94 0.436 103.01%
q2 14.76 14.59 -0.174 98.82%
q3 6.02 3.99 -2.038 66.17%
q4 63.14 64.79 1.650 102.61%
q5 6.37 8.99 2.622 141.18%
q6 3.58 2.43 -1.151 67.83%
q7 4.37 4.25 -0.120 97.26%
q8 4.55 4.94 0.396 108.72%
q9 19.46 18.15 -1.303 93.30%
q10 11.44 10.50 -0.933 91.84%
q11 35.39 37.08 1.684 104.76%
q12 1.46 2.38 0.920 163.00%
q13 6.60 5.72 -0.880 86.68%
q14a 41.18 43.96 2.775 106.74%
q14b 38.72 39.09 0.372 100.96%
q15 3.64 2.68 -0.954 73.76%
q16 40.17 41.98 1.809 104.50%
q17 4.89 5.93 1.037 121.18%
q18 6.39 6.27 -0.112 98.24%
q19 2.23 2.30 0.069 103.11%
q20 1.44 1.33 -0.105 92.67%
q21 1.02 1.10 0.082 108.06%
q22 8.27 8.35 0.075 100.90%
q23a 80.21 84.32 4.115 105.13%
q23b 99.88 103.47 3.591 103.60%
q24a 79.77 78.77 -0.998 98.75%
q24b 80.58 72.76 -7.816 90.30%
q25 4.31 4.39 0.077 101.79%
q26 4.16 2.96 -1.193 71.30%
q27 3.55 3.41 -0.141 96.03%
q28 21.08 21.17 0.085 100.40%
q29 6.66 7.08 0.419 106.29%
q30 9.59 4.09 -5.498 42.67%
q31 6.21 6.30 0.093 101.50%
q32 1.14 1.23 0.093 108.12%
q33 7.35 4.72 -2.628 64.23%
q34 5.89 6.86 0.977 116.59%
q35 7.72 7.65 -0.069 99.11%
q36 3.75 3.67 -0.082 97.81%
q37 4.08 4.66 0.577 114.15%
q38 11.94 14.27 2.329 119.51%
q39a 3.36 3.53 0.163 104.85%
q39b 3.10 2.90 -0.199 93.59%
q40 3.67 3.69 0.021 100.57%
q41 0.62 0.70 0.078 112.56%
q42 0.93 1.08 0.143 115.30%
q43 3.89 4.02 0.130 103.35%
q44 12.28 8.66 -3.617 70.54%
q45 3.35 8.26 4.915 246.79%
q46 3.39 3.48 0.088 102.60%
q47 14.20 14.36 0.157 101.10%
q48 4.26 4.60 0.335 107.86%
q49 9.41 9.33 -0.080 99.15%
q50 19.74 22.19 2.446 112.39%
q51 8.63 11.70 3.071 135.59%
q52 1.00 1.09 0.090 109.05%
q53 2.17 2.02 -0.156 92.81%
q54 3.27 3.32 0.043 101.31%
q55 1.01 1.16 0.144 114.22%
q56 4.38 4.58 0.198 104.51%
q57 8.55 8.80 0.246 102.88%
q58 2.57 2.67 0.106 104.14%
q59 13.72 13.99 0.266 101.94%
q60 4.83 4.89 0.054 101.11%
q61 5.58 5.49 -0.087 98.44%
q62 3.75 5.15 1.397 137.23%
q63 2.11 2.21 0.099 104.70%
q64 51.86 51.58 -0.284 99.45%
q65 13.56 14.11 0.552 104.07%
q66 8.71 4.75 -3.960 54.54%
q67 349.08 349.82 0.736 100.21%
q68 3.61 3.67 0.062 101.71%
q69 6.25 6.44 0.189 103.03%
q70 8.82 8.98 0.163 101.84%
q71 3.43 3.30 -0.131 96.18%
q72 184.79 187.54 2.745 101.49%
q73 2.29 2.34 0.053 102.31%
q74 21.04 21.84 0.800 103.80%
q75 25.76 23.36 -2.406 90.66%
q76 9.56 9.49 -0.063 99.34%
q77 2.46 2.15 -0.314 87.25%
q78 38.55 38.92 0.373 100.97%
q79 3.60 3.62 0.022 100.60%
q80 11.66 11.04 -0.622 94.67%
q81 5.33 5.17 -0.164 96.93%
q82 6.42 6.63 0.208 103.25%
q83 1.60 1.58 -0.016 99.02%
q84 2.80 2.81 0.009 100.32%
q85 7.03 7.01 -0.020 99.71%
q86 3.17 3.38 0.205 106.46%
q87 12.08 12.61 0.533 104.42%
q88 24.48 25.25 0.772 103.15%
q89 5.83 3.21 -2.619 55.06%
q90 3.90 3.86 -0.045 98.84%
q91 2.66 2.54 -0.117 95.59%
q92 1.35 1.32 -0.030 97.81%
q93 28.01 29.02 1.010 103.61%
q94 21.85 22.00 0.150 100.69%
q9 81.17 81.02 -0.150 99.82%
q5 3.96 3.83 -0.130 96.70%
q96 11.97 12.31 0.337 102.82%
q97 1.89 2.06 0.171 109.03%
q98 11.78 11.46 -0.326 97.24%
q99 11.78 11.46 -0.326 97.24%
total 1903.57 1911.40 7.832 100.41%

const auto& rewriteArg = scalarFunction.arguments()[2].value();
if (!rewriteArg.has_literal() || !rewriteArg.literal().has_string()) {
  LOG_VALIDATION_MSG("Rewrite is not string literal for " + name);
  return false;
Contributor

This check makes the function always fall back when a non-constant replacement string is used.

Please check whether this comment makes sense: #6224 (comment).

Contributor Author

Thank you for the explanation. I will modify the code according to your suggestions.

Contributor

@kecookier, if the proposal is feasible, you can file a draft PR in Velox to see whether the community accepts it. Thanks!

@GlutenPerfBot
Contributor

ACK, will benchmark TPCH/DS on this pull request

@GlutenPerfBot
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57_time.csv difference percentage
q1 15.73 14.94 -0.787 94.99%
q2 14.91 14.59 -0.324 97.83%
q3 4.14 3.99 -0.153 96.30%
q4 62.86 64.79 1.933 103.08%
q5 7.30 8.99 1.692 123.18%
q6 3.91 2.43 -1.489 61.97%
q7 4.24 4.25 0.010 100.23%
q8 5.66 4.94 -0.714 87.37%
q9 23.23 18.15 -5.072 78.16%
q10 10.99 10.50 -0.489 95.55%
q11 35.10 37.08 1.977 105.63%
q12 1.58 2.38 0.802 150.86%
q13 7.75 5.72 -2.030 73.82%
q14a 41.60 43.96 2.356 105.66%
q14b 39.92 39.09 -0.830 97.92%
q15 2.52 2.68 0.164 106.51%
q16 39.61 41.98 2.371 105.99%
q17 5.60 5.93 0.331 105.91%
q18 7.25 6.27 -0.973 86.57%
q19 3.19 2.30 -0.890 72.09%
q20 1.37 1.33 -0.038 97.21%
q21 2.72 1.10 -1.624 40.40%
q22 8.47 8.35 -0.123 98.55%
q23a 82.25 84.32 2.070 102.52%
q23b 102.34 103.47 1.133 101.11%
q24a 73.68 78.77 5.089 106.91%
q24b 69.35 72.76 3.411 104.92%
q25 9.56 4.39 -5.175 45.87%
q26 2.97 2.96 -0.007 99.77%
q27 2.90 3.41 0.512 117.63%
q28 23.22 21.17 -2.051 91.17%
q29 6.57 7.08 0.512 107.79%
q30 4.06 4.09 0.033 100.80%
q31 6.12 6.30 0.175 102.87%
q32 1.15 1.23 0.083 107.18%
q33 4.85 4.72 -0.132 97.29%
q34 3.62 6.86 3.245 189.73%
q35 6.33 7.65 1.322 120.87%
q36 3.29 3.67 0.378 111.46%
q37 3.61 4.66 1.044 128.87%
q38 11.38 14.27 2.889 125.40%
q39a 3.34 3.53 0.189 105.66%
q39b 4.70 2.90 -1.798 61.72%
q40 3.84 3.69 -0.153 96.01%
q41 1.93 0.70 -1.229 36.24%
q42 0.98 1.08 0.099 110.09%
q43 3.91 4.02 0.112 102.86%
q44 8.89 8.66 -0.226 97.45%
q45 3.44 8.26 4.818 239.87%
q46 3.20 3.48 0.279 108.70%
q47 14.26 14.36 0.102 100.71%
q48 4.51 4.60 0.086 101.90%
q49 9.63 9.33 -0.301 96.88%
q50 23.64 22.19 -1.452 93.86%
q51 8.61 11.70 3.090 135.88%
q52 1.03 1.09 0.060 105.83%
q53 2.16 2.02 -0.142 93.42%
q54 3.49 3.32 -0.174 95.02%
q55 1.02 1.16 0.137 113.47%
q56 4.45 4.58 0.130 102.92%
q57 8.56 8.80 0.246 102.87%
q58 2.52 2.67 0.156 106.19%
q59 17.67 13.99 -3.684 79.16%
q60 5.11 4.89 -0.219 95.71%
q61 5.60 5.49 -0.109 98.06%
q62 4.55 5.15 0.594 113.05%
q63 2.22 2.21 -0.005 99.77%
q64 48.06 51.58 3.523 107.33%
q65 13.68 14.11 0.428 103.13%
q66 3.46 4.75 1.290 137.29%
q67 351.80 349.82 -1.981 99.44%
q68 3.74 3.67 -0.071 98.09%
q69 6.23 6.44 0.210 103.38%
q70 13.46 8.98 -4.474 66.75%
q71 4.70 3.30 -1.403 70.13%
q72 185.21 187.54 2.332 101.26%
q73 4.00 2.34 -1.654 58.63%
q74 21.32 21.84 0.521 102.44%
q75 23.46 23.36 -0.099 99.58%
q76 9.17 9.49 0.323 103.53%
q77 2.19 2.15 -0.045 97.93%
q78 42.29 38.92 -3.374 92.02%
q79 3.66 3.62 -0.036 99.02%
q80 11.46 11.04 -0.419 96.35%
q81 5.10 5.17 0.070 101.37%
q82 9.05 6.63 -2.414 73.31%
q83 1.51 1.58 0.080 105.28%
q84 2.76 2.81 0.051 101.83%
q85 6.86 7.01 0.149 102.17%
q86 4.22 3.38 -0.843 80.01%
q87 14.24 12.61 -1.631 88.55%
q88 24.16 25.25 1.097 104.54%
q89 3.21 3.21 0.002 100.07%
q90 8.54 3.86 -4.682 45.16%
q91 2.54 2.54 -0.001 99.97%
q92 1.38 1.32 -0.057 95.85%
q93 27.92 29.02 1.101 103.94%
q94 21.59 22.00 0.406 101.88%
q9 84.36 81.02 -3.340 96.04%
q5 3.85 3.83 -0.019 99.49%
q96 12.03 12.31 0.285 102.37%
q97 1.91 2.06 0.152 107.97%
q98 9.52 11.46 1.933 120.29%
q99 9.52 11.46 1.933 120.29%
total 1912.76 1911.40 -1.357 99.93%

@GlutenPerfBot
Contributor

ACK, will benchmark TPCH/DS on this pull request

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_04_2024_ff0b4733a_time.csv difference percentage
q1 34.70 37.25 2.549 107.35%
q2 29.06 23.94 -5.122 82.38%
q3 38.82 40.88 2.065 105.32%
q4 35.88 32.31 -3.565 90.06%
q5 70.95 69.58 -1.370 98.07%
q6 7.78 8.08 0.299 103.85%
q7 86.65 84.15 -2.499 97.12%
q8 85.17 86.08 0.909 101.07%
q9 121.40 122.79 1.387 101.14%
q10 45.18 46.05 0.870 101.93%
q11 21.86 20.55 -1.314 93.99%
q12 25.89 27.84 1.954 107.55%
q13 39.27 39.73 0.461 101.17%
q14 20.12 18.88 -1.235 93.86%
q15 32.95 33.90 0.946 102.87%
q16 14.16 13.35 -0.809 94.29%
q17 104.77 105.50 0.733 100.70%
q18 147.18 149.41 2.230 101.52%
q19 14.61 13.77 -0.836 94.27%
q20 27.00 30.34 3.336 112.36%
q21 264.36 263.79 -0.566 99.79%
q22 14.24 12.38 -1.863 86.92%
total 1281.99 1280.56 -1.438 99.89%

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Aug 21, 2024

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Aug 31, 2024
Successfully merging this pull request may close these issues.

[VL] Results are mismatch with vanilla Spark when use regexp_replace('a{bc', '\\{', '\\[')