Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Remove the registry for Velox's prestosql scalar functions #5202

Merged
merged 4 commits into from
Jun 27, 2024

Conversation

PHILO-HE
Copy link
Contributor

@PHILO-HE PHILO-HE commented Mar 29, 2024

What changes were proposed in this pull request?

Currently, Gluten registers Velox's prestosql scalar functions. Thus, some of them can be used for Spark SQL if there is no semantic difference. However, there are some conflict issues, which makes unexpected function calling. See below links. We plan to register used presto scalar functions (see below list) in sparksql registry.

#4759
#5150

Function list:
url_decode/url_encode
array_distinct
array_except
array_position
map_entries
map_keys
map_values
reverse
crc32
regexp_extract_all
like
abs (for long/short decimal)
asin
atan
cos
degrees

How was this patch tested?

Existing tests.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@PHILO-HE PHILO-HE force-pushed the remove-presto-scalar-function-registry branch 7 times, most recently from ebc1e02 to 8b37162 Compare April 10, 2024 07:43
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5202_time.csv log/native_master_04_09_2024_8e5f63c0c_time.csv difference percentage
q1 37.13 36.38 -0.750 97.98%
q2 23.82 23.87 0.045 100.19%
q3 37.27 36.63 -0.634 98.30%
q4 37.54 38.51 0.963 102.57%
q5 70.57 71.53 0.968 101.37%
q6 9.69 5.75 -3.939 59.35%
q7 81.58 85.53 3.950 104.84%
q8 86.21 85.59 -0.627 99.27%
q9 125.65 119.02 -6.626 94.73%
q10 47.97 43.75 -4.219 91.21%
q11 19.86 20.38 0.524 102.64%
q12 25.90 26.91 1.003 103.87%
q13 47.71 48.54 0.833 101.75%
q14 24.01 22.64 -1.373 94.28%
q15 31.92 31.84 -0.082 99.74%
q16 12.49 13.80 1.313 110.52%
q17 103.62 101.51 -2.105 97.97%
q18 142.55 143.19 0.640 100.45%
q19 15.68 13.58 -2.107 86.57%
q20 27.11 28.55 1.437 105.30%
q21 227.72 226.97 -0.753 99.67%
q22 13.85 13.92 0.069 100.50%
total 1249.85 1238.38 -11.469 99.08%

@Yohahaha
Copy link
Contributor

Hi, could you share the current status?

@PHILO-HE
Copy link
Contributor Author

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.

@Yohahaha
Copy link
Contributor

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.

why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!

  velox::functions::prestosql::registerAllScalarFunctions("presto_");
  velox::functions::sparksql::registerFunctions("spark_");

  prestoMap("array_distinct" -> "presto");
  // do naming conversion in cpp module

@PHILO-HE
Copy link
Contributor Author

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.

why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!

  velox::functions::prestosql::registerAllScalarFunctions("presto_");
  velox::functions::sparksql::registerFunctions("spark_");

  prestoMap("array_distinct" -> "presto");
  // do naming conversion in cpp module

They are discussed internally. We feel it is more clear by using only one registry and we want to avoid maintaining such a map in Gluten.

Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label May 27, 2024
@PHILO-HE PHILO-HE force-pushed the remove-presto-scalar-function-registry branch 2 times, most recently from 4e2e357 to a0bda39 Compare May 27, 2024 02:10
@PHILO-HE PHILO-HE force-pushed the remove-presto-scalar-function-registry branch from a0bda39 to efce94e Compare June 26, 2024 07:47
@PHILO-HE PHILO-HE removed the stale stale label Jun 26, 2024
@PHILO-HE PHILO-HE force-pushed the remove-presto-scalar-function-registry branch from 0f857d2 to 0567551 Compare June 26, 2024 11:43
@PHILO-HE PHILO-HE marked this pull request as ready for review June 26, 2024 14:18
@@ -67,19 +74,16 @@ void registerFunctionOverwrite() {
velox::exec::registerFunctionCallToSpecialForm(
kRowConstructorWithAllNull,
std::make_unique<RowConstructorWithNullCallToSpecialForm>(kRowConstructorWithAllNull));
velox::functions::sparksql::registerBitwiseFunctions("spark_");
velox::functions::registerBinaryIntegral<velox::functions::CheckedPlusFunction>({"check_add"});
velox::functions::registerBinaryIntegral<velox::functions::CheckedMinusFunction>({"check_subtract"});
velox::functions::registerBinaryIntegral<velox::functions::CheckedMultiplyFunction>({"check_multiply"});
velox::functions::registerBinaryIntegral<velox::functions::CheckedDivideFunction>({"check_divide"});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems some registrations in registerFunctionOverwrite() could be removed, including spark_rand, check_xx. After facebookincubator/velox@372abeb, we need to adapt to the new names checked_xx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also noted this. Will create another pr to fix it.

Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@PHILO-HE PHILO-HE merged commit ac227de into apache:main Jun 27, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants