
[CORE] Fix fallback for spark sequence function with literal array data as input #6433

Merged

Conversation

gaoyangxiaozhu (Contributor) commented Jul 12, 2024

What changes were proposed in this pull request?


Before this PR, the query below would fail and fall back

spark.sql("SELECT sequence(1, 5), string_column from t").collect

with the following error:

(screenshot of the fallback error)
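For context, a minimal sketch of the failing scenario (this assumes a Gluten-enabled Spark session; the table setup below is illustrative, not taken from the PR):

// Hedged sketch: with ConstantFolding enabled (Spark's default), sequence(1, 5)
// is evaluated at optimization time and reaches Gluten as an array literal.
// Before this PR, Gluten's makeLiteral could not handle the resulting UnsafeArrayData,
// so the projection fell back with the error shown above.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
spark.sql("CREATE TABLE t (string_column STRING) USING parquet")
spark.sql("INSERT INTO t VALUES ('a')")
spark.sql("SELECT sequence(1, 5), string_column FROM t").collect()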


How was this patch tested?

UT



Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Run Gluten Clickhouse CI

@gaoyangxiaozhu changed the title from "[WIP] Fix spark sequence function fallback issue if input is literal array data" to "Fix spark sequence function fallback issue if input is literal array data" on Jul 12, 2024
gaoyangxiaozhu (Contributor, Author) commented:

@PHILO-HE / @rui-mo, please help review.

@gaoyangxiaozhu force-pushed the gayangya/fix_sequence_fallback_issue branch from 93fc653 to 373c10b on July 13, 2024 01:15

Run Gluten Clickhouse CI

@@ -664,6 +665,15 @@ class ScalarFunctionsValidateSuite extends FunctionsValidateTest {
    }
  }

  test("Test sequence function") {
    withSQLConf(("spark.sql.optimizer.excludedRules", NullPropagation.ruleName)) {
PHILO-HE (Contributor) commented:
Curious why this rule matters, instead of constant folding rule. Could you clarify?

gaoyangxiaozhu (Contributor, Author) commented Jul 15, 2024:
Hey, good question @PHILO-HE. Actually, the rule that matters is the constant folding rule, which should be removed from excludedRules. The withSQLConf here is only used to override the default test SQL conf, removing constant folding from the excluded rules while keeping the others: https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/test/scala/org/apache/gluten/execution/FunctionsValidateTest.scala#L44
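To make that mechanism concrete, a minimal sketch of the override being described (the exact contents of the base conf are assumed from the linked FunctionsValidateTest, not copied from it):

import org.apache.spark.sql.catalyst.optimizer.{ConstantFolding, NullPropagation}

// Assumed suite-level default: both rules excluded, so expressions like sequence()
// reach the native backend without being folded by Spark first.
val suiteDefault = Seq(ConstantFolding.ruleName, NullPropagation.ruleName).mkString(",")

// Override used by the new test: only NullPropagation stays excluded,
// so ConstantFolding runs again and sequence(1, 5) becomes a literal array
// before Gluten ever sees the plan.
val testOverride = NullPropagation.ruleName

// In the test: withSQLConf(("spark.sql.optimizer.excludedRules", testOverride)) { ... }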

PHILO-HE (Contributor) commented:
With constant folding rule applied, will the physical plan delivered to Gluten become an Alias plan (the result is actually computed by Spark, and Gluten just offloads the Alias plan)? Could you please verify?

gaoyangxiaozhu (Contributor, Author) commented:
Hey @PHILO-HE, maybe there is a misunderstanding here. I removed the constant folding rule from excludedRules so that the rule can take effect (the default Spark behavior). As a result, the value in the UT is actually computed by Spark and put into an UnsafeArrayData, wrapped in an Alias over a Literal, which is then offloaded to Gluten.

If the constant folding rule stays in excludedRules, Gluten will try to offload sequence to Velox and fall back due to the unsupported function name.
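A quick standalone way to check this behavior (not part of the PR; the session setup is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
// With ConstantFolding active (the default), the function call disappears from the
// optimized plan and only an aliased array literal remains, e.g.
//   Project [[1,2,3,4,5] AS s#0]
// which is exactly the Alias-over-Literal shape that gets offloaded to Gluten.
println(spark.sql("SELECT sequence(1, 5) AS s").queryExecution.optimizedPlan)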

PHILO-HE (Contributor) commented:
@gaoyangxiaozhu, I see. I think it would be better to revise the test name to something like: "sequence function optimized by Spark constant folding".

@gaoyangxiaozhu requested a review from PHILO-HE on July 15, 2024 01:28
PHILO-HE previously approved these changes Jul 15, 2024
gaoyangxiaozhu (Contributor, Author) commented:

@PHILO-HE, could you help merge?

PHILO-HE (Contributor) commented:

> @PHILO-HE, could you help merge?

@gaoyangxiaozhu, do you think this small change suggestion makes sense? #6433 (comment)

@@ -215,6 +217,19 @@ public static LiteralNode makeLiteral(Object obj, TypeNode typeNode) {

  public static LiteralNode makeLiteral(Object obj, DataType dataType, Boolean nullable) {
    TypeNode typeNode = ConverterUtils.getTypeNode(dataType, nullable);
    if (obj instanceof UnsafeArrayData) {
PHILO-HE (Contributor) commented:
@gaoyangxiaozhu, can we just overload makeLiteral with UnsafeArrayData type as input?

gaoyangxiaozhu (Contributor, Author) commented:
Em... actually I'm not sure, unless we are certain the literal list type is always UnsafeArrayData, which I can't confirm right now.
Let's keep the current change, and let me follow up by checking the Spark code to see whether we can always rely on UnsafeArrayData.

gaoyangxiaozhu (Contributor, Author) commented:

> @PHILO-HE, could you help merge?
>
> @gaoyangxiaozhu, do you think this small change suggestion makes sense? #6433 (comment)

Updated.


Run Gluten Clickhouse CI

PHILO-HE (Contributor) commented:
@gaoyangxiaozhu, please also check this comment: https://github.com/apache/incubator-gluten/pull/6433/files#r1678722614

gaoyangxiaozhu (Contributor, Author) commented:

> @gaoyangxiaozhu, please also check this comment: https://github.com/apache/incubator-gluten/pull/6433/files#r1678722614

Hey @PHILO-HE, please check my reply above. :)

PHILO-HE (Contributor) left a comment:
Looks nice!

@PHILO-HE changed the title from "Fix spark sequence function fallback issue if input is literal array data" to "[CORE] Fix fallback for spark sequence function with literal array data as input" on Jul 16, 2024
@PHILO-HE merged commit dbbb4a7 into apache:main on Jul 16, 2024
7 of 8 checks passed