[VL] Fix type mismatch issue for typed imperative aggregate #2669
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
@PHILO-HE can you please do a rebase? Thanks, -yuan
Is there any new progress on this PR?
@PHILO-HE can you please help to update the status in the PR description? I think the issue below is fixed, however there are some other issues.
-yuan
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.
What changes were proposed in this pull request?
For some typed imperative aggregate functions, such as collect_list/collect_set, Spark stores the agg buffer as BinaryType in the partial/partial merge phase, even though the output of these two functions is ArrayType. Gluten has no such special handling, so for the Velox backend the agg function's raw data type is used for the agg buffer in partial/partial merge. This causes a type mismatch in the downstream shuffle operator, because the Gluten plan inherits attributes from its corresponding Spark plan: for collect_list/collect_set, shuffle expects BinaryType but gets ArrayType.
This patch includes the code changes made by Jian, Ma.
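The mismatch described above can be illustrated with a small, self-contained model. This is only a sketch: the `DataType` stand-in, the mode strings, and the `align_with_spark` switch are hypothetical names invented for illustration, not Spark, Gluten, or Velox APIs, and the switch does not claim to reflect how this patch actually resolves the issue.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Spark SQL data type; not the real Spark class.
@dataclass(frozen=True)
class DataType:
    name: str


BINARY = DataType("BinaryType")
ARRAY_INT = DataType("ArrayType(IntegerType)")

# Typed imperative aggregates named in the PR description.
TYPED_IMPERATIVE = {"collect_list", "collect_set"}
# Hypothetical mode labels for the partial / partial merge phases.
PARTIAL_MODES = {"partial", "partial_merge"}


def spark_output_type(func, mode, result_type):
    """Type the Spark plan declares for the aggregate output in a given mode."""
    if func in TYPED_IMPERATIVE and mode in PARTIAL_MODES:
        return BINARY  # the agg buffer is exposed as a serialized binary
    return result_type


def native_output_type(func, mode, result_type, align_with_spark):
    """Type a native backend produces; the flag is an illustrative assumption."""
    if align_with_spark:
        return spark_output_type(func, mode, result_type)
    return result_type  # raw function type is used for the agg buffer


def shuffle_type_matches(func, mode, result_type, align_with_spark):
    """True when the shuffle's expected type equals what it actually receives."""
    expected = spark_output_type(func, mode, result_type)
    actual = native_output_type(func, mode, result_type, align_with_spark)
    return expected == actual
```

In this model, `shuffle_type_matches("collect_list", "partial", ARRAY_INT, False)` is false because the shuffle expects `BinaryType` while the backend emits `ArrayType`; the final phase and ordinary aggregates are unaffected.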
The current gap:
Spark's collect_list/collect_set ignores null input, but Velox's does not. We have fixed this null incompatibility in a personal Velox branch (not merged).
When the ObjectHashAggregate FINAL phase falls back (due to an unsupported post project) but the PARTIAL phase is offloaded, the incompatibility can cause runtime issues.
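The null-handling gap in the first item can be shown with a minimal sketch. The two functions below are hypothetical pure-Python models of the semantics described in this PR (Spark dropping nulls, the backend keeping them); they are not the Spark or Velox implementations.

```python
def collect_list_skip_nulls(values):
    """Model of Spark's collect_list: null (None) inputs are ignored."""
    return [v for v in values if v is not None]


def collect_list_keep_nulls(values):
    """Model of the behavior this PR attributes to Velox: nulls are retained."""
    return list(values)


rows = [1, None, 2, None, 3]
spark_like = collect_list_skip_nulls(rows)
velox_like = collect_list_keep_nulls(rows)
```

With the same input rows the two models return lists of different lengths, which is the kind of result difference the first gap item refers to.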
How was this patch tested?
Imported spark UTs.