Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fix type mismatch issue for typed imperative aggregate #2669

Closed

Conversation

PHILO-HE
Copy link
Contributor

@PHILO-HE PHILO-HE commented Aug 9, 2023

What changes were proposed in this pull request?

For some typed imperative aggregate functions, like collect_list/ collect_set, spark will make agg buffer stored as BinaryType in partial/ partial merge phase, even though these two functions' output is ArrayType. Gluten has no such special handling. So for velox backend, agg function's raw data type is used for agg buffer in partial/partial merge. Thus, there will be a type mismatch issue in the latter shuffle operator, as gluten plan inherits attributes from its corresponding spark plan. For collect_list/ collect_set, shuffle expects BinaryType, but gets ArrayType.

This patch includes the code changes made by Jian, Ma.

The current gap:

  1. Spark's collect_list/collect_set dismisses null input, but velox doesn't.
    We have fixed the null incompatible issue in personal velox branch (not merged).
  2. A common issue for ObjectHashAggregate (including collect_set/collect_list, etc.):
    ObjectHashAggregate FINAL phase falls back (due to unsupported post project), but PARTIAL offloaded, the incompatibility can cause runtime issue.

How was this patch tested?

Imported spark UTs.

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Run Gluten Clickhouse CI

1 similar comment
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE force-pushed the match-type-for-typed-imperative-agg branch from dd21dc8 to 3c626bb Compare August 9, 2023 13:07
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Run Gluten Clickhouse CI

1 similar comment
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Run Gluten Clickhouse CI

@zhouyuan
Copy link
Contributor

@PHILO-HE can you please do a rebase? thanks, -yuan

@PHILO-HE PHILO-HE force-pushed the match-type-for-typed-imperative-agg branch from b6c0cea to b650d69 Compare August 25, 2023 06:35
@github-actions
Copy link

Run Gluten Clickhouse CI

@liujiayi771
Copy link
Contributor

Is there any new progress on this PR?

@github-actions
Copy link

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE changed the base branch from main to 2023_10_19 October 27, 2023 01:23
@PHILO-HE PHILO-HE force-pushed the match-type-for-typed-imperative-agg branch from c27d422 to a4f5eee Compare October 27, 2023 01:41
@github-actions
Copy link

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE force-pushed the match-type-for-typed-imperative-agg branch from a4f5eee to 4e314dd Compare October 27, 2023 03:58
@PHILO-HE PHILO-HE changed the base branch from 2023_10_19 to 2023_10_12 October 27, 2023 03:59
@github-actions
Copy link

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE force-pushed the match-type-for-typed-imperative-agg branch from 4e314dd to 46ba2ae Compare October 27, 2023 04:00
@github-actions
Copy link

Run Gluten Clickhouse CI

@zhouyuan
Copy link
Contributor

zhouyuan commented Nov 3, 2023

@PHILO-HE can you please help to update the status in PR desc? I think below issue is fixed, however three are some other issues.

The current gap:
Spark's collect_list/collect_set dismisses null input, but velox doesn't.

-yuan

Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Dec 19, 2023
Copy link

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants