Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Add support of percentile_approx / approx_percentile #4889

Open
zhztheplayer opened this issue Mar 8, 2024 · 6 comments
Open

[VL] Add support of percentile_approx / approx_percentile #4889

zhztheplayer opened this issue Mar 8, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@zhztheplayer
Copy link
Member

Description

Part of #4039

Velox(presto)'s approx_percentile has different intermediate types with Spark's function with same name.

Velox's signature code of approx_percentile :

void addSignatures(
    const std::string& inputType,
    const std::string& percentileType,
    const std::string& returnType,
    std::vector<std::shared_ptr<exec::AggregateFunctionSignature>>&
        signatures) {
  auto intermediateType = fmt::format(
      "row(array(double), boolean, double, integer, bigint, {0}, {0}, array({0}), array(integer))",
      inputType);
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType(percentileType)
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType("bigint")
                           .argumentType(percentileType)
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType(percentileType)
                           .argumentType("double")
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType("bigint")
                           .argumentType(percentileType)
                           .argumentType("double")
                           .build());
}

Link https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L790C1-L828C1

Spark's code of approx_percentile / percentile_approx:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

@zhztheplayer zhztheplayer added the enhancement New feature or request label Mar 8, 2024
@zhztheplayer
Copy link
Member Author

zhztheplayer commented Mar 8, 2024

cc @liujiayi771 @ulysses-you I took the function in #4039 but may delay the implementation. I see you have been actively working on aggregation functions in past a few of days so (you or anyone) please feel free to take this issue if interested.

It looks like the new utility introduced by #4764 could help a lot here but may still require for some changes though. I am not sure and you may have more context in there than me.

@ulysses-you
Copy link
Contributor

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

@liujiayi771
Copy link
Contributor

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

@zhztheplayer I can come to add these aggregate functions, thanks.

@zhztheplayer
Copy link
Member Author

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

@zhztheplayer I can come to add these aggregate functions, thanks.

@liujiayi771 Glad to hear you'll help. I'll reassign the task to you. Thanks!

@WangGuangxin
Copy link
Contributor

I have a draft version before, and I can take over this if you have not start yet @liujiayi771 @zhztheplayer

@liujiayi771
Copy link
Contributor

@WangGuangxin I have not started yet, feel free to take it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants