Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-3620][VL] Support Range operator for Velox Backend #8161

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ArnavBalyan
Copy link
Contributor

What changes were proposed in this pull request?

  • Currently the range operator fallsback leads to R2C overhead for downstream operators in velox.
  • Added support for RangeExec which produces range in columnar batches.
  • In the future the range generation can itself be offloaded to velox backend.

How was this patch tested?

image
  • Plan reflects the new operator
  • Added unit tests

@github-actions github-actions bot added CORE works for Gluten Core BUILD VELOX labels Dec 5, 2024
Copy link

github-actions bot commented Dec 5, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

1 similar comment
Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link
Contributor

@CodenameGHOST007 CodenameGHOST007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does velox currently have any similar operator that we can leverage or a new implementation will be needed in case we want to offload it.

import org.apache.spark.sql.GlutenSQLTestsTrait
import org.apache.spark.sql.Row

class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suite would need to be enabled in VeloxTestSettings.

@ArnavBalyan
Copy link
Contributor Author

Does velox currently have any similar operator that we can leverage or a new implementation will be needed in case we want to offload it.

Yes, I'm planning to work on the offload implementation. Currently it seems unlikely we will have to introduce code in velox. Will raise PR once possible thanks.
Some minor changes are needed for the tests, will enable them thanks for review

* @param child
* Child SparkPlan nodes for this operator, if any.
*/
case class RangeExecTransformer(
Copy link
Member

@zhztheplayer zhztheplayer Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name it "ColumnarRangeExec" (or VeloxColumnarRangeExec or something) as it doesn't extend the TransformSupport trait? Thanks.

Comment on lines +45 to +54
override protected def doValidateInternal(): ValidationResult = {
val isSupported = BackendsApiManager.getSettings.supportRangeExec()

if (!isSupported) {
return ValidationResult.failed(
s"RangeExec is not supported by the current backend."
)
}
ValidationResult.succeeded
}
Copy link
Member

@zhztheplayer zhztheplayer Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think usually we have to add some code to this validator to make doValidate be used. Can you help check this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am opening #8177 that may simplify the usage of validators. Perhaps could do a rebase once that PR merged.

current += numRows * step

val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]], numRows)
val offloadedBatch = ColumnarBatches.offload(allocator, batch)
Copy link
Member

@zhztheplayer zhztheplayer Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the code, the operator's behaviour acutally align more precisely with

  override def batchType(): Convention.BatchType = {
    ArrowNativeBatch
  }

(as ColumnarBatches.offload results in native Arrow batch)

Would you see if we can add the code to this class? If yes we can also remove RangeExecBaseTransformer's default implementation.

Doing this will make query planner to insert an explicit ArrowToVelox c2c transition into query plan so we can easily collect the transition's metrics.

Futhermore, I will suggest remove line 117 and line 118 then return the batch directly, then have the batch type changed as

  override def batchType(): Convention.BatchType = {
    ArrowJavaBatch
  }

So two explicit c2cs (ArrowJava-to-ArrowNative, ArrowNative-to-Velox) can be inserted to query plan.

More details please refer to test code.

import org.apache.spark.sql.GlutenSQLTestsTrait
import org.apache.spark.sql.Row

class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need test suites for all Spark versions?

@zhouyuan zhouyuan changed the title [VL] Support Range operator for Velox Backend [GLUTEN-3620][VL] Support Range operator for Velox Backend Dec 11, 2024
Copy link

#3620

@zhouyuan
Copy link
Contributor

Does velox currently have any similar operator that we can leverage or a new implementation will be needed in case we want to offload it.

Yes, I'm planning to work on the offload implementation. Currently it seems unlikely we will have to introduce code in velox. Will raise PR once possible thanks. Some minor changes are needed for the tests, will enable them thanks for review

We have some early discussion on this to map it to unnest + sequence - but didn't try the real impl at that time
#3620

@zhouyuan
Copy link
Contributor

@ArnavBalyan would you please check the failed unit tests?

thanks, -yuan

@ArnavBalyan
Copy link
Contributor Author

Hi @zhouyuan thanks for the review, interesting thread, I will go through! I'll fix the tests today/tomorrow thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUILD CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants