[GLUTEN-3620][VL] Support Range operator for Velox Backend #8161

ArnavBalyan · 2024-12-05T17:09:13Z

What changes were proposed in this pull request?

Currently the Range operator falls back, which leads to R2C overhead for downstream operators in Velox.
Added support for RangeExec, which produces the range in columnar batches.
In the future, the range generation itself can be offloaded to the Velox backend.
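
As a rough illustration of what "producing the range in columnar batches" means, here is a minimal, self-contained sketch in plain Scala. It is not the code from this PR: `RangeBatchSketch`, `batches`, and the `batchSize` parameter are hypothetical names, a plain `Array[Long]` stands in for a column vector, and overflow handling is left out.

```scala
// Hypothetical sketch: split range(start, end, step) into fixed-size batches.
// An Array[Long] stands in for the single column vector of a columnar batch.
object RangeBatchSketch {
  def batches(start: Long, end: Long, step: Long, batchSize: Int): Iterator[Array[Long]] = {
    require(step != 0, "step must be non-zero")
    require(batchSize > 0, "batchSize must be positive")

    // Total number of values in the range (0 if the range is empty).
    val total: Long =
      if (step > 0 && start < end) (end - start + step - 1) / step
      else if (step < 0 && start > end) (start - end - step - 1) / (-step)
      else 0L

    new Iterator[Array[Long]] {
      private var produced = 0L   // values emitted so far
      private var current = start // first value of the next batch

      override def hasNext: Boolean = produced < total

      override def next(): Array[Long] = {
        if (!hasNext) throw new NoSuchElementException("range exhausted")
        // The last batch may be smaller than batchSize.
        val numRows = math.min(batchSize.toLong, total - produced).toInt
        val values = Array.tabulate(numRows)(i => current + i * step)
        current += numRows * step
        produced += numRows
        values
      }
    }
  }
}
```

For example, `RangeBatchSketch.batches(0L, 10L, 3L, 3)` yields two batches, `[0, 3, 6]` and `[9]`. A real operator would wrap each array in a columnar batch instead of returning it directly.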

How was this patch tested?

Plan reflects the new operator
Added unit tests

github-actions · 2024-12-05T17:09:31Z

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2024-12-05T17:09:47Z

Run Gluten Clickhouse CI on x86

CodenameGHOST007

Does Velox currently have a similar operator that we can leverage, or will a new implementation be needed if we want to offload it?

CodenameGHOST007 · 2024-12-06T04:52:44Z

gluten-ut/spark33/src/test/scala/org/apache/spark/sql/execution/GlutenSQLRangeExecSuite.scala

+import org.apache.spark.sql.GlutenSQLTestsTrait
+import org.apache.spark.sql.Row
+
+class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait {


The suite would need to be enabled in VeloxTestSettings.

ArnavBalyan · 2024-12-06T05:22:11Z

> Does Velox currently have a similar operator that we can leverage, or will a new implementation be needed if we want to offload it?

Yes, I'm planning to work on the offload implementation. Currently it seems unlikely that we will have to introduce code in Velox. I will raise a PR once possible, thanks.
Some minor changes are needed for the tests; I will enable them. Thanks for the review.

zhztheplayer · 2024-12-06T11:05:39Z

backends-velox/src/main/scala/org/apache/gluten/execution/RangeExecTransformer.scala

+ * @param child
+ *   Child SparkPlan nodes for this operator, if any.
+ */
+case class RangeExecTransformer(


Can we name it "ColumnarRangeExec" (or VeloxColumnarRangeExec or something) as it doesn't extend the TransformSupport trait? Thanks.

zhztheplayer · 2024-12-06T11:08:46Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/RangeExecBaseTransformer.scala

+  override protected def doValidateInternal(): ValidationResult = {
+    val isSupported = BackendsApiManager.getSettings.supportRangeExec()
+
+    if (!isSupported) {
+      return ValidationResult.failed(
+        s"RangeExec is not supported by the current backend."
+      )
+    }
+    ValidationResult.succeeded
+  }


I think we usually have to add some code to this validator so that doValidate is actually used. Could you help check this?

I am opening #8177, which may simplify the usage of validators. Perhaps you could rebase once that PR is merged.

zhztheplayer · 2024-12-06T11:18:33Z

backends-velox/src/main/scala/org/apache/gluten/execution/RangeExecTransformer.scala

+                current += numRows * step
+
+                val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]], numRows)
+                val offloadedBatch = ColumnarBatches.offload(allocator, batch)


Because of this code, the operator's behaviour actually aligns more precisely with

override def batchType(): Convention.BatchType = { ArrowNativeBatch }

(as ColumnarBatches.offload results in native Arrow batch)

Could you see if we can add this code to the class? If so, we can also remove RangeExecBaseTransformer's default implementation.

Doing this will make the query planner insert an explicit ArrowToVelox c2c transition into the query plan, so we can easily collect the transition's metrics.

Furthermore, I would suggest removing line 117 and line 118 and returning the batch directly, then changing the batch type to

override def batchType(): Convention.BatchType = { ArrowJavaBatch }

That way, two explicit c2c transitions (ArrowJava-to-ArrowNative, ArrowNative-to-Velox) can be inserted into the query plan.

For more details, please refer to the test code.
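
To make the suggestion above concrete, here is a toy model of how the batch type an operator declares determines which explicit c2c transitions appear before a Velox consumer. This is only a sketch under stated assumptions: `TransitionSketch`, its `BatchType` objects, and `transitions` are hypothetical stand-ins written for illustration, not Gluten's `Convention` API or its planner logic, and the conversion chain ArrowJava to ArrowNative to Velox is taken from the comment above.

```scala
// Toy model only: these types are hypothetical stand-ins, not Gluten's Convention API.
object TransitionSketch {
  sealed trait BatchType
  case object ArrowJavaBatch extends BatchType
  case object ArrowNativeBatch extends BatchType
  case object VeloxBatch extends BatchType

  // Assumed conversion chain: ArrowJava -> ArrowNative -> Velox.
  private val chain: List[BatchType] = List(ArrowJavaBatch, ArrowNativeBatch, VeloxBatch)

  // Names of the explicit c2c transitions needed to get from `from` to `to`.
  def transitions(from: BatchType, to: BatchType): List[String] = {
    val i = chain.indexOf(from)
    val j = chain.indexOf(to)
    require(i <= j, s"no forward path from $from to $to in this toy model")
    // Pair up neighbouring batch types along the chain; a single-element
    // slice (from == to) produces no pairs and therefore no transitions.
    chain.slice(i, j + 1).sliding(2).collect {
      case List(a, b) => s"$a-to-$b"
    }.toList
  }
}
```

In this model, an operator that declares `ArrowJavaBatch` needs two transitions before a Velox consumer, one that declares `ArrowNativeBatch` needs one, and one that already produces `VeloxBatch` needs none. Each explicit transition is a separate plan node, which is what makes its metrics easy to collect.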

zhztheplayer · 2024-12-06T12:28:12Z

gluten-ut/spark32/src/test/scala/org/apache/spark/sql/execution/GlutenSQLRangeExecSuite.scala

+import org.apache.spark.sql.GlutenSQLTestsTrait
+import org.apache.spark.sql.Row
+
+class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait {


Why do we need test suites for all Spark versions?

github-actions · 2024-12-11T11:18:53Z

#3620

zhouyuan · 2024-12-11T11:21:44Z

> Does Velox currently have a similar operator that we can leverage, or will a new implementation be needed if we want to offload it?
>
> Yes, I'm planning to work on the offload implementation. Currently it seems unlikely that we will have to introduce code in Velox. I will raise a PR once possible, thanks. Some minor changes are needed for the tests; I will enable them. Thanks for the review.

We had some early discussion about mapping this to unnest + sequence, but didn't try a real implementation at that time:
#3620

zhouyuan · 2024-12-13T00:28:48Z

@ArnavBalyan would you please check the failed unit tests?

thanks, -yuan

ArnavBalyan · 2024-12-16T05:38:39Z

Hi @zhouyuan, thanks for the review. Interesting thread, I will go through it! I'll fix the tests today/tomorrow, thanks!

ArnavBalyan added 2 commits December 5, 2024 10:34

update

56d532c

update

b305a88

github-actions bot added the CORE (works for Gluten Core), BUILD, and VELOX labels Dec 5, 2024

update

c2122a1

updat

e30bbcf

CodenameGHOST007 reviewed Dec 6, 2024

zhztheplayer reviewed Dec 9, 2024

zhouyuan changed the title ~~[VL] Support Range operator for Velox Backend~~ [GLUTEN-3620][VL] Support Range operator for Velox Backend Dec 11, 2024
