-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3620][VL] Support Range operator for Velox Backend #8161
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI on x86 |
1 similar comment
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does velox currently have any similar operator that we can leverage or a new implementation will be needed in case we want to offload it.
import org.apache.spark.sql.GlutenSQLTestsTrait | ||
import org.apache.spark.sql.Row | ||
|
||
class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suite would need to be enabled in VeloxTestSettings.
Yes, I'm planning to work on the offload implementation. Currently it seems unlikely we will have to introduce code in velox. Will raise PR once possible thanks. |
* @param child | ||
* Child SparkPlan nodes for this operator, if any. | ||
*/ | ||
case class RangeExecTransformer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name it "ColumnarRangeExec" (or VeloxColumnarRangeExec
or something) as it doesn't extend the TransformSupport
trait? Thanks.
override protected def doValidateInternal(): ValidationResult = { | ||
val isSupported = BackendsApiManager.getSettings.supportRangeExec() | ||
|
||
if (!isSupported) { | ||
return ValidationResult.failed( | ||
s"RangeExec is not supported by the current backend." | ||
) | ||
} | ||
ValidationResult.succeeded | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think usually we have to add some code to this validator to make doValidate
be used. Can you help check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am opening #8177 that may simplify the usage of validators. Perhaps could do a rebase once that PR merged.
current += numRows * step | ||
|
||
val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]], numRows) | ||
val offloadedBatch = ColumnarBatches.offload(allocator, batch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the code, the operator's behaviour acutally align more precisely with
override def batchType(): Convention.BatchType = {
ArrowNativeBatch
}
(as ColumnarBatches.offload
results in native Arrow batch)
Would you see if we can add the code to this class? If yes we can also remove RangeExecBaseTransformer
's default implementation.
Doing this will make query planner to insert an explicit ArrowToVelox
c2c transition into query plan so we can easily collect the transition's metrics.
Futhermore, I will suggest remove line 117 and line 118 then return the batch
directly, then have the batch type changed as
override def batchType(): Convention.BatchType = {
ArrowJavaBatch
}
So two explicit c2cs (ArrowJava-to-ArrowNative, ArrowNative-to-Velox) can be inserted to query plan.
More details please refer to test code.
import org.apache.spark.sql.GlutenSQLTestsTrait | ||
import org.apache.spark.sql.Row | ||
|
||
class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need test suites for all Spark versions?
We have some early discussion on this to map it to |
@ArnavBalyan would you please check the failed unit tests? thanks, -yuan |
Hi @zhouyuan thanks for the review, interesting thread, I will go through! I'll fix the tests today/tomorrow thanks! |
What changes were proposed in this pull request?
How was this patch tested?