-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE] Remove some backend-specific code from common module #3363
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
7e07917
to
ec3283c
Compare
Run Gluten Clickhouse CI |
4 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
18d94e0
to
7b62dd9
Compare
Run Gluten Clickhouse CI |
7b62dd9
to
20791de
Compare
Run Gluten Clickhouse CI |
3 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
fe22cf0
to
44b0788
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
/Benchmark Velox |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
Run Gluten Clickhouse CI |
@zzcclp The patch include a couple of clean-ups against both two backends. Would you like to take a look? |
Run Gluten Clickhouse CI |
4de5903
to
0f940db
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
override def metricsApi(): MetricsApi = new MetricsHandler | ||
override def name(): String = VeloxBackend.BACKEND_NAME | ||
override def buildInfo(): GlutenPlugin.BackendBuildInfo = | ||
GlutenPlugin.BackendBuildInfo("Velox", VELOX_BRANCH, VELOX_REVISION, VELOX_REVISION_TIME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use BACKEND_NAME
to replace Velox
for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the name in BackendBuildInfo is just a declarative text than BACKEND_NAME which is used for lib loading. I think it's OK to use "Velox" "VELOX" or whatever letting user know it's the Velox backend here.
It might be feasible we have an individual const text "Velox" for printing information to end user, at the same time we rename "BACKEND_NAME" to "BACKEND_LIB_NAME" or something in another patch. Although it's just a trivial change which will not improve things so much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding some comments for BackendBuildInfo
to clarify it is only for declarative purpose, and should not be used in real code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding some comments for BackendBuildInfo to clarify it is only for declarative purpose, and should not be used in real code?
Agreed but do you think we can do that in another PR? Since it's a little bit off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the assumption that this PR is just some code movement across modules. It's far beyond its mission to improve all relevant code's quality. ; )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, makes sense.
@@ -64,7 +64,7 @@ object FallbackUtil extends Logging with AdaptiveSparkPlanHelper { | |||
} | |||
} | |||
|
|||
def isFallback(plan: SparkPlan): Boolean = { | |||
def hasFallbacks(plan: SparkPlan): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe hasFallback
is more suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -73,13 +73,9 @@ class BatchScanExecTransformer( | |||
} | |||
|
|||
override def getInputFilePaths: Seq[String] = { | |||
if (BackendsApiManager.isVeloxBackend) { | |||
Seq.empty[String] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add Velox specific logic in backends-velox?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Velox backend doesn't use the output of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed, relation.location.inputFiles.toSeq
is an expensive API, getInputFilePaths
's result is useless but this check is needed, as long as CH backend not implement more general partition parsing.
#2405
Run Gluten Clickhouse CI |
"true".equals(sparkSession.sparkContext.getLocalProperty("isNativeAppliable")) | ||
&& GlutenConfig.isCurrentBackendVelox && false | ||
) { | ||
// why if (false)? Such code requires comments when being written |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better follow the same comment style by capitalizing the first letter and add .
at the end for all comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the "BackendsApiManager.isxxxBackend" is removed, can gluten automatically pick the right code path?
@@ -69,7 +69,7 @@ trait CHFormatWriterInjects extends GlutenFormatWriterInjectsBase { | |||
sparkSession: SparkSession, | |||
options: Map[String, String], | |||
files: Seq[FileStatus]): Option[StructType] = { | |||
throw new UnsupportedOperationException("CHFormatWriterInjects does not support inferSchema") | |||
OrcUtils.inferSchema(sparkSession, files, options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzcclp this will change the behavior on ORC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't change CH ORC's behavior
See code change in
(OrcFileFormat.scala)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taiyang-li please check this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is ok
@@ -225,7 +225,7 @@ public static void checkDecimalScale(int scale) { | |||
} | |||
|
|||
public static ScalarFunctionNode makeScalarFunction( | |||
Long functionId, ArrayList<ExpressionNode> expressionNodes, TypeNode typeNode) { | |||
Long functionId, List<ExpressionNode> expressionNodes, TypeNode typeNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very unusual practice to define a variable with ArrayList
type. Should use the abstract form as long as the code works.
Backend code is supposed to talk with core module through backend-API. It's the main purpose for the patch to cleanup code to make sure it follows the standard way. So as a result he APIs were removed. |
Run Gluten Clickhouse CI |
@@ -53,12 +60,12 @@ object CHBackendSettings extends BackendSettingsApi with Logging { | |||
// experimental: when the files count per partition exceeds this threshold, | |||
// it will put the files into one partition. | |||
val GLUTEN_CLICKHOUSE_FILES_PER_PARTITION_THRESHOLD: String = | |||
GlutenConfig.GLUTEN_CONFIG_PREFIX + GlutenConfig.GLUTEN_CLICKHOUSE_BACKEND + | |||
GlutenConfig.GLUTEN_CONFIG_PREFIX + CHBackend.BACKEND_NAME + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use "CHBackend.CONFIG_PREFIX" to replace "GlutenConfig.GLUTEN_CONFIG_PREFIX + CHBackend.BACKEND_NAME"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHBraodcastApi -> CHBroadcastApi
@@ -293,7 +293,7 @@ trait GlutenTestsTrait extends GlutenTestsCommonTrait { | |||
|
|||
def shouldNotFallback(): Unit = { | |||
TestStats.offloadGluten = false | |||
if (BackendsApiManager.getBackendName != GlutenConfig.GLUTEN_CLICKHOUSE_BACKEND) { | |||
if (BackendTestUtils.isCHBackendLoaded()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!BackendTestUtils.isCHBackendLoaded()) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching!
Run Gluten Clickhouse CI |
*/ | ||
package io.glutenproject.execution | ||
|
||
abstract class GlutenClickHouseWholeStageTransformerSuite extends WholeStageTransformerSuite {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it needs to add this abstract class ? extending WholeStageTransformerSuite
directly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just keep it for storing test settings in future. Velox backend has VeloxWholeStageTransformerSuite
too after this patch.
Reviewers, since it's quite a few days since the PR was ready for reviewing, I am planning to merge it by end of today if there's no -1 comments. I will be prepared to actively fixing any possible issues brought by the changes afterwards so please feel free to ping me if you found some. Also please continue reviewing before merging. Thank you very much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 a big refactor!
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
This patch will remove most of legacy code that was backend-specific from the common modules (
gluten-core
,gluten-data
,shims
).There will still remain some backend-conditional logics in test code which will be removed later.
We should avoid re-adding backend-specific code after the patch got merged as long as we have one alternative way.