-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE] Consider the cost when applying stage fallback policy #3569
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
Hi @ulysses-you, motivated by your previous work, this PR is proposed to evaluate the cost when applying stage fallback policy. Could you please help review to see if it makes sense in practice? Thanks! |
8dfd6cf
to
b7af6de
Compare
Run Gluten Clickhouse CI |
b7af6de
to
df8f871
Compare
Run Gluten Clickhouse CI |
gluten-core/src/main/scala/io/glutenproject/extension/ExpandFallbackPolicy.scala
Outdated
Show resolved
Hide resolved
gluten-core/src/main/scala/io/glutenproject/extension/ExpandFallbackPolicy.scala
Show resolved
Hide resolved
@@ -175,11 +190,15 @@ case class ExpandFallbackPolicy(isAdaptiveContext: Boolean, originalPlan: SparkP | |||
return None | |||
} | |||
|
|||
val numFallback = countFallbacks(plan) | |||
if (numFallback >= fallbackThreshold) { | |||
val netFallbackNum = if (isAdaptiveContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the net
do you mean final ? Does final sound more friendly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the net means "净值" to reflect stage fallback's benefit with cost deducted. I am afraid final
may look like the fallback num at last after policy is applied. Do you think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the explanation, I'm fine with net
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @PHILO-HE , looks good to me except one comment
.foreach { | ||
case stage: QueryStageExec | ||
if stage.plan.isInstanceOf[GlutenPlan] || | ||
InMemoryTableScanHelper.isGlutenTableCache(stage) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also check it at line 143
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment! At line 143, we only find gluten plan whose one or more child is last stage. It seems this gluten plan cannot be gluten table cache since it's a leaf node without child. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just consider such case if the leaf node is Gluten TableCache, for example:
GlutenTableCache
|
HashAggregateTrasnformer
|
HashAggregateTrasnformer
|
ColumnarToRow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, as GlutenTableCache
cannot be connected to last stage, we don't need to consider two-stage's transition cost (ColumnarToRow) when falling back the current stage. Am I missing something?
Run Gluten Clickhouse CI |
37969e9
to
9fd398a
Compare
Run Gluten Clickhouse CI |
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
When making a stage fallback, an extra ColumnarToRow may be required to adapt to the columnar output of last query stage(s). So this cost may need to be considered to get a net fallback number.
How was this patch tested?
An existing UT covers it.