[GLUTEN-4713][CORE] Fix invalid children caused by std::move in RowVectorStream #4753
Run Gluten Clickhouse CI
cc @zhouyuan, @rui-mo, @ulysses-you, thanks.
Thanks! Verified on internal TPC-DS test.
lgtm, thank you
/Benchmark Velox
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
In #4713, we discovered that executing q23b on a large-scale TPCDS dataset results in a core dump. After investigation, we concluded that an optimization introduced in #4628, which removed some unnecessary post-projects, caused the core dump to surface. However, we believe this is only a symptom and not the root cause of the problem. In #4726, we restored the post-project and continued to investigate why the absence of the post-project triggers the core dump.
We ultimately identified the root cause of the issue: the `next` method of `RowVectorStream` moves away the children of the `RowVector`, leaving the `RowVector`'s children in an indeterminate state. Since the aggregation operation reuses the output `RowVector`, accessing a child that has been moved away during reuse can lead to a core dump.
Why doesn't a core dump occur when there is a post-project?
The `FilterProject` operator in Velox does not reuse the output `RowVector`; it always creates a new `RowVector` to return.
Why doesn't the bug get triggered on small datasets?
The bug is triggered only when the aggregation outputs multiple batches, because that is when its output `RowVector` is reused. With small data, the aggregation outputs a single batch, so no issue arises. If the batch size is reduced, however, the bug can be triggered on small data as well.
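The interaction described above can be reproduced in miniature. The following is a sketch only: `ToyColumn`, `ToyRowVector`, `ReusingProducer`, `wrapByMove`, and `countHealthyBatches` are hypothetical stand-ins, not Velox or Gluten code. It assumes the consumer move-constructs a new row vector from the producer's children, which leaves the producer's reused output without children by the time the second batch is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for a column and a row vector (hypothetical, not Velox types).
struct ToyColumn {
  std::vector<int> values;
};
using ToyColumnPtr = std::shared_ptr<ToyColumn>;

struct ToyRowVector {
  explicit ToyRowVector(std::vector<ToyColumnPtr> c) : children(std::move(c)) {}
  std::vector<ToyColumnPtr> children;
};
using ToyRowVectorPtr = std::shared_ptr<ToyRowVector>;

// Producer that reuses a single output vector across batches.
class ReusingProducer {
 public:
  explicit ReusingProducer(std::size_t numColumns) : numColumns_(numColumns) {
    std::vector<ToyColumnPtr> columns;
    for (std::size_t i = 0; i < numColumns; ++i) {
      columns.push_back(std::make_shared<ToyColumn>());
    }
    output_ = std::make_shared<ToyRowVector>(std::move(columns));
  }

  // Refills the reused output. Returns nullptr at the point where a real
  // operator would touch a missing child and crash.
  ToyRowVectorPtr next(int fill) {
    if (output_->children.size() != numColumns_) {
      return nullptr;
    }
    for (auto& child : output_->children) {
      child->values.assign(2, fill);
    }
    return output_;
  }

 private:
  std::size_t numColumns_;
  ToyRowVectorPtr output_;
};

// Consumer that wraps a batch by moving its children into a new row vector.
// Move-constructing from a std::vector leaves the source vector empty.
ToyRowVectorPtr wrapByMove(const ToyRowVectorPtr& batch) {
  assert(batch != nullptr);
  return std::make_shared<ToyRowVector>(std::move(batch->children));
}

// Counts how many batches are produced before the reused output is found
// to have lost its children.
std::size_t countHealthyBatches(std::size_t numBatches) {
  ReusingProducer producer(2);
  std::size_t healthy = 0;
  for (std::size_t i = 0; i < numBatches; ++i) {
    ToyRowVectorPtr batch = producer.next(static_cast<int>(i));
    if (batch == nullptr) {
      break;
    }
    ++healthy;
    wrapByMove(batch);
  }
  return healthy;
}
```

With a single batch the move goes unnoticed, since the producer never looks at its output again. With several batches only the first one is produced cleanly, which matches the observation that the bug needs multi-batch output.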
Does the bug occur when the aggregation outputs multiple batches?
The bug is triggered when the operator downstream of the aggregation is a `RowVectorStream`.
After the fix, we can restore the optimization introduced in #4628 that removes unnecessary post-projects.
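As a rough illustration of what a non-moving `next` could look like, the sketch below copies the vector of child pointers instead of moving it. This is an assumption about the general shape of such a fix, not the actual patch; `ToyColumn`, `ToyRowVector`, `makeBatch`, and `wrapByCopy` are hypothetical names. Copying a vector of `std::shared_ptr` only increments reference counts, so the column data is shared rather than duplicated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for a column and a row vector (hypothetical, not Velox types).
struct ToyColumn {
  std::vector<int> values;
};
using ToyColumnPtr = std::shared_ptr<ToyColumn>;

struct ToyRowVector {
  explicit ToyRowVector(std::vector<ToyColumnPtr> c) : children(std::move(c)) {}
  std::vector<ToyColumnPtr> children;
};
using ToyRowVectorPtr = std::shared_ptr<ToyRowVector>;

// Builds a row vector with numColumns empty columns.
ToyRowVectorPtr makeBatch(std::size_t numColumns) {
  std::vector<ToyColumnPtr> columns;
  for (std::size_t i = 0; i < numColumns; ++i) {
    columns.push_back(std::make_shared<ToyColumn>());
  }
  return std::make_shared<ToyRowVector>(std::move(columns));
}

// Wraps a batch by copying its child pointers. The source keeps all of its
// children, so an upstream operator can safely reuse it for the next batch.
ToyRowVectorPtr wrapByCopy(const ToyRowVectorPtr& batch) {
  assert(batch != nullptr);
  return std::make_shared<ToyRowVector>(batch->children);
}
```

After `wrapByCopy(src)`, both `src` and the wrapper hold the same column pointers, and `src->children.size()` is unchanged.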
How was this patch tested?
Tested with q23b on the 1T TPCDS dataset.