-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34698: [C++][Acero] Add node that emits explicit ordering after asserting order #44738
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
e66878f
to
937695d
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably not qualified to review this thoroughly but the CI failure is related:
[ FAILED ] TestScannerThreading/TestScanner.ScanOrderingFail/0Serial1d1b1024r, where GetParam() = serial-1d-1b-1024i (4 ms)
[ RUN ] TestScannerThreading/TestScanner.ScanOrderingFail/1Serial2d16b1024r
/arrow/cpp/src/arrow/dataset/scanner_test.cc:1040: Failure
Expected equality of these values:
next.status().message()
Which is: "Data is not ordered\n/arrow/cpp/src/arrow/acero/assert_order_node.cc:190 AssertInBatchOrder(*record_batch, ordering_, comparators_)\n/arrow/cpp/src/arrow/acero/source_node.cc:158 output_->InputReceived(this, std::move(batch))"
"Data is not ordered"
With diff:
@@ -1,3 @@
Data is not ordered
-/arrow/cpp/src/arrow/acero/assert_order_node.cc:190 AssertInBatchOrder(*record_batch, ordering_, comparators_)
-/arrow/cpp/src/arrow/acero/source_node.cc:158 output_->InputReceived(this, std::move(batch))
There seems to be some stray output on that specific job.
7750094
to
2e30308
Compare
@raulcd thanks, assertions adjusted |
2e30308
to
af38189
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Conceptually this makes sense to me. I like that it verifies the ordering and it's good use of a sequencing queue. I can't speak much to the code in compare.h
/ compare.cc
as I'm not very familiar with it.
It's a brand new node and I don't see much change to existing paths so I'm not too concerned about a regression.
cpp/src/arrow/compare.cc
Outdated
Visit(const T& left_) { | ||
Visit(const T& left) { | ||
const auto& right = checked_cast<const T&>(right_); | ||
result_ = right.value == left_.value; | ||
result_ = left.value == right.value; | ||
return Status::OK(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
Ordering ordering = assert_options.ordering; | ||
|
||
// check output ordering | ||
if (ordering.is_implicit() || ordering.is_unordered()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you use !ordering.is_explicit()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, fixed
@pitrou @lidavidm @bkietz @zeroshade you seem to be familiar with compare.cc, can you take a look, please? |
…De]sc assertions robust
Co-authored-by: Weston Pace <[email protected]>
9aed89f
to
59b4afa
Compare
Rationale for this change
An acero node that turns an implicit ordering into an explicit ordering (rows sorted by some columns) is useful to re-use order that already exists in the data.
What changes are included in this PR?
This PR adds the
AssertOrderNode
that implements this logic. TheScanner
employs that node to turn the implicit ordering of theScanNode
into an explicit order as defined by user code viaScanBuilder.Ordering
.Are these changes tested?
There are unit tests for the
AssertOrderNode
as well as for theScanNode
andScanBuilder
.Are there any user-facing changes?
The following options has been added:
ordering
option added toScanOptions
Ordering
option added toScannerBuilder
AssertNodeOptions
class