-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix segfault in conditional join #16094
Fix segfault in conditional join #16094
Conversation
cpp/src/join/conditional_join.cu
Outdated
return get_trivial_left_join_indices(left, stream, rmm::mr::get_current_device_resource()) | ||
.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was returning an uninitialized vector, but it should be initialized with [0, 1, 2, ..., left.num_rows() - 1]
. I made it match the mixed join code, which calls this utility function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't the output generated using the provided mr
but default mr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed in 1a0f66e.
if (left.num_rows() == 0) { | ||
return std::make_unique<rmm::device_uvector<size_type>>(0, stream, mr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path is unreachable. The case of zero rows is handled earlier in the function.
case join_kind::LEFT_ANTI_JOIN: | ||
return get_trivial_left_join_indices(left, stream, rmm::mr::get_current_device_resource()) | ||
.first; | ||
case join_kind::LEFT_ANTI_JOIN: return get_trivial_left_join_indices(left, stream, mr).first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humnn, this seems wrong to me, since we don't return rvalue
thus *.first
seems to be returned by copied. However, *.first
here actually is a unique_ptr
thus it can't be copied. I'm surprised that this code can compile. The compiler automatically moves *.first
for us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I follow. I copied this from here:
cudf/cpp/src/join/mixed_join_semi.cu
Lines 120 to 122 in cdfb550
return get_trivial_left_join_indices( | |
left_conditional, stream, rmm::mr::get_current_device_resource()) | |
.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the C++ syntax looks like it is wrong but it isn't (because it can compile) so I'm confused...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return may be ellided https://en.cppreference.com/w/cpp/language/copy_elision -- return value optimization (RVO) by the compiler. If the compiler cannot do this it will report an error since you cannot copy a unique-ptr. So no error means RVO was achieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the compiler automatically optimizes the case of unique_ptr
for us. I did some experiments here: https://godbolt.org/z/zs3TsW1do
So if we don't use unique_ptr
, the returned object is copied. When put in unique_ptr
wrapper, there is no copy at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came up in another PR recently. As David mentioned, the RVO is technically optional for the compiler, but under the circumstances the only other option is a compiler error because the type isn't copyable, so it's not really a problem to rely on this behavior because we'll immediately know if it doesn't do what we want if the code doesn't compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is good in general. However, correct me if I'm wrong, the situations under which RVO is performed are not fully defined by C++ standards (mandatory for temporaries but only optional for named objects). In other words, it can compile this time but may fail to compile in the future version of GCC if that version changes its RVO behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct that it is optional. The worst case outcome is an easy-to-diagnose failure to compile rather than a successful compilation with incorrect (or undefined) behavior, so I don't think any action is strictly necessary. I'd certainly be open to being more explicit, but as David pointed out in #15645 (comment) we use this pattern in many places in libcudf (as evidenced by the fact that Bradley copy-pasted it), so if we were to make any changes we would want to do so systematically.
auto result = this->join(left, right, predicate); | ||
auto lhs_result = cudf::detail::make_std_vector_sync(*result.first, cudf::get_default_stream()); | ||
auto rhs_result = | ||
cudf::detail::make_std_vector_sync(*result.second, cudf::get_default_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning up my gross element-wise test!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Thanks for the prompt fix! 🔥
Thanks for the reviews, all! |
/merge |
Closes rapidsai#16066. I found a bug that would cause the reported segfault and have fixed it in this PR. When the right table has zero rows, conditional left anti-joins were returning a vector of indices containing garbage data. Along the way, I refactored several parts of the conditional join tests and added coverage for more cases. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) - Yunsong Wang (https://github.com/PointKernel) URL: rapidsai#16094
Description
Closes #16066.
I found a bug that would cause the reported segfault and have fixed it in this PR. When the right table has zero rows, conditional left anti-joins were returning a vector of indices containing garbage data.
Along the way, I refactored several parts of the conditional join tests and added coverage for more cases.
Checklist