Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault in conditional join #16094

Merged
merged 4 commits into from
Jun 26, 2024

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Jun 25, 2024

Description

Closes #16066.

I found a bug that would cause the reported segfault and have fixed it in this PR. When the right table has zero rows, conditional left anti-joins were returning a vector of indices containing garbage data.

Along the way, I refactored several parts of the conditional join tests and added coverage for more cases.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 25, 2024
@bdice bdice self-assigned this Jun 25, 2024
@bdice bdice requested a review from a team as a code owner June 25, 2024 22:15
@bdice bdice requested review from ttnghia and PointKernel June 25, 2024 22:15
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jun 25, 2024
Comment on lines 52 to 53
return get_trivial_left_join_indices(left, stream, rmm::mr::get_current_device_resource())
.first;
Copy link
Contributor Author

@bdice bdice Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was returning an uninitialized vector, but it should be initialized with [0, 1, 2, ..., left.num_rows() - 1]. I made it match the mixed join code, which calls this utility function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't the output generated using the provided mr but default mr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in 1a0f66e.

Comment on lines -99 to -101
if (left.num_rows() == 0) {
return std::make_unique<rmm::device_uvector<size_type>>(0, stream, mr);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path is unreachable. The case of zero rows is handled earlier in the function.

case join_kind::LEFT_ANTI_JOIN:
return get_trivial_left_join_indices(left, stream, rmm::mr::get_current_device_resource())
.first;
case join_kind::LEFT_ANTI_JOIN: return get_trivial_left_join_indices(left, stream, mr).first;
Copy link
Contributor

@ttnghia ttnghia Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humnn, this seems wrong to me, since we don't return rvalue thus *.first seems to be returned by copied. However, *.first here actually is a unique_ptr thus it can't be copied. I'm surprised that this code can compile. The compiler automatically moves *.first for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I follow. I copied this from here:

return get_trivial_left_join_indices(
left_conditional, stream, rmm::mr::get_current_device_resource())
.first;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the C++ syntax looks like it is wrong but it isn't (because it can compile) so I'm confused...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return may be ellided https://en.cppreference.com/w/cpp/language/copy_elision -- return value optimization (RVO) by the compiler. If the compiler cannot do this it will report an error since you cannot copy a unique-ptr. So no error means RVO was achieved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the compiler automatically optimizes the case of unique_ptr for us. I did some experiments here: https://godbolt.org/z/zs3TsW1do

So if we don't use unique_ptr, the returned object is copied. When put in unique_ptr wrapper, there is no copy at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came up in another PR recently. As David mentioned, the RVO is technically optional for the compiler, but under the circumstances the only other option is a compiler error because the type isn't copyable, so it's not really a problem to rely on this behavior because we'll immediately know if it doesn't do what we want if the code doesn't compile.

Copy link
Contributor

@ttnghia ttnghia Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is good in general. However, correct me if I'm wrong, the situations under which RVO is performed are not fully defined by C++ standards (mandatory for temporaries but only optional for named objects). In other words, it can compile this time but may fail to compile in the future version of GCC if that version changes its RVO behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that it is optional. The worst case outcome is an easy-to-diagnose failure to compile rather than a successful compilation with incorrect (or undefined) behavior, so I don't think any action is strictly necessary. I'd certainly be open to being more explicit, but as David pointed out in #15645 (comment) we use this pattern in many places in libcudf (as evidenced by the fact that Bradley copy-pasted it), so if we were to make any changes we would want to do so systematically.

auto result = this->join(left, right, predicate);
auto lhs_result = cudf::detail::make_std_vector_sync(*result.first, cudf::get_default_stream());
auto rhs_result =
cudf::detail::make_std_vector_sync(*result.second, cudf::get_default_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up my gross element-wise test!

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Thanks for the prompt fix! 🔥

@bdice
Copy link
Contributor Author

bdice commented Jun 26, 2024

Thanks for the reviews, all!

@bdice
Copy link
Contributor Author

bdice commented Jun 26, 2024

/merge

@rapids-bot rapids-bot bot merged commit 65b64f6 into rapidsai:branch-24.08 Jun 26, 2024
74 checks passed
bdice added a commit to bdice/cudf that referenced this pull request Jun 26, 2024
Closes rapidsai#16066.

I found a bug that would cause the reported segfault and have fixed it in this PR. When the right table has zero rows, conditional left anti-joins were returning a vector of indices containing garbage data.

Along the way, I refactored several parts of the conditional join tests and added coverage for more cases.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#16094
raydouglass pushed a commit that referenced this pull request Jul 1, 2024
Backports #16094 to 24.06 for inclusion in a hotfix release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Segmentation fault in cudf::conditional_inner_join with Zero-Row Input
5 participants