-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix segfault in conditional join #16094
Merged
rapids-bot
merged 4 commits into
rapidsai:branch-24.08
from
bdice:fix/inner-join-zero-rows
Jun 26, 2024
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,8 +48,7 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_join_anti_semi( | |
{ | ||
if (right.num_rows() == 0) { | ||
switch (join_type) { | ||
case join_kind::LEFT_ANTI_JOIN: | ||
return std::make_unique<rmm::device_uvector<size_type>>(left.num_rows(), stream, mr); | ||
case join_kind::LEFT_ANTI_JOIN: return get_trivial_left_join_indices(left, stream, mr).first; | ||
case join_kind::LEFT_SEMI_JOIN: | ||
return std::make_unique<rmm::device_uvector<size_type>>(0, stream, mr); | ||
default: CUDF_FAIL("Invalid join kind."); break; | ||
|
@@ -96,10 +95,6 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_join_anti_semi( | |
join_size = size.value(stream); | ||
} | ||
|
||
if (left.num_rows() == 0) { | ||
return std::make_unique<rmm::device_uvector<size_type>>(0, stream, mr); | ||
} | ||
Comment on lines
-99
to
-101
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This code path is unreachable. The case of zero rows is handled earlier in the function. |
||
|
||
rmm::device_scalar<size_type> write_index(0, stream); | ||
|
||
auto left_indices = std::make_unique<rmm::device_uvector<size_type>>(join_size, stream, mr); | ||
|
@@ -149,8 +144,7 @@ conditional_join(table_view const& left, | |
// with a corresponding NULL from the right. | ||
case join_kind::LEFT_JOIN: | ||
case join_kind::LEFT_ANTI_JOIN: | ||
case join_kind::FULL_JOIN: | ||
return get_trivial_left_join_indices(left, stream, rmm::mr::get_current_device_resource()); | ||
case join_kind::FULL_JOIN: return get_trivial_left_join_indices(left, stream, mr); | ||
// Inner and left semi joins return empty output because no matches can exist. | ||
case join_kind::INNER_JOIN: | ||
case join_kind::LEFT_SEMI_JOIN: | ||
|
@@ -169,8 +163,7 @@ conditional_join(table_view const& left, | |
std::make_unique<rmm::device_uvector<size_type>>(0, stream, mr)); | ||
// Full joins need to return the trivial complement. | ||
case join_kind::FULL_JOIN: { | ||
auto ret_flipped = | ||
get_trivial_left_join_indices(right, stream, rmm::mr::get_current_device_resource()); | ||
auto ret_flipped = get_trivial_left_join_indices(right, stream, mr); | ||
return std::pair(std::move(ret_flipped.second), std::move(ret_flipped.first)); | ||
} | ||
default: CUDF_FAIL("Invalid join kind."); break; | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
|
||
#include <cudf/ast/expressions.hpp> | ||
#include <cudf/column/column_view.hpp> | ||
#include <cudf/detail/utilities/vector_factories.hpp> | ||
#include <cudf/join.hpp> | ||
#include <cudf/table/table_view.hpp> | ||
#include <cudf/utilities/default_stream.hpp> | ||
|
@@ -222,21 +223,25 @@ struct ConditionalJoinPairReturnTest : public ConditionalJoinTest<T> { | |
std::vector<std::pair<cudf::size_type, cudf::size_type>> expected_outputs) | ||
{ | ||
auto result_size = this->join_size(left, right, predicate); | ||
EXPECT_TRUE(result_size == expected_outputs.size()); | ||
|
||
auto result = this->join(left, right, predicate); | ||
std::vector<std::pair<cudf::size_type, cudf::size_type>> result_pairs; | ||
for (size_t i = 0; i < result.first->size(); ++i) { | ||
// Note: Not trying to be terribly efficient here since these tests are | ||
// small, otherwise a batch copy to host before constructing the tuples | ||
// would be important. | ||
result_pairs.push_back({result.first->element(i, cudf::get_default_stream()), | ||
result.second->element(i, cudf::get_default_stream())}); | ||
} | ||
EXPECT_EQ(result_size, expected_outputs.size()); | ||
|
||
auto result = this->join(left, right, predicate); | ||
auto lhs_result = cudf::detail::make_std_vector_sync(*result.first, cudf::get_default_stream()); | ||
auto rhs_result = | ||
cudf::detail::make_std_vector_sync(*result.second, cudf::get_default_stream()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for cleaning up my gross element-wise test! |
||
std::vector<std::pair<cudf::size_type, cudf::size_type>> result_pairs(lhs_result.size()); | ||
std::transform(lhs_result.begin(), | ||
lhs_result.end(), | ||
rhs_result.begin(), | ||
result_pairs.begin(), | ||
[](cudf::size_type lhs, cudf::size_type rhs) { | ||
return std::pair{lhs, rhs}; | ||
}); | ||
std::sort(result_pairs.begin(), result_pairs.end()); | ||
std::sort(expected_outputs.begin(), expected_outputs.end()); | ||
|
||
EXPECT_TRUE(std::equal(expected_outputs.begin(), expected_outputs.end(), result_pairs.begin())); | ||
EXPECT_TRUE(std::equal( | ||
expected_outputs.begin(), expected_outputs.end(), result_pairs.begin(), result_pairs.end())); | ||
} | ||
|
||
/* | ||
|
@@ -411,6 +416,11 @@ TYPED_TEST(ConditionalInnerJoinTest, TestOneColumnLeftEmpty) | |
this->test({{}}, {{3, 4, 5}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalInnerJoinTest, TestOneColumnRightEmpty) | ||
{ | ||
this->test({{3, 4, 5}}, {{}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalInnerJoinTest, TestOneColumnTwoRowAllEqual) | ||
{ | ||
this->test({{0, 1}}, {{0, 0}}, left_zero_eq_right_zero, {{0, 0}, {0, 1}}); | ||
|
@@ -600,6 +610,14 @@ TYPED_TEST(ConditionalLeftJoinTest, TestOneColumnLeftEmpty) | |
this->test({{}}, {{3, 4, 5}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftJoinTest, TestOneColumnRightEmpty) | ||
{ | ||
this->test({{3, 4, 5}}, | ||
{{}}, | ||
left_zero_eq_right_zero, | ||
{{0, JoinNoneValue}, {1, JoinNoneValue}, {2, JoinNoneValue}}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftJoinTest, TestCompareRandomToHash) | ||
{ | ||
auto [left, right] = gen_random_repeated_columns<TypeParam>(); | ||
|
@@ -666,6 +684,14 @@ TYPED_TEST(ConditionalFullJoinTest, TestOneColumnLeftEmpty) | |
{{JoinNoneValue, 0}, {JoinNoneValue, 1}, {JoinNoneValue, 2}}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalFullJoinTest, TestOneColumnRightEmpty) | ||
{ | ||
this->test({{3, 4, 5}}, | ||
{{}}, | ||
left_zero_eq_right_zero, | ||
{{0, JoinNoneValue}, {1, JoinNoneValue}, {2, JoinNoneValue}}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalFullJoinTest, TestTwoColumnThreeRowSomeEqual) | ||
{ | ||
this->test({{0, 1, 2}, {10, 20, 30}}, | ||
|
@@ -705,20 +731,16 @@ struct ConditionalJoinSingleReturnTest : public ConditionalJoinTest<T> { | |
auto [left_wrappers, right_wrappers, left_columns, right_columns, left, right] = | ||
this->parse_input(left_data, right_data); | ||
auto result_size = this->join_size(left, right, predicate); | ||
EXPECT_TRUE(result_size == expected_outputs.size()); | ||
|
||
auto result = this->join(left, right, predicate); | ||
std::vector<cudf::size_type> resulting_indices; | ||
for (size_t i = 0; i < result->size(); ++i) { | ||
// Note: Not trying to be terribly efficient here since these tests are | ||
// small, otherwise a batch copy to host before constructing the tuples | ||
// would be important. | ||
resulting_indices.push_back(result->element(i, cudf::get_default_stream())); | ||
} | ||
std::sort(resulting_indices.begin(), resulting_indices.end()); | ||
EXPECT_EQ(result_size, expected_outputs.size()); | ||
|
||
auto result = this->join(left, right, predicate); | ||
auto result_indices = cudf::detail::make_std_vector_sync(*result, cudf::get_default_stream()); | ||
std::sort(result_indices.begin(), result_indices.end()); | ||
std::sort(expected_outputs.begin(), expected_outputs.end()); | ||
EXPECT_TRUE( | ||
std::equal(resulting_indices.begin(), resulting_indices.end(), expected_outputs.begin())); | ||
EXPECT_TRUE(std::equal(result_indices.begin(), | ||
result_indices.end(), | ||
expected_outputs.begin(), | ||
expected_outputs.end())); | ||
} | ||
|
||
void _compare_to_hash_join(std::unique_ptr<rmm::device_uvector<cudf::size_type>> const& result, | ||
|
@@ -826,6 +848,16 @@ struct ConditionalLeftSemiJoinTest : public ConditionalJoinSingleReturnTest<T> { | |
|
||
TYPED_TEST_SUITE(ConditionalLeftSemiJoinTest, cudf::test::IntegralTypesNotBool); | ||
|
||
TYPED_TEST(ConditionalLeftSemiJoinTest, TestOneColumnLeftEmpty) | ||
{ | ||
this->test({{}}, {{3, 4, 5}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftSemiJoinTest, TestOneColumnRightEmpty) | ||
{ | ||
this->test({{3, 4, 5}}, {{}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftSemiJoinTest, TestTwoColumnThreeRowSomeEqual) | ||
{ | ||
this->test({{0, 1, 2}, {10, 20, 30}}, {{0, 1, 3}, {30, 40, 50}}, left_zero_eq_right_zero, {0, 1}); | ||
|
@@ -873,6 +905,16 @@ struct ConditionalLeftAntiJoinTest : public ConditionalJoinSingleReturnTest<T> { | |
|
||
TYPED_TEST_SUITE(ConditionalLeftAntiJoinTest, cudf::test::IntegralTypesNotBool); | ||
|
||
TYPED_TEST(ConditionalLeftAntiJoinTest, TestOneColumnLeftEmpty) | ||
{ | ||
this->test({{}}, {{3, 4, 5}}, left_zero_eq_right_zero, {}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftAntiJoinTest, TestOneColumnRightEmpty) | ||
{ | ||
this->test({{3, 4, 5}}, {{}}, left_zero_eq_right_zero, {0, 1, 2}); | ||
}; | ||
|
||
TYPED_TEST(ConditionalLeftAntiJoinTest, TestTwoColumnThreeRowSomeEqual) | ||
{ | ||
this->test({{0, 1, 2}, {10, 20, 30}}, {{0, 1, 3}, {30, 40, 50}}, left_zero_eq_right_zero, {2}); | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humnn, this seems wrong to me, since we don't return
rvalue
thus*.first
seems to be returned by copied. However,*.first
here actually is aunique_ptr
thus it can't be copied. I'm surprised that this code can compile. The compiler automatically moves*.first
for us?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I follow. I copied this from here:
cudf/cpp/src/join/mixed_join_semi.cu
Lines 120 to 122 in cdfb550
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the C++ syntax looks like it is wrong but it isn't (because it can compile) so I'm confused...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return may be ellided https://en.cppreference.com/w/cpp/language/copy_elision -- return value optimization (RVO) by the compiler. If the compiler cannot do this it will report an error since you cannot copy a unique-ptr. So no error means RVO was achieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the compiler automatically optimizes the case of
unique_ptr
for us. I did some experiments here: https://godbolt.org/z/zs3TsW1doSo if we don't use
unique_ptr
, the returned object is copied. When put inunique_ptr
wrapper, there is no copy at all.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came up in another PR recently. As David mentioned, the RVO is technically optional for the compiler, but under the circumstances the only other option is a compiler error because the type isn't copyable, so it's not really a problem to rely on this behavior because we'll immediately know if it doesn't do what we want if the code doesn't compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is good in general. However, correct me if I'm wrong, the situations under which RVO is performed are not fully defined by C++ standards (mandatory for temporaries but only optional for named objects). In other words, it can compile this time but may fail to compile in the future version of GCC if that version changes its RVO behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct that it is optional. The worst case outcome is an easy-to-diagnose failure to compile rather than a successful compilation with incorrect (or undefined) behavior, so I don't think any action is strictly necessary. I'd certainly be open to being more explicit, but as David pointed out in #15645 (comment) we use this pattern in many places in libcudf (as evidenced by the fact that Bradley copy-pasted it), so if we were to make any changes we would want to do so systematically.