Mark column chunks in a PQ reader pass as large strings when the cumulative offsets exceed the large strings threshold.
#17207
int constexpr multiplier = 12;
std::vector<cudf::column_view> input_cols(multiplier, input->view());
auto col0 = cudf::concatenate(input_cols);  ///< 2.70GB
Same column as in the GTest CaseTest.ToLower.
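As a rough check of why the test concatenates 12 copies: if the 2.70GB figure is the total string data and 1 GB is taken as 10^9 bytes, each copy carries about 225 MB. That fits under a 32-bit offset limit on its own, while the concatenated column does not. This sketch assumes the threshold equals INT32_MAX (the threshold cudf actually uses may be configured differently), and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption: the large-strings threshold is the int32 offset limit.
constexpr std::int64_t int32_offset_limit = std::numeric_limits<std::int32_t>::max();

// Total string bytes after concatenating `copies` copies of a column,
// computed in 64-bit so the product itself cannot overflow.
constexpr std::int64_t concatenated_bytes(std::int64_t bytes_per_copy, std::int64_t copies) {
  return bytes_per_copy * copies;
}

// True when the string data no longer fits under the threshold.
constexpr bool needs_large_strings(std::int64_t total_bytes,
                                   std::int64_t threshold = int32_offset_limit) {
  return total_bytes > threshold;
}

// Hypothetical per-copy size derived from the comment: 2.70 GB / 12.
constexpr std::int64_t bytes_per_copy = 225'000'000;
static_assert(!needs_large_strings(bytes_per_copy), "one copy fits in int32 offsets");
static_assert(needs_large_strings(concatenated_bytes(bytes_per_copy, 12)), "12 copies do not");
```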
Co-authored-by: David Wendt
Small comments; the core of the change looks good.
Co-authored-by: Vukasin Milovanovic
Thank you for iterating on this!
/merge
Description
This PR implements a method to correctly set the large-string property for column chunks in a Chunked Parquet Reader subpass when the cumulative string offsets exceed the large strings threshold.
Fixes #17158
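A minimal sketch of the marking rule described above, written against a hypothetical chunk_info struct rather than cudf's internal types: sum the string bytes of one column's chunks across the subpass and, if the total exceeds the threshold, flag every chunk of that column. Treating the threshold as INT32_MAX and the comparison as strict are assumptions, not details taken from the PR.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-chunk summary for one column; not a cudf type.
struct chunk_info {
  std::int64_t string_bytes;     // string data this chunk contributes to the column
  bool is_large_string = false;  // set when the column needs 64-bit offsets
};

// Assumed default threshold: the largest offset representable in int32.
constexpr std::int64_t default_threshold = std::numeric_limits<std::int32_t>::max();

// Accumulates string bytes across all chunks of one column in a subpass and,
// if the cumulative total exceeds the threshold, marks every chunk as large.
// Returns whether the column was marked.
bool mark_large_strings(std::vector<chunk_info>& chunks,
                        std::int64_t threshold = default_threshold) {
  std::int64_t cumulative = 0;
  for (auto const& c : chunks) { cumulative += c.string_bytes; }
  bool const large = cumulative > threshold;
  if (large) {
    for (auto& c : chunks) { c.is_large_string = true; }
  }
  return large;
}
```

Marking all chunks, rather than only those past the crossing point, reflects that a single output column can use only one offset width.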
Checklist