Add data-size member to libcudf column_view classes #14031
@@ -159,6 +159,13 @@ class column {
   */
  [[nodiscard]] size_type size() const noexcept { return _size; }

  /**
   * @brief Returns the number of data bytes
This should be clearer. Is it the number of bytes of a single element? `size() * sizeof(T)`?
It would be `size() * sizeof(T)`, but only where `T` is a fixed-width type.
The intention here is to expose the actual device-buffer `size()`, which may be larger than a `size_type` can represent in the solutions we are looking at to allow the total number of bytes in a strings column to exceed `size_type`.
   *
   * @return The number of bytes
   */
  [[nodiscard]] std::size_t size_bytes() const noexcept;
I think a fundamental invariant is that `column.size() * size_of(column.type()) == column.size_bytes()`. However, given that `column::size()` returns `int`, that invariant can be violated, which seems bad.
Are we planning to make `column::size()` also return `size_t`?
We are not changing `size()` but rather only adding `size_bytes()`.
I think the invariant argument may break down with compound types. We are only trying to solve for strings columns at this point.
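To make the arithmetic in this thread concrete, here is a minimal standalone sketch. It does not use libcudf; `size_type` is a local alias mirroring `cudf::size_type` (a 32-bit signed integer), and `fixed_width_bytes` and `exceeds_size_type` are hypothetical helpers introduced only for illustration. It assumes a platform where `std::size_t` is 64 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Local stand-in for cudf::size_type (a 32-bit signed integer in libcudf).
using size_type = std::int32_t;

// Hypothetical helper: byte count of a fixed-width column holding `rows`
// elements of type T. The multiplication is done in std::size_t so the
// result is not truncated to 32 bits.
template <typename T>
constexpr std::size_t fixed_width_bytes(size_type rows)
{
  return static_cast<std::size_t>(rows) * sizeof(T);
}

// Hypothetical helper: true when a byte count no longer fits in size_type.
constexpr bool exceeds_size_type(std::size_t bytes)
{
  return bytes > static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}
```

For fixed-width types the byte count follows from the row count, but it can still exceed what `size_type` holds. For a strings column the character bytes are not a function of the row count at all, which is why a separate `std::size_t` accessor is being discussed.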
Description
Adds a new data-size member to the `cudf::column_view`, `cudf::mutable_column_view`, and `cudf::column_device_view` classes. This is working towards solving #13733, where the character column data may exceed a `size_type`. The size of the device buffer is usually known, since the `data` is stored in an `rmm::device_buffer` in the `cudf::column` class. This makes the size a requirement for building any of the various `column_view`s, though most are created through utilities. The size will only initially be needed when manipulating strings per #13733.

The `size_bytes()` name was chosen for consistency with `string_view` and `device_span`.

Depends on #14030