Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data-size member to libcudf column_view classes #14031

Closed

Conversation

davidwendt
Copy link
Contributor

Description

Adds new data-size member to the cudf::column_view, cudf::mutable_column_view, and cudf::column_device_view classes. This is working towards solving #13733 where the character column data may exceed a size_type. The size of the device buffer usually known since the data is stored in a rmm::device_buffer in the cudf::column class. This makes the size a requirement for building any of the various column_views though most are created through utilities. The size will only initially be needed when manipulating strings per #13733.

The size_bytes() name was chosen for consistency with string_view and device_span.

Depends on #14030

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 1, 2023
@davidwendt davidwendt self-assigned this Sep 1, 2023
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 1, 2023
@@ -159,6 +159,13 @@ class column {
*/
[[nodiscard]] size_type size() const noexcept { return _size; }

/**
* @brief Returns the number of data bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be clearer. Is it the number of bytes of a single element? size() * sizeof(T)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be size() * sizeof(T) but only where T is a fixed-width-type.
The intention here is to expose the actual device-buffer size() which may be larger than sizeof(size_type) in the solutions we are looking at ways to allow the total number of bytes in a strings column to exceed size_type.

*
* @return The number of bytes
*/
[[nodiscard]] std::size_t size_bytes() const noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a fundamental invariant is that column.size() * size_of(column.type() == column.size_bytes(). However, given that column::size() returns int, then that invariant can be violated which seems bad.

Are we planning to make column::size() also return size_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not changing size() but rather only adding size_bytes().

I think the invariant argument may break down with compounds types. We are only trying to solve for strings columns at this point.

@davidwendt davidwendt closed this Sep 13, 2023
@davidwendt davidwendt deleted the column-size-bytes branch September 18, 2023 12:56
@karthikeyann karthikeyann self-requested a review September 24, 2023 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants