Add from_arrow_device function to cudf interop using nanoarrow #15458

Merged on Apr 23, 2024 (16 commits)
Changes from 4 commits
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -361,6 +361,7 @@ add_library(
src/interop/from_arrow.cu
src/interop/to_arrow.cu
src/interop/to_arrow_device.cu
src/interop/from_arrow_device.cu
src/interop/detail/arrow_allocator.cpp
src/io/avro/avro.cpp
src/io/avro/avro_gpu.cu
63 changes: 63 additions & 0 deletions cpp/include/cudf/interop.hpp
@@ -284,5 +284,68 @@ std::unique_ptr<cudf::scalar> from_arrow(
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief typedef for a vector of owning columns, used for conversion from ArrowDeviceArray
*
*/
using owned_columns_t = std::vector<std::unique_ptr<cudf::column>>;

/**
* @brief functor for a custom deleter to a unique_ptr of table_view
*
* When converting from an ArrowDeviceArray, there are cases where data can't
* be zero-copy (i.e. bools or non-UINT32 dictionary indices). This custom deleter
* is used to maintain ownership over the data allocated since a `cudf::table_view`
* doesn't hold ownership.
*/
struct custom_view_deleter {
  explicit custom_view_deleter(owned_columns_t&& owned) : owned_mem_{std::move(owned)} {}
  void operator()(table_view* ptr) const { delete ptr; }
  owned_columns_t owned_mem_;
};

/**
* @brief typedef for a unique_ptr to a `cudf::table_view` with custom deleter
*
*/
using unique_table_view_t = std::unique_ptr<cudf::table_view, custom_view_deleter>;

/**
* @brief Create `cudf::table_view` from given `ArrowDeviceArray` and `ArrowSchema`
*
* Constructs a non-owning `cudf::table_view` using `ArrowDeviceArray` and `ArrowSchema`,
* throwing an exception if the `device_type` of the `ArrowDeviceArray` is not ARROW_DEVICE_CUDA,
* ARROW_DEVICE_CUDA_HOST or ARROW_DEVICE_CUDA_MANAGED, i.e. it must be accessible to CUDA.
* Because the resulting `cudf::table_view` will not own the data, the `ArrowDeviceArray`
* must be kept alive for the lifetime of the result. It is the responsibility of callers
* to ensure they call the release callback on the `ArrowDeviceArray` after it is no longer
* needed, and that the `cudf::table_view` is not accessed after this happens.
*
* If the type of the `ArrowSchema` / `ArrowDeviceArray` is a struct, then each of the
* children will be the columns of the resulting table_view. For all other types, a
* `cudf::table_view` will be returned with a single column representing the input.
*
* @note The custom deleter used for the unique_ptr to the table_view maintains ownership
* over any memory which is allocated, such as when converting boolean columns from the
* bitmap used by Arrow to the 1-byte-per-value layout used by cudf, or when casting
* dictionary indices that aren't already uint32 (which libcudf uses).
*
* @note If the input `ArrowDeviceArray` contained a non-null sync_event it is assumed
* to be a `cudaEvent_t*` and the passed in stream will have `cudaStreamWaitEvent` called
* on it with the event. This function, however, will not explicitly synchronize on the
* stream.
*
* @param schema `ArrowSchema` pointer to object describing the type of the device array
* @param input `ArrowDeviceArray` pointer to object owning the Arrow data
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to perform any allocations
* @return `unique_table_view_t` holding the `cudf::table_view` generated from given Arrow data
*/
unique_table_view_t from_arrow_device(
  const ArrowSchema* schema,
  const ArrowDeviceArray* input,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace cudf
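
For readers following the ownership discussion in the doc comments above, here is a minimal host-only sketch of the same pattern: a `std::unique_ptr` to a non-owning view whose deleter keeps converted buffers alive. It is not code from this PR. `buffer`, `view`, `owning_view_deleter`, `unique_view_t` and `make_bool_view` are hypothetical stand-ins for `cudf::column`, `cudf::table_view`, `custom_view_deleter`, `unique_table_view_t` and the bool conversion path, and the conversion runs on the CPU rather than in a CUDA kernel. It assumes the Arrow convention that boolean values are bit-packed least-significant-bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::column: owns its bytes.
struct buffer {
  std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for cudf::table_view: points at data it does not own.
struct view {
  const std::uint8_t* data;
  std::size_t size;
};

using owned_buffers_t = std::vector<std::unique_ptr<buffer>>;

// Deleter that frees the view and, when it is itself destroyed, releases
// every buffer that had to be allocated during conversion.
struct owning_view_deleter {
  explicit owning_view_deleter(owned_buffers_t&& owned) : owned_mem_{std::move(owned)} {}
  void operator()(view* ptr) const { delete ptr; }
  owned_buffers_t owned_mem_;
};

using unique_view_t = std::unique_ptr<view, owning_view_deleter>;

// Expands a bit-packed boolean bitmap (LSB first) into one byte per value.
// The expanded storage cannot be zero-copy, so it is handed to the deleter.
unique_view_t make_bool_view(const std::uint8_t* bitmap, std::size_t num_values)
{
  assert(bitmap != nullptr || num_values == 0);

  auto expanded = std::make_unique<buffer>();
  expanded->bytes.resize(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    expanded->bytes[i] = static_cast<std::uint8_t>((bitmap[i / 8] >> (i % 8)) & 1u);
  }

  // The heap-allocated buffer does not move when its unique_ptr is moved,
  // so the pointer stored in the view stays valid.
  auto* v = new view{expanded->bytes.data(), expanded->bytes.size()};

  owned_buffers_t owned;
  owned.push_back(std::move(expanded));
  return unique_view_t{v, owning_view_deleter{std::move(owned)}};
}
```

Because the deleter is stored inside the `unique_ptr`, moving the returned handle moves the owned buffers with it, and the view cannot outlive the converted data. Zero-copy inputs would simply leave the owned vector empty.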