Add from_arrow_host functions for cudf interop with nanoarrow #15645
@vyasr @davidwendt I've got this up! There's still another test or two I need to add, but I figured it would be good to get this PR filed before the weekend and get some eyes on it.
/ok to test
Perhaps rename from_arrow_device_host_test.cpp to from_arrow_host_test.cpp?
#include <cudf/types.hpp>

#include <thrust/iterator/counting_iterator.h>
This whole file could use a lot more comments explaining what exactly is happening and getting tested; there are very few comments here.
I've added some more comments to this. If there are specific areas where it's unclear what is going on, let me know and I'll add more comments there. I had figured the test names and code were sufficient, but that might be because I'm already familiar with a lot of the nanoarrow code patterns.
cpp/src/interop/from_arrow_host.cu
ArrowSchemaView view;
NANOARROW_THROW_NOT_OK(ArrowSchemaViewInit(&view, schema, nullptr));
Should move this and the CUDF_EXPECTS checks to the detail function. Same for the cudf::from_arrow_host_column API.
Sure. Any particular reason?
This is our normal pattern. https://github.com/rapidsai/cudf/blob/branch-24.06/cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md#libcudf-api-and-implementation
The public API does the nvtx range and calls a detail function with a similar signature with no default parameters.
The detail functions may also be called by other internal functions, and we would want those paths to also check parameters, etc.
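The public-API/detail split described above can be sketched in plain C++. Everything in this sketch is hypothetical: schema_info, expects (a stand-in for CUDF_EXPECTS) and the int stream parameter are illustrative names, not libcudf's actual signatures. A real libcudf public function would also open an NVTX range (libcudf uses CUDF_FUNC_RANGE() for this) before forwarding to detail.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an input descriptor (NOT nanoarrow's ArrowSchema).
struct schema_info {
  std::string format;  // e.g. "i" for int32 in the Arrow C data interface
  std::int64_t length;
};

// Hypothetical stand-in for CUDF_EXPECTS: throws the chosen exception type on failure.
template <typename Exception = std::logic_error>
void expects(bool cond, char const* reason)
{
  if (!cond) { throw Exception(reason); }
}

namespace detail {
// All parameter validation lives here, so internal callers that bypass the
// public API are checked too. No default parameters at this layer.
std::int64_t from_host(schema_info const& schema, int stream_id)
{
  expects<std::invalid_argument>(!schema.format.empty(), "schema format must not be empty");
  expects<std::invalid_argument>(schema.length >= 0, "length must be non-negative");
  (void)stream_id;       // a real implementation would enqueue its copies on this stream
  return schema.length;  // placeholder result
}
}  // namespace detail

// Public API: fills in defaults and forwards to detail.
std::int64_t from_host(schema_info const& schema, int stream_id = 0)
{
  // libcudf would call CUDF_FUNC_RANGE() here to open the NVTX range.
  return detail::from_host(schema, stream_id);
}
```

Because the checks sit in detail, an internal caller that invokes detail::from_host directly gets the same validation as a user calling the public function.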
Co-authored-by: David Wendt <[email protected]>
/ok to test
Style again :(
Looks like you will need to update these to expect the std::invalid_argument exception instead of cudf::logic_error.
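Why the exception type matters for these tests: assuming cudf::logic_error derives from std::logic_error, as std::invalid_argument does, the two are siblings rather than parent and child, so a handler (or test expectation) for one never matches the other. The sketch below uses a hypothetical library_logic_error in place of cudf::logic_error and a hypothetical validate_length function.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudf::logic_error. Like std::invalid_argument it
// derives from std::logic_error, so the two types are siblings.
struct library_logic_error : std::logic_error {
  using std::logic_error::logic_error;
};

// Hypothetical validation that now reports bad input as std::invalid_argument.
void validate_length(long length)
{
  if (length < 0) { throw std::invalid_argument("length must be non-negative"); }
}

// Returns a label naming the handler that caught the exception, if any.
std::string which_handler(long length)
{
  try {
    validate_length(length);
    return "none";
  } catch (library_logic_error const&) {
    // Never reached here: std::invalid_argument is not a library_logic_error.
    return "library_logic_error";
  } catch (std::invalid_argument const&) {
    return "invalid_argument";
  }
}
```

In a GoogleTest suite the corresponding change is replacing EXPECT_THROW(call, cudf::logic_error) with EXPECT_THROW(call, std::invalid_argument).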
/ok to test
@zeroshade One thing I meant to mention earlier but completely forgot to ask about: would it make sense (and be simpler) to implement these methods by copying the Arrow array to an ArrowDeviceArray with device_type ARROW_DEVICE_CUDA and then calling the from_arrow_device functions? No need to walk it back at this point since we've come this far, but I wanted to at least have the conversation.
Co-authored-by: Vyas Ramasubramani <[email protected]>
@vyasr Just to make sure I understand the suggestion, you're saying recursively spin through the […] That's definitely an idea we could go with, and it would greatly simplify the code. I'd have to look through it, though, because I think there might be scenarios where that would end up performing an extra copy of the data; I don't remember offhand whether we already pre-emptively copy to the device before every transformation. You might be right, though, and it might be a great way to simplify this. I'll definitely look into it after we get this merged (since you said we don't need to walk this back at this point).
/ok to test
Yup, that's exactly what I'm suggesting. It would make maintenance easier, and it would also make testing easier, since we'd effectively have only one real conversion path to test (and it facilitates testing from higher-level bindings in Java and Python, where we can easily generate Arrow host data but not device data right now). But yeah, we don't need to wait on that to merge this PR.
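The delegation idea discussed in the two comments above can be modeled without a GPU. This is a sketch under loud assumptions: host_array and device_array are hypothetical, simplified mirrors of an Arrow array tree (not the real ArrowArray/ArrowDeviceArray structs), std::vector copies stand in for host-to-device memcpy, and count_nodes stands in for the from_arrow_device conversion. It shows the recursive walk that copies every buffer of every node exactly once before handing the tree to the single device-side path; the bytes_copied counter makes the cost of that extra staging step visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified Arrow array tree in host memory: each node owns its
// buffers (validity, offsets, data, ...) and its children (for nested types).
struct host_array {
  std::vector<std::vector<std::uint8_t>> buffers;
  std::vector<host_array> children;
};

// Hypothetical "device" copy with the same shape. A real version would hold GPU
// allocations and be exposed as an ArrowDeviceArray with device_type ARROW_DEVICE_CUDA.
struct device_array {
  std::vector<std::vector<std::uint8_t>> buffers;
  std::vector<device_array> children;
};

// Recursively copies every buffer of every node, accumulating the byte count.
device_array copy_to_device(host_array const& in, std::size_t& bytes_copied)
{
  device_array out;
  for (auto const& buf : in.buffers) {
    out.buffers.push_back(buf);  // stand-in for a host-to-device memcpy
    bytes_copied += buf.size();
  }
  for (auto const& child : in.children) {
    out.children.push_back(copy_to_device(child, bytes_copied));
  }
  return out;
}

// Stand-in for the single device-side conversion path: here it just counts nodes.
std::size_t count_nodes(device_array const& arr)
{
  std::size_t n = 1;
  for (auto const& c : arr.children) { n += count_nodes(c); }
  return n;
}

// Host entry point implemented purely by delegation.
std::size_t from_host_via_device(host_array const& in, std::size_t& bytes_copied)
{
  return count_nodes(copy_to_device(in, bytes_copied));
}
```

The trade-off raised in the thread shows up directly: the host path gains no conversion logic of its own, but every buffer is staged once before conversion, which may or may not duplicate copies the direct implementation would make anyway.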
I don't think the test failures are my fault here. Is there anything else needed on my end to get this merged?
No, nothing on your end. Unfortunately we're just having some CI outages. You're all good here.
/ok to test
/merge
Description
Following up on #15458 and continuing the work to address #14926, this adds a host-memory version of from_arrow_device, which performs the copies from host memory to create cudf objects.
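Two of the per-buffer steps a host-memory conversion has to handle can be sketched in isolation: counting nulls from an Arrow validity bitmap (a host ArrowArray may report null_count as -1, meaning unknown) and copying a fixed-width data slice that starts at the array's offset. This is an illustrative sketch, not the PR's implementation; the function names are hypothetical, and the std::vector destination stands in for a device allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Arrow validity bitmaps are LSB-ordered: element i is valid when bit (i % 8)
// of byte (i / 8) is set. A null bitmap pointer means "all valid".
bool is_valid(std::uint8_t const* validity, std::int64_t i)
{
  return validity == nullptr || ((validity[i >> 3] >> (i & 7)) & 1) != 0;
}

// Count nulls in the logical slice [offset, offset + length).
std::int64_t count_nulls(std::uint8_t const* validity, std::int64_t offset, std::int64_t length)
{
  if (validity == nullptr) { return 0; }
  std::int64_t nulls = 0;
  for (std::int64_t i = offset; i < offset + length; ++i) {
    if (!is_valid(validity, i)) { ++nulls; }
  }
  return nulls;
}

// Copy an int32 data slice out of host memory, honoring the array offset.
std::vector<std::int32_t> copy_int32_slice(std::int32_t const* data,
                                           std::int64_t offset,
                                           std::int64_t length)
{
  std::vector<std::int32_t> out(static_cast<std::size_t>(length));
  if (length > 0) {
    std::memcpy(out.data(), data + offset, static_cast<std::size_t>(length) * sizeof(std::int32_t));
  }
  return out;
}
```

For example, with the single bitmap byte 0b10110101, elements 1, 3 and 6 are null, so count_nulls over the first eight elements returns 3.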