-
Notifications
You must be signed in to change notification settings - Fork 223
Safe conversion between arrow-rs and arrow2 Array
s
#1446
Safe conversion between arrow-rs and arrow2 Array
s
#1446
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1446 +/- ##
==========================================
+ Coverage 83.45% 83.90% +0.45%
==========================================
Files 376 387 +11
Lines 41161 41554 +393
==========================================
+ Hits 34350 34867 +517
+ Misses 6811 6687 -124
... and 7 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
Array
s and arrow-rs ArrayData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this PR carefully. Thank you very much @tustvold
I will admit to not being familiar with the internals of arrow2, but given the fact that this PR has significant test coverage I think it looks great to me. 👍
If it is the case that BooleanArray
is not covered in tests/it/arrow.rs
I suggest we add some coverage to complete the 🌈 .
I defer to @jorgecarleitao / @ritchie46 / @sundy-li about the right next steps for this PR (e.g. merge / release / whatever).
fn from_data(data: &ArrayData) -> Self { | ||
let data_type = data.data_type().clone().into(); | ||
|
||
let mut values: Buffer<T> = data.buffers()[0].clone().into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is zero copy, right? The slice
call below is because arrow2
Array has the offset/len in the Buffer<T>
itself rather than as separate fields ?
@@ -0,0 +1,350 @@ | |||
use arrow2::array::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😮 you even wrote the round trip tests. Impressive
The only thing I didn't see in here was a test for BooleanArray
let to_arrow = ArrayRef::from(array); | ||
test_arrow2_roundtrip(to_arrow.as_ref()); | ||
|
||
if !array.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
#[test] | ||
fn test_primitive() { | ||
let data_type = DataType::Int32; | ||
let array = PrimitiveArray::new(data_type, vec![1, 2, 3].into(), None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way I can think of to make these tests better is to manually construct the arrow-rs array as well (otherwise there is just one original source of truth). However I think these tests are very impressive overall and don't actually need anything more
.into_iter() | ||
.map(Some), | ||
); | ||
let values = PrimitiveArray::<i32>::from_iter([Some(1), None, Some(3), Some(1), None]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it intended that key1 = Some(1)
is repeated? Would it be adding more coverage if the second value was something different (like key1 = Some(10)
)?
https://github.com/jorgecarleitao/arrow2/actions/runs/4659165653/jobs/8245723577?pr=1446 is failing on master and I think unrelated to this PR |
Array
s and arrow-rs ArrayData
Array
s
Co-authored-by: Andrew Lamb <[email protected]>
Thank you @tustvold for this beautiful PR 🙇 |
@tustvold is arrow-rs |
Yes, I can prepare a PR for this. It has some non-trivial breaking changes to schema representation |
Relates to #1429
This provides safe conversion between arrow2 arrays and arrow-rs ArrayData