This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Safe conversion between arrow-rs and arrow2 Arrays #1446

Merged
merged 16 commits into from
Apr 12, 2023

Conversation

tustvold
Contributor

@tustvold tustvold commented Mar 29, 2023

Relates to #1429

This provides safe conversion between arrow2 arrays and arrow-rs ArrayData.

@codecov

codecov bot commented Mar 29, 2023

Codecov Report

Patch coverage: 98.73% and project coverage change: +0.45 🎉

Comparison is base (73ed7c8) 83.45% compared to head (41bd956) 83.90%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1446      +/-   ##
==========================================
+ Coverage   83.45%   83.90%   +0.45%     
==========================================
  Files         376      387      +11     
  Lines       41161    41554     +393     
==========================================
+ Hits        34350    34867     +517     
+ Misses       6811     6687     -124     
Impacted Files Coverage Δ
src/array/binary/mod.rs 91.82% <ø> (ø)
src/array/boolean/mod.rs 79.41% <ø> (ø)
src/array/dictionary/mod.rs 95.32% <ø> (ø)
src/array/fixed_size_binary/mod.rs 84.24% <ø> (ø)
src/array/fixed_size_list/mod.rs 85.84% <ø> (ø)
src/array/list/mod.rs 91.74% <ø> (ø)
src/array/map/mod.rs 72.81% <ø> (+10.67%) ⬆️
src/array/primitive/mod.rs 80.55% <ø> (ø)
src/array/struct_/mod.rs 67.44% <ø> (ø)
src/array/union/mod.rs 89.71% <ø> (+2.33%) ⬆️
... and 15 more

... and 7 files with indirect coverage changes


Cargo.toml (outdated, resolved)
src/array/boolean/data.rs (outdated, resolved)
src/array/binary/data.rs (outdated, resolved)
@tustvold tustvold changed the title WIP: ArrayData conversion ArrayData conversion Apr 10, 2023
@alamb alamb changed the title ArrayData conversion Safe conversion between Arrays and arrow-rs ArrayData Apr 10, 2023
Collaborator

@alamb alamb left a comment

I went through this PR carefully. Thank you very much @tustvold

I will admit to not being familiar with the internals of arrow2, but given the fact that this PR has significant test coverage I think it looks great to me. 👍

If it is the case that BooleanArray is not covered in tests/it/arrow.rs I suggest we add some coverage to complete the 🌈 .

I defer to @jorgecarleitao / @ritchie46 / @sundy-li about the right next steps for this PR (e.g. merge / release / whatever).

src/array/mod.rs (outdated, resolved)
src/array/mod.rs (outdated, resolved)
fn from_data(data: &ArrayData) -> Self {
    let data_type = data.data_type().clone().into();

    let mut values: Buffer<T> = data.buffers()[0].clone().into();
Collaborator

This is zero copy, right? And the slice call below is needed because an arrow2 Array keeps the offset/len in the Buffer<T> itself rather than as separate fields?
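
For readers following the question, here is a minimal sketch of the distinction being asked about, using only the standard library. `SlicedBuffer`, `ExternalLayout`, and `from_external` are hypothetical stand-ins invented for illustration; they are not arrow2's `Buffer<T>` or arrow-rs's `ArrayData`, and the PR's real conversion may differ. The sketch assumes one layout stores offset/len next to a shared allocation while the other folds them into the buffer view, so converting means sharing the allocation and then slicing.

```rust
use std::sync::Arc;

/// Hypothetical buffer that carries its own offset and length
/// (a stand-in for illustration, not arrow2's actual `Buffer<T>`).
struct SlicedBuffer<T> {
    data: Arc<Vec<T>>, // shared allocation; cloning the Arc copies no elements
    offset: usize,
    len: usize,
}

impl<T> SlicedBuffer<T> {
    /// Narrow the view; only the offset/len bookkeeping changes.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    fn as_slice(&self) -> &[T] {
        &self.data[self.offset..self.offset + self.len]
    }

    fn shares_allocation_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}

/// Hypothetical layout that keeps offset/len as separate fields
/// beside the buffer (again, an assumption for illustration).
struct ExternalLayout<T> {
    buffer: Arc<Vec<T>>,
    offset: usize,
    len: usize,
}

/// Share the allocation, then slice to apply the external offset/len.
fn from_external<T>(data: &ExternalLayout<T>) -> SlicedBuffer<T> {
    let full = SlicedBuffer {
        data: Arc::clone(&data.buffer),
        offset: 0,
        len: data.buffer.len(),
    };
    full.slice(data.offset, data.len)
}

fn main() {
    let external = ExternalLayout {
        buffer: Arc::new(vec![10, 20, 30, 40, 50]),
        offset: 1,
        len: 3,
    };
    let converted = from_external(&external);

    // The view covers only the sliced window...
    assert_eq!(converted.as_slice(), &[20, 30, 40][..]);
    // ...while pointing at the same allocation as the source.
    assert!(Arc::ptr_eq(&converted.data, &external.buffer));

    let narrower = converted.slice(1, 1);
    assert!(narrower.shares_allocation_with(&converted));
}
```

In this model the conversion is zero copy because only the reference count and two integers change; whether the PR's `from_data` has the same property is exactly what the question asks the author to confirm.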

@@ -0,0 +1,350 @@
use arrow2::array::*;
Collaborator

😮 you even wrote the round trip tests. Impressive

The only thing I didn't see in here was a test for BooleanArray

let to_arrow = ArrayRef::from(array);
test_arrow2_roundtrip(to_arrow.as_ref());

if !array.is_empty() {
Collaborator

nice

#[test]
fn test_primitive() {
    let data_type = DataType::Int32;
    let array = PrimitiveArray::new(data_type, vec![1, 2, 3].into(), None);
Collaborator

The only way I can think of to make these tests better is to manually construct the arrow-rs array as well (otherwise there is just one original source of truth). However, I think these tests are very impressive overall and don't actually need anything more.
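
To illustrate the "second source of truth" idea, here is a minimal standard-library sketch. `OptionArray`, `MaskedArray`, `to_masked`, and `from_masked` are hypothetical names made up for this example; they only model two libraries with different in-memory layouts and are not the arrow2 or arrow-rs types under review.

```rust
/// Hypothetical "library A" representation: one Option per slot.
type OptionArray = Vec<Option<i32>>;

/// Hypothetical "library B" representation: dense values plus a validity mask.
#[derive(Debug, PartialEq)]
struct MaskedArray {
    values: Vec<i32>,
    validity: Vec<bool>,
}

fn to_masked(a: &OptionArray) -> MaskedArray {
    MaskedArray {
        // Null slots get a placeholder value of 0.
        values: a.iter().map(|v| (*v).unwrap_or(0)).collect(),
        validity: a.iter().map(|v| v.is_some()).collect(),
    }
}

fn from_masked(m: &MaskedArray) -> OptionArray {
    m.values
        .iter()
        .zip(&m.validity)
        .map(|(v, valid)| if *valid { Some(*v) } else { None })
        .collect()
}

fn main() {
    let original: OptionArray = vec![Some(1), None, Some(3)];

    // Round trip: shows the two conversions invert each other, but a
    // symmetric mistake in both directions would still pass.
    assert_eq!(from_masked(&to_masked(&original)), original);

    // Independent expectation, written out by hand in the target layout.
    let expected = MaskedArray {
        values: vec![1, 0, 3],
        validity: vec![true, false, true],
    };
    assert_eq!(to_masked(&original), expected);
}
```

The second assertion is the extra check: it pins the converted layout to a value that did not pass through either conversion function.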

.into_iter()
.map(Some),
);
let values = PrimitiveArray::<i32>::from_iter([Some(1), None, Some(3), Some(1), None]);
Collaborator

Is it intended that key1 = Some(1) is repeated? Would it add more coverage if the second value were something different (like key1 = Some(10))?
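
A small standard-library sketch of why a repeated dictionary value can hide a bug. The `decode` helper and the key arrays are hypothetical and exist only for this illustration; the values reuse the `[Some(1), None, Some(3), Some(1), None]` shape from the test, with `Some(10)` swapped in as suggested.

```rust
/// Hypothetical dictionary decode: look each key up in `values`.
fn decode(keys: &[Option<usize>], values: &[Option<i32>]) -> Vec<Option<i32>> {
    keys.iter().map(|k| k.and_then(|i| values[i])).collect()
}

fn main() {
    let keys = [Some(0), Some(3), None, Some(1)];
    // Simulated conversion bug: keys 0 and 3 are swapped.
    let wrong_keys = [Some(3), Some(0), None, Some(1)];

    // Slots 0 and 3 hold the same value, so the swap is invisible.
    let repeated = [Some(1), None, Some(3), Some(1), None];
    assert_eq!(decode(&keys, &repeated), decode(&wrong_keys, &repeated));

    // With a distinct value in slot 3, the same swap changes the output.
    let distinct = [Some(1), None, Some(3), Some(10), None];
    assert_ne!(decode(&keys, &distinct), decode(&wrong_keys, &distinct));
}
```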

@alamb
Collaborator

alamb commented Apr 10, 2023

https://github.com/jorgecarleitao/arrow2/actions/runs/4659165653/jobs/8245723577?pr=1446 is failing on master, and I think it is unrelated to this PR.

@tustvold tustvold changed the title Safe conversion between Arrays and arrow-rs ArrayData Safe conversion between arrow-rs and arrow2 Arrays Apr 11, 2023
@jorgecarleitao
Owner

jorgecarleitao commented Apr 12, 2023

Thank you @tustvold for this beautiful PR 🙇

@jorgecarleitao jorgecarleitao merged commit ded2ab1 into jorgecarleitao:main Apr 12, 2023
@sundy-li
Collaborator

@tustvold is arrow-rs 37.0.0 ready to bump?

@tustvold
Contributor Author

> @tustvold is arrow-rs 37.0.0 ready to bump?

Yes, I can prepare a PR for this. It has some non-trivial breaking changes to schema representation.

5 participants