Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrimitiveDictionaryBuilder with specific value data type and capacity #7011

Closed
rluvaton opened this issue Jan 23, 2025 · 2 comments · Fixed by #7012
Closed

PrimitiveDictionaryBuilder with specific value data type and capacity #7011

rluvaton opened this issue Jan 23, 2025 · 2 comments · Fixed by #7012
Labels
arrow Changes to the arrow crate bug

Comments

@rluvaton
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I want to create a PrimitiveDictionaryBuilder with custom capacity and specific data type for the values but I can't

#[test]
fn should_work() {
    let data_type = DataType::Dictionary(
        Box::new(DataType::Int32),
        Box::new(DataType::Timestamp(
            TimeUnit::Microsecond,
            Some("+08:00".into()),
        )),
    );

    let mut builder =
      PrimitiveDictionaryBuilder::<Int32Type, TimestampMicrosecondType>::with_capacity(1, 2);

    assert_eq!(builder.finish().data_type(), &data_type);
} 

This test fails with:

assertion `left == right` failed
  left: Dictionary(Int32, Timestamp(Microsecond, None))
 right: Dictionary(Int32, Timestamp(Microsecond, Some("+08:00")))

so I tried using new_from_builders

#[test]
fn should_work() {
    let value_data_type = DataType::Timestamp(
        TimeUnit::Microsecond,
        Some("+08:00".into()),
    );
    let data_type = DataType::Dictionary(
        Box::new(DataType::Int32),
        Box::new(value_data_type.clone()),
    );

    let mut builder =
      unsafe {
          PrimitiveDictionaryBuilder::<Int32Type, TimestampMicrosecondType>::new_from_builders(
              PrimitiveBuilder::with_capacity(1).with_data_type(DataType::Int32),
              PrimitiveBuilder::with_capacity(2).with_data_type(value_data_type),
          )
      };

    assert_eq!(builder.finish().data_type(), &data_type);
}

this works but I can't control the hash map capacity, and the keys and values are unnecessary iterated over (even if they are empty)

Describe the solution you'd like
Provide with_value_data_type function that will be similar to primitive_builder

so you could do:

let value_data_type = DataType::Timestamp(
    TimeUnit::Microsecond,
    Some("+08:00".into()),
);
let data_type = DataType::Dictionary(
    Box::new(DataType::Int32),
    Box::new(value_data_type.clone()),
);

let mut builder =
  PrimitiveDictionaryBuilder::<Int32Type, TimestampMicrosecondType>::with_capacity(1, 2)
    .with_value_data_type(value_data_type);
@rluvaton rluvaton added the enhancement Any new improvement worthy of a entry in the changelog label Jan 23, 2025
rluvaton added a commit to rluvaton/arrow-rs that referenced this issue Jan 23, 2025
@tustvold
Copy link
Contributor

tustvold commented Jan 23, 2025

Can you use https://docs.rs/arrow-array/latest/arrow_array/builder/struct.PrimitiveDictionaryBuilder.html#method.new_from_empty_builders ?

Edit: oh I see you are but it isn't picking up the data type, this IMO is a bug and should be an easy enough fix

@tustvold tustvold added bug and removed enhancement Any new improvement worthy of a entry in the changelog labels Jan 23, 2025
tustvold pushed a commit that referenced this issue Jan 25, 2025
…ctionaryBuilder::new_from_builders` (#7012)

* feat: allow setting custom value data type in `PrimitiveDictionaryBuilder`

Fixes #7011

* use the values capacity for the hash map

* update new_from_empty_builders and not new_from_builders
@alamb alamb added the arrow Changes to the arrow crate label Jan 27, 2025
@alamb
Copy link
Contributor

alamb commented Jan 27, 2025

label_issue.py automatically added labels {'arrow'} from #7012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
3 participants