53.1.0 (2024-10-02)
Implemented enhancements:
- Write null counts in Parquet statistics when they are known to be zero #6502 [parquet]
- Make it easier to find / work with ByteView #6478 [arrow]
- Update lexical-core version due to soundness issues with current version #6468
- Add builder style API for manipulating ParquetMetaData #6465 [parquet]
- ArrayData.align_buffers should support Struct data type / child data #6461 [arrow]
- Add a method to return the number of skipped rows in a RowSelection #6428 [parquet]
- Bump lexical-core to 1.0 #6397 [arrow]
- Add union_extract kernel #6386 [arrow]
- implement regexp_is_match_utf8 and regexp_is_match_utf8_scalar for StringViewArray #6370 [arrow]
- Add support for BinaryView in arrow_string::length #6358 [arrow]
- Add as_union to AsArray #6351
- Ability to append non-contiguous strings to StringBuilder #6347 [arrow]
- Add Catalog DB Schema subcommands to flight_sql_client #6331 [arrow] [arrow-flight]
- Add support for Utf8View in arrow_string::length #6305 [arrow]
- Reading FIXED_LEN_BYTE_ARRAY columns with nulls is inefficient #6296 [parquet]
- Optionally verify 32-bit CRC checksum when decoding parquet pages #6289 [parquet]
- Speed up pad_nulls for FixedLenByteArrayBuffer #6297 [parquet] (etseidl)
- Improve performance of set_bits by avoiding setting individual bits #6288 [arrow] (kazuyukitanimura)
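The optional page checksum verification (#6289) concerns the CRC field that the Parquet format allows in a page header. The format specifies this as the standard CRC-32 (polynomial 0x04C11DB7, the same one gzip uses). The sketch below is a rough, std-only illustration of that check, not the parquet crate's code, which may rely on an optimized CRC library. Both crc32 and verify_page_crc are hypothetical names chosen for this example:

```rust
/// Table-free CRC-32 (reflected form of polynomial 0x04C11DB7, i.e. 0xEDB88320),
/// the variant used by gzip/zlib. Illustrative only; production readers use
/// faster table-driven or SIMD implementations.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF; // standard initial value
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones if the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc // standard final XOR
}

/// Hypothetical helper: compare a checksum stored alongside a page with the
/// CRC-32 of the page bytes, returning an error instead of panicking.
fn verify_page_crc(page_bytes: &[u8], expected: u32) -> Result<(), String> {
    let actual = crc32(page_bytes);
    if actual == expected {
        Ok(())
    } else {
        Err(format!(
            "page CRC mismatch: expected {expected:#010x}, computed {actual:#010x}"
        ))
    }
}
```

The Parquet page header stores the checksum in a signed 32-bit field. A reader would therefore reinterpret its bits as u32 before comparing.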
Fixed bugs:
- BitIterator panics when retrieving length #6480 [arrow]
- Flight data retrieved via Python client (wrapping C++) cannot be used by Rust Arrow #6471 [arrow]
- CI integration test failing: Archery test With other arrows #6448 [parquet] [arrow] [arrow-flight]
- IPC not respecting not preserving dict ID #6443 [parquet] [arrow] [arrow-flight]
- Failing CI: Prost requires Rust 1.71.1 #6436 [arrow] [arrow-flight]
- Invalid struct arrays in IPC data cause panic during read #6416 [arrow]
- REE Dicts cannot be encoded/decoded with streaming IPC #6398 [arrow]
- Reading json map with non-nullable value schema doesn't error if values are actually null #6391
- StringViewBuilder with deduplication does not clear observed values #6384 [arrow]
- Cast from Decimal(p, s) to dictionary-encoded Decimal(p, s) loses precision and scale #6381 [arrow]
- LocalFileSystem list operation returns objects in wrong order #6375
- compute::binary_mut returns Err(PrimitiveArray<T>) only with certain arrays #6374 [arrow]
- Exporting Binary/Utf8View from arrow-rs to pyarrow fails #6366 [arrow]
- warning: methods as_any and next_batch are never used in parquet crate #6143 [parquet]
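The BitIterator length panic (#6480) fits how Rust's ExactSizeIterator works. Its default len() asserts that size_hint() returns equal lower and upper bounds. An iterator that keeps the default (0, None) hint therefore panics as soon as len() is called. The fix title (#6495, overriding size_hint to return the exact remaining size) points to this cause. The sketch below illustrates the pattern with a hypothetical BitIter over an LSB-first bitmap. It is not the arrow-rs BitIterator:

```rust
/// Hypothetical iterator over the bits of an LSB-first bitmap (the bit order
/// Arrow validity bitmaps use). Not the arrow-rs type.
struct BitIter<'a> {
    buffer: &'a [u8],
    current: usize, // index of the next bit to yield
    end: usize,     // one past the last bit to yield
}

impl<'a> BitIter<'a> {
    fn new(buffer: &'a [u8], offset: usize, len: usize) -> Self {
        assert!(offset + len <= buffer.len() * 8, "bit range out of bounds");
        BitIter { buffer, current: offset, end: offset + len }
    }
}

impl Iterator for BitIter<'_> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.current == self.end {
            return None;
        }
        let byte = self.buffer[self.current / 8];
        let bit = ((byte >> (self.current % 8)) & 1) == 1;
        self.current += 1;
        Some(bit)
    }

    // Without this override the default hint is (0, None), and
    // ExactSizeIterator::len() panics because the two bounds disagree.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.end - self.current;
        (remaining, Some(remaining))
    }
}

// Relies on the default len(), which is only correct with the exact hint above.
impl ExactSizeIterator for BitIter<'_> {}
```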
Documentation updates:
- chore: add docs, part of #37 #6496 [parquet] [arrow] [arrow-flight] (ByteBaker)
- Minor: improve ChunkedReader docs #6477 [parquet] (alamb)
- Minor: Add some missing documentation to fix CI errors #6445 [arrow] (etseidl)
- Fix doc "bit width" to "byte width" #6434 [arrow] (kylebarron)
- chore: add docs, part of #37 #6433 [arrow] (ByteBaker)
- chore: add docs, part of #37 #6424 [arrow] (ByteBaker)
- Rephrase doc comment #6421 [parquet] [arrow] [arrow-flight] (waynexia)
- Remove "NOT YET FULLY SUPPORTED" comment from DataType::Utf8View/BinaryView #6380 [arrow] (alamb)
- Improve GenericStringBuilder documentation #6372 [arrow] (alamb)
Closed issues:
- Columnar json writer for arrow-json #6411
- Primitive binary/unary are not as fast as they could be #6364 [arrow]
- Different numeric type may be able to compare #6357
Merged pull requests:
- fix: override size_hint for BitIterator to return the exact remaining size #6495 [arrow] (Beihao-Zhou)
- Minor: Fix path in format command in CONTRIBUTING.md #6494 (etseidl)
- Write null counts in Parquet statistics when they are known #6490 [parquet] (etseidl)
- Add configuration option to StatisticsConverter to control interpretation of missing null counts in Parquet statistics #6485 [parquet] (etseidl)
- fix: check overflow numbers while inferring type for csv files #6481 [arrow] (CookiePieWw)
- Add better documentation, examples and builder-style API to ByteView #6479 [arrow] (alamb)
- Add take_arrays util for getting entries from 2d arrays #6475 [arrow] (akurmustafa)
- Deprecate MetadataLoader #6474 [parquet] (etseidl)
- Update tonic-build requirement from =0.12.2 to =0.12.3 #6473 [arrow] [arrow-flight] (dependabot[bot])
- Align buffers from Python (FFI) #6472 [arrow] (EnricoMi)
- Add ParquetMetaDataBuilder #6466 [parquet] (alamb)
- Make ArrayData.align_buffers align child data buffers recursively #6462 [arrow] (EnricoMi)
- Minor: Silence compiler warnings for parquet::file::metadata::reader #6457 [parquet] (etseidl)
- Minor: Error rather than panic for unsupported for dictionary casting #6456 [arrow] (goldmedal)
- Support cast between Durations + between Durations all numeric types #6452 [arrow] (tisonkun)
- Deprecate methods from footer.rs in favor of ParquetMetaDataReader #6451 [parquet] (etseidl)
- Workaround for missing Parquet page indexes in ParquetMetaDataReader #6450 [parquet] (etseidl)
- Fix CI by disabling newly failing rust <> nanoarrow integration test in CI #6449 (alamb)
- Add IpcSchemaEncoder, deprecate ipc schema functions, Fix IPC not respecting not preserving dict ID #6444 [parquet] [arrow] [arrow-flight] (brancz)
- Add additional documentation and builder APIs to SortOptions #6441 [arrow] (alamb)
- Update prost-build requirement from =0.13.2 to =0.13.3 #6440 [arrow] [arrow-flight] (dependabot[bot])
- Bump arrow-flight MSRV to 1.71.1 #6437 [arrow] [arrow-flight] (gstvg)
- Silence warnings that as_any and next_batch are never used #6432 [parquet] (etseidl)
- Add ParquetMetaDataReader #6431 [parquet] (etseidl)
- Add RowSelection::skipped_row_count #6429 [parquet] (progval)
- perf: Faster decimal precision overflow checks #6419 [arrow] (andygrove)
- fix: don't panic in IPC reader if struct child arrays have different lengths #6417 [arrow] (alexwilcoxson-rel)
- Reduce integration test matrix #6407 (kou)
- Move lifetime of take_iter from iterator to its items #6403 [arrow] (dariocurr)
- Update lexical-core requirement from 0.8 to 1.0 (to resolve RUSTSEC-2023-0086) #6402 [arrow] (dariocurr)
- Fix encoding/decoding REE Dicts when using streaming IPC #6399 [arrow] (brancz)
- fix: binary_mut should work if only one input array has null buffer #6396 [arrow] (viirya)
- Add set_bits fuzz test #6394 [arrow] (alamb)
- impl From<ScalarBuffer<T>> for Buffer #6389 [arrow] (mbrobbel)
- Add union_extract kernel #6387 [arrow] (gstvg)
- Clear string-tracking hash table when ByteView deduplication is enabled #6385 [arrow] (shanesveller)
- fix: Stop losing precision and scale when casting decimal to dictionary #6383 [arrow] (andygrove)
- Add ARROW_VERSION const #6379 [arrow] (samuelcolvin)
- parquet writer: Raise an error when the row_group_index overflows i16 #6378 [parquet] (progval)
- Implement native support for StringViewArray for regexp_is_match and regexp_is_match_scalar function, deprecate regexp_is_match_utf8 and regexp_is_match_utf8_scalar #6376 [arrow] (tlm365)
- Update chrono-tz requirement from 0.9 to 0.10 #6371 [arrow] (dependabot[bot])
- Support StringViewArray interop with python: fix lingering C Data Interface issues for *ViewArray #6368 [arrow] (a10y)
- stop panic in MetadataLoader on invalid data #6367 [parquet] (samuelcolvin)
- Add support for BinaryView in arrow_string::length #6359 [arrow] (Omega359)
- impl From<Vec<T>> for Buffer #6355 [arrow] (mbrobbel)
- Add breaking change from #6043 to CHANGELOG #6354 (mbrobbel)
- Benchmark for bit_mask (set_bits) #6353 [arrow] (kazuyukitanimura)
- Update prost-build requirement from =0.13.1 to =0.13.2 #6350 [arrow] [arrow-flight] (dependabot[bot])
- fix: clippy warnings from nightly rust 1.82 #6348 [parquet] [arrow] (waynexia)
- Add support for Utf8View in arrow_string::length #6345 [arrow] (Omega359)
- feat: add catalog/schema subcommands to flight_sql_client. #6332 [arrow] [arrow-flight] (nathanielc)
- Manually run fmt on all files under parquet #6328 [parquet] (etseidl)
- Implement UnionArray logical_nulls #6303 [arrow] (gstvg)
- Parquet: Verify 32-bit CRC checksum when decoding pages #6290 [parquet] (xmakro)
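RowSelection::skipped_row_count (#6429) reports how many rows a row selection skips. The parquet crate stores a RowSelection as a sequence of RowSelector runs, each pairing a row count with a skip flag. Under that model, the skipped count is a sum over the skip runs. In this std-only sketch, Run, skipped_row_count and selected_row_count are hypothetical stand-ins and not the crate's actual implementation:

```rust
/// Hypothetical stand-in for one run of a row selection: `row_count`
/// consecutive rows that are either skipped or selected.
#[derive(Debug, Clone, Copy)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Total number of rows covered by skip runs.
fn skipped_row_count(runs: &[Run]) -> usize {
    runs.iter().filter(|r| r.skip).map(|r| r.row_count).sum()
}

/// Total number of rows covered by select runs, for comparison.
fn selected_row_count(runs: &[Run]) -> usize {
    runs.iter().filter(|r| !r.skip).map(|r| r.row_count).sum()
}
```

Together the two counts add up to the total number of rows the selection spans.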
* This Changelog was automatically generated by github_changelog_generator