PyArrow 19.0.0 raises exception with to_pyarrow_table() #3147

Closed
rustyconover opened this issue Jan 20, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@rustyconover

Environment

Delta-rs version: 0.24.0


Bug

What happened:

When calling to_pyarrow_table() with PyArrow version 19.0.0, an exception is raised with the following traceback:

Traceback (most recent call last):
  File "/Users/rusty/deltalake-experiments/.venv/bin/deltalake_experiment", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/Users/rusty/deltalake-experiments/src/deltalake_experiments/__init__.py", line 38, in run
    t = dt.to_pyarrow_table()
        ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rusty/deltalake-experiments/.venv/lib/python3.12/site-packages/deltalake/table.py", line 1236, in to_pyarrow_table
    ).to_table(columns=columns, filter=filters)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 574, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3865, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: Repetition level histogram size mismatch

What you expected to happen:

The to_pyarrow_table() method should execute without raising any exceptions, returning the corresponding PyArrow table as expected.

How to reproduce it:

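Here is a short test case:

import deltalake
import pyarrow as pa


def run() -> int:
    # Build a one-column, one-row table.
    example_schema = pa.schema(
        [
            pa.field("name", pa.string()),
        ]
    )
    example_table = pa.Table.from_pylist(
        [{"name": "Tintin"}],
        schema=example_schema,
    )

    # Write it out as a Delta table, replacing any existing data.
    deltalake.write_deltalake(
        "./test-table",
        example_table,
        mode="overwrite",
    )

    # Reading the table back raises the OSError under PyArrow 19.0.0.
    dt = deltalake.DeltaTable("./test-table")
    t = dt.to_pyarrow_table()
    print(t)
    return 0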

Additional Information

The issue appears to be specific to PyArrow version 19.0.0.

Thank you for your assistance in investigating this issue! Let me know if I can provide additional details or perform further tests.
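
To check whether an environment has the PyArrow release named in this report, a small guard like the one below may help. This is only a sketch: the helper names `is_affected_pyarrow` and `installed_pyarrow_version` are hypothetical, and it assumes that only the exact 19.0.0 release is affected, which is what the report suggests but does not prove.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption taken from this report: only this exact release is affected.
AFFECTED_PYARROW = "19.0.0"


def is_affected_pyarrow(version_string: str) -> bool:
    """Return True if the version string names the release from this report."""
    return version_string.strip() == AFFECTED_PYARROW


def installed_pyarrow_version() -> Optional[str]:
    """Return the installed pyarrow version, or None if it is not installed."""
    try:
        return version("pyarrow")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyarrow_version()
    if found is None:
        print("pyarrow is not installed")
    elif is_affected_pyarrow(found):
        print(f"pyarrow {found} matches the release reported in this issue")
    else:
        print(f"pyarrow {found} is not the release reported in this issue")
```

If the check matches, installing a different PyArrow release is the obvious thing to try; which releases are safe is not established in this thread.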

rustyconover added the bug label on Jan 20, 2025
rustyconover changed the title from "Problem with PyArrow 19.0.0" to "PyArrow 19.0.0 raises exception with to_pyarrow_table()" on Jan 20, 2025
@ion-elgreco
Collaborator

@rustyconover this is a known issue in PyArrow: apache/arrow#45283

ion-elgreco closed this as not planned on Jan 20, 2025
@rustyconover
Author

Cool thanks!
