Support inference of unsigned integer types #45312

Open
ben-freist opened this issue Jan 20, 2025 · 0 comments

Describe the enhancement requested

The type inference used for schema detection, which includes the following line,

int_count_ += numpy_dtype_count_;
does not distinguish between signed and unsigned integer types.

This leads to the behaviour shown below. I think it would be nice if it were more consistent.

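import pandas as pd
import pyarrow as pa

# 2**63 is one past the int64 maximum, so it only fits in an unsigned 64-bit type.
data_1 = [{"a": pow(2, 63)}]
schema_1 = pa.Schema.from_pandas(pd.DataFrame(data_1))
print(schema_1)  # takes a different codepath, correctly infers uint64

# The same value nested in a list goes through the type inference quoted above.
data_2 = [{"a": [pow(2, 63)]}]
schema_2 = pa.Schema.from_pandas(pd.DataFrame(data_2))  # raises OverflowError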

Here's the backtrace that you get when trying to compute schema_2.

Traceback (most recent call last):
  File "/work/arrow/foo.py", line 5, in <module>
    schema = pa.Schema.from_pandas(pd.DataFrame(data))
  File "pyarrow/types.pxi", line 3104, in pyarrow.lib.Schema.from_pandas
  File "/work/arrow/pyarrow-dev/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 562, in dataframe_to_types
    type_ = pa.array(c, from_pandas=True).type
  File "pyarrow/array.pxi", line 360, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 87, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
OverflowError: Python int too large to convert to C long

Is that something that can be changed or would that likely have too many unintended consequences?
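
As a rough illustration of the requested behaviour, the pure-Python sketch below picks a 64-bit integer type by tracking the smallest and largest value seen, including values inside nested lists. It is not how Arrow's C++ inference is implemented; the helper names `iter_ints` and `infer_integer_type` are hypothetical and exist only for this example.

```python
from typing import Any, Iterable, Iterator

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def iter_ints(values: Iterable[Any]) -> Iterator[int]:
    """Yield every integer in a possibly nested list or tuple."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from iter_ints(value)
        else:
            yield value


def infer_integer_type(values: Iterable[Any]) -> str:
    """Hypothetical helper: name a 64-bit integer type that holds all values."""
    lowest, highest = 0, 0
    for value in iter_ints(values):
        lowest = min(lowest, value)
        highest = max(highest, value)
    # Prefer the signed type whenever every value fits in it.
    if lowest >= INT64_MIN and highest <= INT64_MAX:
        return "int64"
    # Fall back to the unsigned type only when no value is negative.
    if lowest >= 0 and highest <= UINT64_MAX:
        return "uint64"
    raise OverflowError("values do not fit in a single 64-bit integer type")


print(infer_integer_type([2**63 - 1]))  # largest value that still fits int64
print(infer_integer_type([[2**63]]))    # nested value that needs uint64
```

With a rule like this, a flat column and a list column holding the same large value would resolve to the same integer type. Mixing a negative value with one above the int64 maximum still has no valid 64-bit type, so the sketch raises in that case.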

I've tested this with pyarrow version 19.0.0 on Ubuntu 24.04.

Component(s)

C++
