ENH: Update pandas read_* functions of `dtype_backend` param with pyarrow #843

hainaweiben · 2025-02-01T05:45:31Z

What do these changes do?

Update dtype_backend parameter for consistency with pandas

- Renamed `use_arrow_dtype` to `dtype_backend` in `read_parquet` and updated the related logic to align with pandas and Dask.
- Adjusted the `read_csv` and `read_sql` modules to stay compatible and to use the new parameter where applicable.
- Updated the documentation to reflect these changes.

This change aims to provide a consistent user experience across pandas, Dask, and xorbits by adopting the `dtype_backend` parameter.
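As a rough illustration of what such a rename can involve, here is a minimal sketch of mapping the old boolean flag onto the new string parameter. The helper name `resolve_dtype_backend` and the deprecation behaviour are hypothetical and are not taken from the xorbits code base. Only the two parameter names and the two accepted values come from this PR.

```python
import warnings
from typing import Optional

# Accepted values as stated in this PR discussion.
_VALID_BACKENDS = ("numpy_nullable", "pyarrow")


def resolve_dtype_backend(
    dtype_backend: Optional[str] = None,
    use_arrow_dtype: Optional[bool] = None,
) -> str:
    """Hypothetical helper: translate the legacy flag into dtype_backend."""
    if use_arrow_dtype is not None:
        # Assumption: the legacy flag is still accepted but discouraged.
        warnings.warn(
            "use_arrow_dtype is deprecated; pass dtype_backend instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if dtype_backend is None:
            dtype_backend = "pyarrow" if use_arrow_dtype else "numpy_nullable"
    if dtype_backend is None:
        # Default value as described later in this conversation.
        dtype_backend = "numpy_nullable"
    if dtype_backend not in _VALID_BACKENDS:
        raise ValueError(
            f"dtype_backend must be one of {_VALID_BACKENDS}, got {dtype_backend!r}"
        )
    return dtype_backend
```

In this sketch an explicit `dtype_backend` always wins over the legacy flag, so callers can migrate one call site at a time.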

Related issue number

Fixes #770

Check code requirements

tests added / passed (if needed)
Ensure all linting tests pass

Fix a segmentation fault in tensor operations by adding version constraints for numpy and pyarrow in the Python 3.11 test environment. The fault occurs in numpy.searchsorted in tensor-related tests when newer versions of numpy (>=2.2.0) and pyarrow (>=19.0.0) are installed.

codecov · 2025-02-01T05:52:40Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.03%. Comparing base (232c1d9) to head (1204771).
Report is 1 commit behind head on main.

❌ Your project status has failed because the head coverage (82.03%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
- Coverage   82.04%   82.03%   -0.01%     
==========================================
  Files        1071     1071              
  Lines       80153    80155       +2     
  Branches    12202    12203       +1     
==========================================
+ Hits        65758    65759       +1     
+ Misses      12837    12830       -7     
- Partials     1558     1566       +8

Flag	Coverage Δ
unittests	`81.93% <ø> (-0.01%)`	⬇️

luweizheng

Should we use numpy or numpy_nullable for dtype_backend?

hainaweiben · 2025-02-03T04:21:26Z

The accepted values for the dtype_backend parameter are "numpy_nullable" and "pyarrow"; "numpy_nullable" is the default.
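For reference, a minimal sketch of how these two values behave in plain pandas, assuming pandas >= 2.0, where `read_csv` accepts `dtype_backend`. The CSV content and the helper name `read_with_backend` are made up for illustration, and xorbits itself is not used here.

```python
import io

import pandas as pd

# Illustrative data: column "a" has a missing integer value.
CSV_TEXT = "a,b\n1,x\n,y\n3,z\n"


def read_with_backend(dtype_backend: str) -> pd.DataFrame:
    """Read the sample CSV with the requested dtype backend."""
    return pd.read_csv(io.StringIO(CSV_TEXT), dtype_backend=dtype_backend)


# "numpy_nullable": integers with missing values stay integers (Int64)
# instead of being upcast to float64.
df = read_with_backend("numpy_nullable")
print(df.dtypes)

# "pyarrow": columns are backed by Arrow types; this needs pyarrow installed.
try:
    arrow_df = read_with_backend("pyarrow")
    print(arrow_df.dtypes)
except ImportError:
    print("pyarrow is not installed; skipping the pyarrow example")
```

With the nullable backend the missing entry in column `a` is represented as `pd.NA`, so the column keeps an integer dtype.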

luweizheng

LGTM

hainaweiben added 20 commits January 27, 2025 17:07

use action test

d82457e

fix test and read_parquet

a14c9c1

fix

23ea783

fix test

77bc0d5

fix test

15d4c71

test python CI

4020a7a

test python CI

ea50088

fix pytest test_local_classifier_from_to_parquet

509b905

test CI

ffe167d

change last

40f2420

build: upgrade Alluxio from 2.9.3 to 2.9.5

9772cbd

build: upgrade Alluxio from 2.9.3 to 2.9.5

d34a480

Revert changes

f22e4c0

Merge branch 'feature/dtype-backend-update' of https://github.com/hainaweiben/xorbits into feature/dtype-backend-update

3fbaff4

Merge remote-tracking branch 'origin/feature/dtype-backend-update' into feature/dtype-backend-update

11d5972

Added global version constraints for pyarrow and numpy.

4923957

fix jax

cf4d163

fix

5914308

hainaweiben changed the title ~~ENH~~ ENH: Standardize dtype_backend Parameter Across pandas and xorbits. Feb 1, 2025

XprobeBot added the enhancement (New feature or request) label Feb 1, 2025

luweizheng requested changes Feb 1, 2025

python/xorbits/_mars/dataframe/datasource/read_csv.py (outdated)

python/xorbits/_mars/config.py

luweizheng reviewed Feb 1, 2025

.github/workflows/python.yaml (outdated)

luweizheng reviewed Feb 1, 2025

.github/workflows/python.yaml (outdated)

hainaweiben and others added 4 commits February 3, 2025 11:43

This change updates the default behavior and available options for the `dtype_backend` parameter across the DataFrame-related functionalities in Xorbits.

497367c

Resolve python.yaml conflicts

66986a9

Merge branch 'main' into feature/dtype-backend-update

f40e6f6

- Modify the dtype_backend in the test from "numpy" to "numpy_nullable".

3c92b24

- Change the parameter "use_arrow_dtype" to "dtype_backend" in the benchmark.

hainaweiben added 2 commits February 3, 2025 12:11

Merge remote-tracking branch 'origin/feature/dtype-backend-update' into feature/dtype-backend-update

07d7871

Remove the comments.

1204771

luweizheng changed the title ~~ENH: Standardize dtype_backend Parameter Across pandas and xorbits.~~ ENH: Update dtype_backend param of read_* with pyarrow engine Feb 4, 2025

luweizheng changed the title ~~ENH: Update dtype_backend param of read_* with pyarrow engine~~ ENH: Update pandas read_* functions of dtype_backend param with pyarrow Feb 4, 2025

luweizheng approved these changes Feb 4, 2025

luweizheng merged commit d0ef304 into xorbitsai:main Feb 4, 2025
21 of 23 checks passed

hainaweiben deleted the feature/dtype-backend-update branch February 4, 2025 08:46
