Fix the precision when converting a decimal128 column to an arrow array #14230

jihoonson · 2023-09-28T17:59:01Z

Description

This PR fixes #13749. As discussed in the linked issue, the precision is unnecessarily limited to 18 when converting a decimal128 column to an Arrow array.

Implementation-wise, I wasn't sure where the best place to define the max precision for decimal types is. Since libcudf's decimal types don't store a precision, I thought it would be better not to expose the max precision outside of the to-arrow conversion. However, this led to duplicating the max precision definition in multiple places. Any suggestions would be appreciated.

Finally, a comment on #13749 suggested adding some round-trip tests. The existing tests look sufficient for round-trip testing to me, so I modified them instead of adding new ones. Please let me know if we need new tests in addition to them.

I'm also not sure whether any documentation should be fixed. Please let me know.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2023-09-28T17:59:04Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jihoonson · 2023-09-28T18:00:44Z

Uh, should I create this PR against a branch other than main?

davidwendt · 2023-09-28T18:31:28Z

Uh, should I create this PR against a branch other than main?

The PR should go against branch-23.12

jihoonson · 2023-09-28T18:52:16Z

Thanks @davidwendt. Working on the rebase and conflict.

vyasr · 2023-10-09T18:13:42Z

/ok to test

vyasr · 2023-10-09T18:14:18Z

/ok to test

vyasr · 2023-10-09T18:43:03Z

@jihoonson thanks for the PR! Style checks are currently failing. Could you please run our pre-commit hooks locally to fix them? If you don't have them set up, the steps are as follows (I'm making no assumptions about your environment; for instance, if you use conda you may want to install pre-commit differently):

cd /path/to/repo
pip install pre-commit
python -m pre-commit install
pre-commit run --all-files

Then commit and push those changes.

Implementation-wise, I wasn't sure where the best place to define the max precision for decimal types is. Since libcudf's decimal types don't store a precision, I thought it would be better not to expose the max precision outside of the to-arrow conversion. However, this led to duplicating the max precision definition in multiple places. Any suggestions would be appreciated.

That's fine. Since, as you say, libcudf doesn't have a concept of precision, the only place it really makes sense to define it is in the conversion function. I wouldn't sweat the mild duplication.

Finally, a comment on #13749 suggested adding some round-trip tests. The existing tests look sufficient for round-trip testing to me, so I modified them instead of adding new ones. Please let me know if we need new tests in addition to them.

I agree with your approach; modifying the existing tests is fine.

I'm also not sure whether any documentation should be fixed. Please let me know.

At the moment this behavior is not specifically documented. What might be nice is to add a note to the to_arrow C++ documentation indicating that decimal types will be converted to the widest precision supported by that type. Perhaps an @note section would be good.

cpp/src/interop/to_arrow.cu

cpp/tests/interop/to_arrow_test.cpp

jihoonson · 2023-10-19T22:12:40Z

Thanks @vyasr and @ttnghia for the review! I addressed your comments in the last 3 commits.

ttnghia · 2023-10-19T22:37:32Z

/ok to test

jihoonson · 2023-10-27T04:10:19Z

Thanks @hyperbolic2346 and @ttnghia! What will be the next step? Please let me know if there is anything I need to do 🙂

ttnghia · 2023-10-27T04:12:37Z

/ok to test

ttnghia · 2023-10-27T04:12:46Z

/merge

jihoonson · 2023-10-28T01:10:55Z

Hmm, the codecov check failed for python/cudf/cudf/io/orc.py, which seems unrelated to the changes in this PR.

ttnghia · 2023-10-28T02:10:58Z

/merge

jihoonson requested a review from a team as a code owner September 28, 2023 17:59

jihoonson requested review from hyperbolic2346 and davidwendt September 28, 2023 17:59

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 28, 2023

jihoonson mentioned this pull request Sep 28, 2023

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

Closed

jihoonson changed the base branch from main to branch-23.12 September 28, 2023 18:50

jihoonson force-pushed the fix-precision-decimal128 branch from e521f25 to c1d023e September 28, 2023 20:04

vyasr assigned jihoonson Oct 9, 2023

vyasr added bug Something isn't working non-breaking Non-breaking change labels Oct 9, 2023

ttnghia reviewed Oct 10, 2023

cpp/src/interop/to_arrow.cu (outdated, resolved)

ttnghia reviewed Oct 10, 2023

cpp/tests/interop/to_arrow_test.cpp (outdated, resolved)

jihoonson added 4 commits October 19, 2023 15:10

Fix the precision when converting decimal128 to arrow

ba344c2

max_precision() util function

fae799f

docs

fc8fabe

fix style

ade255f

jihoonson force-pushed the fix-precision-decimal128 branch from 68d8ff4 to ade255f October 19, 2023 22:10

hyperbolic2346 approved these changes Oct 23, 2023

ttnghia approved these changes Oct 24, 2023

Merge branch 'branch-23.12' into fix-precision-decimal128

21d9f32

rapids-bot bot merged commit 2a923df into rapidsai:branch-23.12 Oct 28, 2023
57 of 58 checks passed
