Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the precision when converting a decimal128 column to an arrow array #14230

Merged
merged 5 commits into from
Oct 28, 2023

Conversation

jihoonson
Copy link
Contributor

Description

This PR fixes #13749. As discussed in the issue linked, the precision is unnecessarily being limited to 18 when converting decimal128 to arrow.

Implementation-wise, I wasn't sure where is the best place to define the max precision for decimal types. Given that the decimal types don't store the precision in libcudf, I thought it would be better to not expose the max precision to the outside of to-arrow conversion. However, this led to replicating the definition of max precision across multiple places. Appreciate any suggestion.

Finally, it was suggested in #13749 (comment) to add some tests for round tripping. The existing tests look sufficient to me for round trip tests, so I just modified them instead of adding new tests. Please let me know if we need new tests in addition to them.

I'm also not sure whether any documentation should be fixed. Please let me know.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 28, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 28, 2023
@jihoonson
Copy link
Contributor Author

jihoonson commented Sep 28, 2023

Uh, should I create this PR against an other branch than the main?

@davidwendt
Copy link
Contributor

Uh, should I create this PR against an other branch than the main?

The PR should go against branch-23.12

@jihoonson jihoonson changed the base branch from main to branch-23.12 September 28, 2023 18:50
@jihoonson
Copy link
Contributor Author

Thanks @davidwendt. Working on the rebase and conflict.

@vyasr
Copy link
Contributor

vyasr commented Oct 9, 2023

/ok to test

@vyasr vyasr added bug Something isn't working non-breaking Non-breaking change labels Oct 9, 2023
@vyasr
Copy link
Contributor

vyasr commented Oct 9, 2023

/ok to test

@vyasr
Copy link
Contributor

vyasr commented Oct 9, 2023

@jihoonson thanks for the PR! Style checks are currently failing. Could you please run our pre-commit hooks locally on your system to fix those? If you don't have those set up, the steps are (I'm making no assumptions about your environment, you may want to change how you install pre-commit if you use conda for instance):

cd /path/to/repo
pip install pre-commit
python -m pre-commit install
pre-commit run --all-files

Then commit and push those changes.

Implementation-wise, I wasn't sure where is the best place to define the max precision for decimal types. Given that the decimal types don't store the precision in libcudf, I thought it would be better to not expose the max precision to the outside of to-arrow conversion. However, this led to replicating the definition of max precision across multiple places. Appreciate any suggestion.

That's fine. Since as you say libcudf doesn't have a concept of the precision the only real place it makes sense to include is in the conversion function. I wouldn't sweat the mild duplication.

Finally, it was suggested in #13749 (comment) to add some tests for round tripping. The existing tests look sufficient to me for round trip tests, so I just modified them instead of adding new tests. Please let me know if we need new tests in addition to them.

I agree with your approach, modifying is fine.

I'm also not sure whether any documentation should be fixed. Please let me know.

At the moment this behavior is not specifically documented. What might be nice is to add a note to the to_arrow C++ documentation indicating that decimal types will be converted to the widest precision supported by that type. Perhaps an @note section would be good.

@jihoonson
Copy link
Contributor Author

Thanks @vyasr and @ttnghia for the review! I addressed your comments in the last 3 commits.

@ttnghia
Copy link
Contributor

ttnghia commented Oct 19, 2023

/ok to test

@jihoonson
Copy link
Contributor Author

Thanks @hyperbolic2346 and @ttnghia! What will be the next step? Please let me know if there is anything I need to do 🙂

@ttnghia
Copy link
Contributor

ttnghia commented Oct 27, 2023

/ok to test

@ttnghia
Copy link
Contributor

ttnghia commented Oct 27, 2023

/merge

@jihoonson
Copy link
Contributor Author

Hmm the codecov failed for python/cudf/cudf/io/orc.py, which seems unrelated to my change in this PR.

@ttnghia
Copy link
Contributor

ttnghia commented Oct 28, 2023

/merge

@rapids-bot rapids-bot bot merged commit 2a923df into rapidsai:branch-23.12 Oct 28, 2023
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array
5 participants