Fix: Support for e notation using existing parse_decimal in string to decimal conversion #6905

himadripal · 2024-12-19T01:15:48Z

Which issue does this PR close?

Closes apache/datafusion#10315.

Rationale for this change

What changes are included in this PR?

Completed:

Replaced the string-to-decimal conversion with the existing `parse_decimal` method.
Added rounding logic to the existing `parse_decimal` method.
Fixed the cast error message to report the precision and scale requested for the cast instead of the default precision and scale.
Added rounding logic to the `parse_e_notation` method; the existing function does not round based on scale for e-notation values.
Removed the unused method `parse_string_to_decimal_native` and ported its tests over to `parse_decimal`.
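The approach listed above (route the string through one parser, honour e-notation, and round on the first discarded digit instead of truncating) can be illustrated with a standalone Rust sketch. It is not arrow-rs's `parse_decimal`: the name `parse_decimal_sketch`, the `String` error type, half-away-from-zero rounding, and the restriction to non-negative scales with `i128` storage are all assumptions made for illustration.

```rust
/// Sketch (hypothetical, not arrow-rs): parse `s`, plain or in e-notation, into an
/// integer scaled by 10^scale, rounding half away from zero on the first discarded digit.
/// Assumptions: 0 <= scale, 1 <= precision <= 38, i128 storage.
fn parse_decimal_sketch(s: &str, precision: u32, scale: u32) -> Result<i128, String> {
    let err = || format!("cannot parse '{s}' as decimal({precision}, {scale})");
    if precision == 0 || precision > 38 {
        return Err(err());
    }
    let t = s.trim();
    // Optional sign.
    let (negative, body) = match t.as_bytes().first() {
        Some(b'-') => (true, &t[1..]),
        Some(b'+') => (false, &t[1..]),
        _ => (false, t),
    };
    // Optional exponent: "1.5e-3" -> mantissa "1.5", exp -3.
    let (mantissa, exp) = match body.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&body[..i], body[i + 1..].parse::<i64>().map_err(|_| err())?),
        None => (body, 0),
    };
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    let digits: Vec<u8> = int_part
        .bytes()
        .chain(frac_part.bytes())
        .map(|b| if b.is_ascii_digit() { Ok(b - b'0') } else { Err(err()) })
        .collect::<Result<Vec<u8>, String>>()?;
    if digits.is_empty() {
        return Err(err());
    }
    // value * 10^scale == (all digits read as one integer) * 10^shift
    let shift = exp
        .saturating_add(i64::from(scale))
        .saturating_sub(frac_part.len() as i64);
    let len = digits.len() as i64;
    let drop = if shift < 0 { shift.saturating_neg() } else { 0 }; // trailing digits to discard
    let keep = (len - drop).max(0) as usize;
    // First discarded digit; an implicit leading zero if we discard past the start.
    let round_digit = if drop > 0 && drop <= len { digits[keep] } else { 0 };

    let mut value: i128 = 0;
    for &d in &digits[..keep] {
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(i128::from(d)))
            .ok_or_else(err)?;
    }
    if round_digit >= 5 {
        value = value.checked_add(1).ok_or_else(err)?;
    }
    if shift > 0 && value != 0 {
        // value >= 1 here, so any shift above 38 overflows i128 anyway.
        if shift > 38 {
            return Err(err());
        }
        value = value.checked_mul(10i128.pow(shift as u32)).ok_or_else(err)?;
    }
    // At most `precision` significant digits.
    if value >= 10i128.pow(precision) {
        return Err(err());
    }
    Ok(if negative { -value } else { value })
}

fn main() {
    println!("{:?}", parse_decimal_sketch("12.12345678", 38, 6)); // Ok(12123457)
    println!("{:?}", parse_decimal_sketch("1.23456e1", 10, 2)); // Ok(1235)
    println!("{:?}", parse_decimal_sketch("123.45", 4, 2).is_err()); // true
}
```

Working on the digit sequence and a single power-of-ten `shift` lets plain and e-notation inputs share one rounding path, which is the motivation for routing both through one parser.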

Are there any user-facing changes?


The cast error message has changed.

himadripal · 2024-12-19T16:31:39Z

@andygrove @viirya @tustvold please take a first look. The one failing test will be fixed once I add rounding logic to the `parse_e_notation` function.

himadripal · 2024-12-19T16:33:59Z

arrow-cast/src/cast/decimal.rs

-                        .and_then(|v| T::validate_decimal_precision(v, precision).map(|_| v))
+                    parse_decimal::<T>(v, precision, scale).map_err(|_| {
+                        ArrowError::CastError(format!(
+                            "Cannot cast string '{}' to decimal type of precision {} and scale {}",


`T::DATA_TYPE` shows the default `Decimal(38,10)` or `Decimal256(76,..)` in the error message, hiding the precision and scale provided for the cast.
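The point about `T::DATA_TYPE` can be shown with a small standalone Rust sketch. The format string is the one from the diff above; `DecimalParams` and `TYPE_DEFAULT` are hypothetical stand-ins for the requested cast parameters and for a type-level default such as `T::DATA_TYPE` (assumed here to be precision 38, scale 10).

```rust
/// Hypothetical stand-in for a decimal type's parameters (not an arrow-rs type).
#[derive(Debug, Clone, Copy)]
struct DecimalParams {
    precision: u8,
    scale: i8,
}

/// Hypothetical stand-in for a type-level default like `T::DATA_TYPE`; (38, 10) is assumed.
const TYPE_DEFAULT: DecimalParams = DecimalParams { precision: 38, scale: 10 };

/// Builds the error message with the format string from the diff.
fn cast_error_message(input: &str, p: DecimalParams) -> String {
    format!(
        "Cannot cast string '{}' to decimal type of precision {} and scale {}",
        input, p.precision, p.scale
    )
}

fn main() {
    let requested = DecimalParams { precision: 10, scale: 2 };
    // Reports the parameters the caller actually asked for...
    println!("{}", cast_error_message("1.2.3", requested));
    // ...whereas a message built from the type default hides them.
    println!("{}", cast_error_message("1.2.3", TYPE_DEFAULT));
}
```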

himadripal · 2024-12-19T16:34:57Z

arrow-cast/src/cast/decimal.rs

@@ -230,6 +231,7 @@ where
    )?))
 }

+#[allow(dead_code)]


Clippy fails on this as dead code, hence the `#[allow(dead_code)]`. The function has no remaining callers; if required, we can remove it and cover its existing tests with `parse_decimal`.

We should remove this and port the tests, to ensure we aren't losing test coverage or accidentally changing behaviour.

tustvold

I think this might be a breaking API change, as it changes the rounding behaviour of parse_decimal?

himadripal · 2024-12-19T16:39:02Z

I think this might be a breaking API change, as it changes the rounding behaviour of parse_decimal?

Clippy did not complain and tests are passing, except one that I'm working on (rounding for e-notation). Would any other build task catch it?

tustvold

This PR seems to remove a number of tests, and orphan some others. If we're changing what cast does, can we please remove the old implementation and port the old tests, so that we aren't losing test coverage.

Also as written this PR is a breaking change, as it alters the rounding behaviour of the parser.

tustvold · 2024-12-26T16:32:04Z

arrow-csv/src/reader/mod.rs

@@ -1284,7 +1284,7 @@ mod tests {
        assert_eq!("53.002666", lat.value_as_string(1));
        assert_eq!("52.412811", lat.value_as_string(2));
        assert_eq!("51.481583", lat.value_as_string(3));
-        assert_eq!("12.123456", lat.value_as_string(4));
+        assert_eq!("12.123457", lat.value_as_string(4));


Here we can see this is a breaking change to the rounding behaviour.

Also note that the previous behavior was not correct:

12.12345678 cast to `Decimal128(38, 6)` = 12.123457

It truncated rather than rounded. Both are valid behaviours, so changing this is a breaking change.

There is an argument for accepting the breaking change to use rounding since it would be consistent with how we cast floating point to decimal. However, do we want to consider adding a parameter to choose between truncation and rounding?

I personally wouldn't characterize this as a breaking change, though I can see how others might.

In my opinion, adding a parameter to choose between the behaviors would be the safest thing (aka a field to CastOptions that defaults to the old, rounding, behavior) for https://docs.rs/arrow/latest/arrow/compute/kernels/cast/fn.cast_with_options.html

Maybe @liukun4515 who added much of the initial decimal support in arrow-rs has time to offer historical perspective on rounding vs truncation during casting?
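A minimal sketch of the option suggested here, in standalone Rust. `DecimalRounding`, `DecimalCastOptions`, and `reduce_scale` are hypothetical names, not arrow-rs API, and a real `CastOptions` field would need its own design. The sketch assumes truncation as the default (matching the old `arrow-csv` expectation of `12.123456`) and half-away-from-zero for the rounding mode; it reproduces the `12.12345678` to `Decimal128(38, 6)` example from this thread.

```rust
/// Hypothetical rounding mode for string-to-decimal casts (not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum DecimalRounding {
    /// Discard excess fractional digits; assumed here to be the old behaviour.
    #[default]
    Truncate,
    /// Round half away from zero.
    HalfAwayFromZero,
}

/// Hypothetical options struct illustrating a defaulted rounding field.
#[derive(Debug, Clone, Copy, Default)]
struct DecimalCastOptions {
    rounding: DecimalRounding,
}

/// Reduce a scaled integer from `from_scale` to `to_scale` using the configured mode.
/// Assumes to_scale <= from_scale and a scale difference of at most 38.
fn reduce_scale(value: i128, from_scale: u32, to_scale: u32, opts: DecimalCastOptions) -> i128 {
    assert!(to_scale <= from_scale && from_scale - to_scale <= 38);
    let div = 10i128.pow(from_scale - to_scale);
    let (q, r) = (value / div, value % div); // `/` truncates toward zero
    match opts.rounding {
        DecimalRounding::Truncate => q,
        // |r| >= div - |r| is 2|r| >= div without risking overflow.
        DecimalRounding::HalfAwayFromZero if r.abs() >= div - r.abs() => q + value.signum(),
        DecimalRounding::HalfAwayFromZero => q,
    }
}

fn main() {
    // 12.12345678 as a scale-8 integer, reduced to scale 6 as in `Decimal128(38, 6)`.
    let v = 1_212_345_678_i128;
    let truncating = DecimalCastOptions::default();
    let rounding = DecimalCastOptions { rounding: DecimalRounding::HalfAwayFromZero };
    println!("{}", reduce_scale(v, 8, 6, truncating)); // 12123456
    println!("{}", reduce_scale(v, 8, 6, rounding)); // 12123457
}
```

Keeping the mode in an options struct with a `Default` means existing callers would keep their behaviour unless they opt in, which is what would make such a change non-breaking.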

himadripal · 2024-12-26T20:47:14Z

This PR seems to remove a number of tests, and orphan some others. If we're changing what cast does, can we please remove the old implementation and port the old tests, so that we aren't losing test coverage.

Also as written this PR is a breaking change, as it alters the rounding behaviour of the parser.

Thanks @tustvold for the quick review. I've moved over the tests for `parse_string_to_decimal_native` that were not already covered by another test, so they now use `parse_decimal`. Let me know if I've missed anything.


himadripal added 2 commits December 18, 2024 17:14

add support for e notation in string to decimal using existing parse_decimal() function

8ce814d

remove println

4b19083

github-actions bot added the arrow (Changes to the arrow crate) label Dec 19, 2024

himadripal changed the title ~~Fix string to decimal e notation~~ Fix: Support for e notation using existing parse_decimal in string to decimal conversion Dec 19, 2024

added rounding logic for non e-notation, error message changed.

45ec17e

himadripal commented Dec 19, 2024


tustvold reviewed Dec 19, 2024


himadripal added 4 commits December 20, 2024 09:35

add rounding logic for parse_e_notation

c69b938

fix csv test case and fix off by one error.

819f0d6

improved the rounding_digit logic

dd3874d

fix parse_decimal for scale=0 case, port over prev tests

460d323

tustvold added the api-change (Changes to the arrow API) and next-major-release (the PR has API changes and it waiting on the next major version) labels Dec 26, 2024

tustvold reviewed Dec 26, 2024


removed unused method parse_string_to_decimal_native and moved over all tests

a4f0667

himadripal mentioned this pull request Dec 27, 2024

Implement Spark-compatible CAST from String to Decimal apache/datafusion-comet#325

Open


tustvold Dec 19, 2024

himadripal Jan 2, 2025

tustvold left a comment

himadripal commented Dec 19, 2024

tustvold left a comment

tustvold Dec 26, 2024

himadripal Dec 26, 2024

tustvold Dec 26, 2024

andygrove Jan 2, 2025

alamb Jan 4, 2025
