
Optimise decimal casting for infallible conversions #7021

Open · wants to merge 20 commits into main

Conversation


@aweltsch aweltsch commented Jan 25, 2025

Which issue does this PR close?

Rationale for this change

As mentioned in the issue there are certain conversions for which we don't need to check precision / overflow and can thus improve performance by using the infallible unary kernel.

Explanation of the optimisation:

Case 1, increasing scale:

Increasing the scale will result in multiplication with powers of 10, thus being equivalent to a "shift left" of the digits.
Every number will thus "gain" additional digits. The operation is safe when the output type gives enough precision (i.e. digits) to contain the original digits, as well as the digits gained.
This can be boiled down to the condition input_prec + (output_scale - input_scale) <= output_prec

Example:

Consider a conversion from Decimal(5, 0) to Decimal(8, 3) with the number 12345 * 10^0 = 12345000 * 10^(-3).
If we start with any number xxxxx, then an increase of scale by 3 has the following effect on the representation: [xxxxx] -> [xxxxx000].
So for this to work, the output type needs at least 8 digits of precision.
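The condition for case 1 can be sketched as a small standalone check (a simplified sketch; `is_infallible_upscale` is a hypothetical helper for illustration, not the actual function in this PR):

```rust
// Hypothetical sketch of the upscale condition described above:
// increasing the scale is infallible when the output precision can hold the
// original digits plus the digits gained, i.e.
// input_prec + (output_scale - input_scale) <= output_prec.
fn is_infallible_upscale(input_prec: u8, input_scale: i8, output_prec: u8, output_scale: i8) -> bool {
    let delta_scale = output_scale - input_scale;
    (input_prec as i8) + delta_scale <= (output_prec as i8)
}

fn main() {
    // Decimal(5, 0) -> Decimal(8, 3): 12345 becomes 12345000, which needs 8 digits.
    assert!(is_infallible_upscale(5, 0, 8, 3));
    // Decimal(5, 0) -> Decimal(7, 3): 7 digits of precision are not enough.
    assert!(!is_infallible_upscale(5, 0, 7, 3));
}
```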

Case 2, decreasing scale:

Decreasing the scale will result in division by powers of 10, thus "shifting right" the digits and adding a rounding term. We usually need less precision to represent the result. By shifting right we lose input_scale - output_scale digits, but the rounding term can require one additional digit, so this boils down to the condition
input_prec - (input_scale - output_scale) < output_prec

Example:

Consider a conversion from Decimal(5, 0) to Decimal(3, -3) with the number 99900 * 10^0 = 99.9 * 10^(3), which is then rounded to 100 * 10^3 = 100000.
If we start with any number xxxxx, then a decrease of scale by 3 has the following effect on the representation: [xxxxx] -> [xx] (possibly + 1). In the example the rounding adds an additional digit, so for the conversion to work, the output type needs at least 3 digits of precision.
A conversion to Decimal(2, -3) would not be possible.
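The condition for case 2 can be sketched the same way (again a simplified, hypothetical helper, not the PR's actual code):

```rust
// Hypothetical sketch of the downscale condition described above:
// decreasing the scale is infallible when, after losing
// (input_scale - output_scale) digits, there is still one digit of headroom
// for rounding, i.e. input_prec - (input_scale - output_scale) < output_prec.
fn is_infallible_downscale(input_prec: u8, input_scale: i8, output_prec: u8, output_scale: i8) -> bool {
    let delta_scale = input_scale - output_scale;
    (input_prec as i8) - delta_scale < (output_prec as i8)
}

fn main() {
    // Decimal(5, 0) -> Decimal(3, -3): 99900 rounds to 100 * 10^3, 3 digits suffice.
    assert!(is_infallible_downscale(5, 0, 3, -3));
    // Decimal(5, 0) -> Decimal(2, -3): rounding can require a third digit.
    assert!(!is_infallible_downscale(5, 0, 2, -3));
}
```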

Performance impact

The only affected cases are casts between decimal128 types. For increasing scale I measured a considerable improvement of around 80%; for decreasing scale, an improvement of around 25%.

cast decimal128 to decimal128 512                                  6.98      2.8±0.00µs        ? ?/sec    1.00    402.4±0.35ns        ? ?/sec
cast decimal128 to decimal128 512 lower precision                  1.01      3.0±0.00µs        ? ?/sec    1.00      2.9±0.01µs        ? ?/sec
cast decimal128 to decimal128 512 with lower scale (infallible)    1.31      3.3±0.00µs        ? ?/sec    1.00      2.5±0.00µs        ? ?/sec
cast decimal128 to decimal128 512 with same scale                  1.07     54.2±1.47ns        ? ?/sec    1.00     50.5±1.41ns        ? ?/sec
cast decimal128 to decimal256 512                                  1.00      6.1±0.01µs        ? ?/sec    1.00      6.1±0.01µs        ? ?/sec
cast decimal256 to decimal128 512                                  1.02     13.2±0.67µs        ? ?/sec    1.00     12.9±0.12µs        ? ?/sec
cast decimal256 to decimal256 512                                  1.01      6.1±0.02µs        ? ?/sec    1.00      6.1±0.03µs        ? ?/sec
cast decimal256 to decimal256 512 with same scale                  1.05     54.1±1.98ns        ? ?/sec    1.00     51.6±1.15ns        ? ?/sec

What changes are included in this PR?

I've added a new specialization for dealing with "safe" casts between the same decimal type, both when reducing and increasing scale, as well as a new benchmark for reducing scale.

Are there any user-facing changes?

No, the behavior of the casts should stay the same.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 25, 2025

alamb commented Jan 25, 2025

Thanks @aweltsch -- I started the CI on this PR

@aweltsch force-pushed the optimise-decimal-casting branch from 49c6ad1 to 50903f2 on February 5, 2025

alamb commented Feb 7, 2025

FYI @andygrove @parthchandra and @himadripal who I think currently care quite a bit about Decimals

// than a possible precision loss (plus potential increase via rounding) every input number will fit into the output type
let is_infallible_cast = (input_precision as i8) - delta_scale < (output_precision as i8);

if is_infallible_cast {

@himadripal himadripal Feb 7, 2025


Could we do this check inside convert_to_smaller_scale_decimal and only bypass the validation part?
That way the conversion logic can stay in one place.

@aweltsch (Author)

Thanks for the valuable feedback! Originally I thought we could only apply this to same-type casts, but I think this can also be applied between different decimal types, since we're checking the "validity" condition based on the output precision. So from what I can tell, we should be able to merge the functions. 👍

let f = |x: T::Native| x.mul_wrapping(mul);
Ok(array.unary(f))
} else {
convert_to_bigger_or_equal_scale_decimal::<T, T>(
Contributor

Same here: could we only bypass the validation, based on this check, inside convert_to_bigger_or_equal_scale_decimal?

fn test_decimal128_to_decimal128_coverage() {
let test_cases = [
// increase precision, increase scale, infallible
DecimalCastTestConfig {
Contributor

nit: input_prec can be first param and then input_scale, for readability. Decimal(precision, scale)


himadripal commented Feb 7, 2025

Thanks @aweltsch for the contribution. There are many existing tests for decimal conversion; should those be ported over to this format (i.e. using DecimalCastTestConfig)?

@parthchandra parthchandra left a comment


Minor comments. This is a nice improvement.

T::Native: DecimalCast + ArrowNativeTypeOp,
{
let delta_scale = input_scale - output_scale;
// if the reudction of the input number through scaling (dividing) is greater
Contributor

spelling: s/reudction/reduction
Also, can you add the example from the PR into this comment here. It makes things absolutely clear.

let delta_scale = output_scale - input_scale;
// if the gain in precision (digits) is greater than the multiplication due to scaling
// every number will fit into the output type
let is_infallible_cast = (input_precision as i8) + delta_scale <= (output_precision as i8);
Contributor

Have we already checked at this stage that the output precision is less than or equal to the max allowed for decimal 128?

@aweltsch (Author)

Good catch, I didn't find any place where this is validated before. We can easily add a check at the top of the function to handle this case 👍

@aweltsch (Author)

I've looked into this a little bit more, and it seems that the precision and scale are checked when building an array, via the validate_decimal_precision_and_scale function. My feeling is that the arrays should always be constructed correctly, but I added the validation at the top of the cast functions anyway.


aweltsch commented Feb 8, 2025

thanks @aweltsch for the contribution, there are many existing tests for decimal conversion, should those be ported over to using this format. (i.e using DecimalCastTestConfig)?

When introducing the struct I didn't see the necessity for the other test cases, because they only deal with one combination for input/output precision & scale. Since I've introduced it, I'm happy to convert the other cases to have a more consistent code base. Though I would prefer to do it in a different PR TBH.


aweltsch commented Feb 9, 2025


After having put the conversion logic in a central place it now applies also to conversions for Decimal256 -> Decimal128 (or the other way around). Hence I want to increase the test coverage a little bit more, but I'm still trying to figure out what a simple yet flexible solution would look like. Not sure if the result will be applicable to the rest of the test cases.

@@ -99,10 +100,24 @@ where
I::Native: DecimalCast + ArrowNativeTypeOp,
O::Native: DecimalCast + ArrowNativeTypeOp,
{
validate_decimal_precision_and_scale::<O>(output_precision, output_scale)?;

@himadripal himadripal Feb 10, 2025


Is this validate_decimal_precision_and_scale call required? Previously it wasn't there, hence confirming.

@aweltsch (Author)

Well, TBH this is mostly a safeguard for what I mentioned here: #7021 (comment)
In the previous version of the code any problem with scale or precision was caught in the kernel via validate_decimal_precision / is_valid_decimal_precision.
For the newly added code this is not the case anymore, but there could be a problem if the output_precision exceeds the maximum allowed precision of the type.
To not change the existing behavior too much, I'll probably only check this inside the is_infallible_cast branch. I guess I will need to add some handling for the is_safe parameter there too.
This would not be needed if we can make sure that at this point the output_precision is always valid for the type. I investigated that, but I'm not 100% convinced of that. Maybe you have some opinions on that?


@himadripal himadripal Feb 11, 2025


Having the validation inside the is_infallible_cast branch sounds like a good idea, but that would mean all 3 branches will have validation now. Is the performance benchmark still better than the previous code?

@aweltsch (Author)

The validation inside the is_infallible_cast branch is conceptually a little different to what is going on in the other branches. We can check the validity before running the kernel. It would look something like this

if is_infallible_cast {
    let precision_check =
        validate_decimal_precision_and_scale::<O>(output_precision, output_scale);
    if precision_check.is_err() {
        if cast_options.safe {
            return Ok(array.unary_opt(|_| None));
        }
        precision_check?
    }
    let g = |x: I::Native| f(x).unwrap(); // unwrapping is safe since the result is
                                          // guaranteed to fit into the target type
    array.unary(g)

so the performance on the benchmarks is not affected. It might be affected if the output_precision / output_scale are invalid, but even in that case I think it will be faster (though I didn't run a benchmark yet).

The main problem is really that it is potentially possible to call the functions convert_to_smaller_scale_decimal or convert_to_bigger_or_equal_scale_decimal with an invalid output_precision (e.g. precision 39 for Decimal128Type).
This would be an example:

 convert_to_smaller_scale_decimal::<Decimal256Type, Decimal128Type>(
                &array,
                50, // input scale: valid
                38, // input_precision: valid
                39, // output_precision: invalid, DECIMAL128_MAX_PRECISION = 38
                37, // output_scale: valid
                &CastOptions {
                    safe: true,
                    ..Default::default()
                },
            );

In the current version on the main branch this would return an array of all nulls, since the output_precision is invalid for every entry in the array.
From my POV there's already a precondition that has failed, when you try to convert to a decimal type with a precision outside of the allowed range, but to stay consistent with the current behaviour on the main branch, I think we need some special handling in the is_infallible_cast branch.
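The control flow being discussed — error out on an invalid target precision, or produce all nulls when the cast is "safe" — can be sketched independently of the arrow crate (a simplified sketch with hypothetical names; `cast_with_check` is not the PR's actual code):

```rust
// Hypothetical sketch of the safe-cast semantics discussed above: when a
// precondition check fails, a CastOptions-style `safe: true` yields an
// all-null result, while `safe: false` propagates the error.
fn cast_with_check(
    values: &[i128],
    check: Result<(), String>, // e.g. the output precision/scale validation
    safe: bool,
) -> Result<Vec<Option<i128>>, String> {
    if let Err(e) = check {
        if safe {
            // mirrors the `array.unary_opt(|_| None)` fallback: all nulls
            return Ok(values.iter().map(|_| None).collect());
        }
        return Err(e);
    }
    // validation passed: the conversion itself is infallible here
    Ok(values.iter().map(|&v| Some(v)).collect())
}

fn main() {
    let vals = [1i128, 2, 3];
    let bad = Err("precision 39 exceeds DECIMAL128_MAX_PRECISION".to_string());
    assert_eq!(cast_with_check(&vals, bad.clone(), true).unwrap(), vec![None, None, None]);
    assert!(cast_with_check(&vals, bad, false).is_err());
    assert_eq!(cast_with_check(&vals, Ok(()), false).unwrap(), vec![Some(1), Some(2), Some(3)]);
}
```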


@aweltsch aweltsch Feb 12, 2025


I have just realized that there is a precision check at the bottom of the decimal casting functions (here and here, via this). My suggestion would be to remove pub(crate) from convert_to_smaller_scale_decimal or convert_to_bigger_or_equal_scale_decimal, add another test case for invalid output_precision for the conversion functions and not add an additional layer of validations on top.
@parthchandra since you originally raised the concern about the validation, do you have an opinion on that?

Contributor

I don't have a strong opinion on this. I agree with your previous observation that the array should have been already validated at the time of building so maybe it is unnecessary to add the additional validation.

@aweltsch (Author)

After all the discussion I've decided to keep the validation and to put it into the is_infallible_cast branch. To me it doesn't make sense to run the kernel if the result cannot succeed. Given that the same would happen if called through the cast_decimal_to_decimal function, I think we don't need any special handling for CastOptions.safe = true.

let f = |x| O::Native::from_decimal(x).and_then(|x| x.mul_checked(mul).ok());

Ok(if cast_options.safe {
Ok(if is_infallible_cast {
Contributor

Maybe add a comment like the one above: unwrapping is safe as the result will fit.

@aweltsch aweltsch marked this pull request as ready for review February 13, 2025 05:15
let test_cases = [
// increase precision, increase scale, infallible
DecimalCastTestConfig {
input_prec: 5,
Contributor

Can we have one test for input_scale = 0 and output_scale = 0 with any precision?

Copy link
Contributor

@himadripal himadripal left a comment


LGTM

@aweltsch force-pushed the optimise-decimal-casting branch from a2f2b98 to 8f3e642 on February 13, 2025
Labels
arrow Changes to the arrow crate

Successfully merging this pull request may close these issues: Optimise Decimal Casting
4 participants