
Improve decimal parsing performance #3854

Merged (2 commits, Mar 21, 2023)

Conversation

@spebern (Contributor) commented Mar 13, 2023

Which issue does this PR close?

Rationale for this change

Improve decimal parsing performance (~50% faster)

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the arrow (Changes to the arrow crate) label on Mar 13, 2023
@spebern force-pushed the faster-decimal-parsing branch from f094bb5 to e377d8b on March 13, 2023 20:14
@tustvold (Contributor):

Thank you for this, will review fully tomorrow. Just some ideas that might make this even faster:

  • Handle the sign, if any, outside the loop
  • Split the string on a decimal point, if any, so the two parts can be processed using the optimised lexical_core integer parsing routines
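
A minimal sketch of those two ideas combined (hypothetical code, not this PR's implementation; the function name and its validation assumptions are mine):

```rust
// Hypothetical sketch, not this PR's code. Assumes `s` was already
// validated: optional leading '-', ASCII digits, at most one '.'.
fn parse_decimal_i128(s: &str, scale: u32) -> Option<i128> {
    // 1. Handle the sign outside the loop.
    let (negative, rest) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };

    // 2. Split on the decimal point so each half can be parsed with
    //    lexical_core's optimised integer routines.
    let (int_part, frac_part) = rest.split_once('.').unwrap_or((rest, ""));

    let mut value: i128 = if int_part.is_empty() {
        0
    } else {
        lexical_core::parse(int_part.as_bytes()).ok()?
    };
    value = value.checked_mul(10_i128.checked_pow(scale)?)?;

    // Fold in the fractional digits, truncated to the target scale.
    let frac = &frac_part[..frac_part.len().min(scale as usize)];
    if !frac.is_empty() {
        let f: i128 = lexical_core::parse(frac.as_bytes()).ok()?;
        value = value.checked_add(f.checked_mul(10_i128.pow(scale - frac.len() as u32))?)?;
    }

    Some(if negative { -value } else { value })
}
```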

@spebern (Contributor, Author) commented Mar 14, 2023

Thanks for the suggestions. I will have a look in the next few days!

@alamb requested a review from liukun4515 on March 17, 2023 14:15
@alamb (Contributor) commented Mar 17, 2023

Thank you @spebern -- this looks quite exciting. Do you have any benchmarks you can share for this branch?

@spebern (Contributor, Author) commented Mar 17, 2023

   Compiling arrow-cast v35.0.0 (/home/ben/workspace/arrow-rs/arrow-cast)
    Finished bench [optimized] target(s) in 12.64s
     Running benches/parse_decimal.rs (/home/ben/workspace/arrow-rs/target/release/deps/parse_decimal-ed9b69f81fe2bf18)
123.123                 time:   [104.52 ns 104.72 ns 104.95 ns]
                        change: [-52.561% -52.471% -52.383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

123.1234                time:   [104.07 ns 104.20 ns 104.36 ns]
                        change: [-53.401% -53.291% -53.181%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

123.1                   time:   [98.928 ns 99.199 ns 99.470 ns]
                        change: [-45.234% -45.059% -44.861%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

123                     time:   [91.060 ns 91.155 ns 91.230 ns]
                        change: [-39.558% -39.512% -39.470%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild

-123.123                time:   [105.71 ns 105.81 ns 105.91 ns]
                        change: [-52.052% -51.944% -51.796%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

-123.1234               time:   [106.38 ns 106.46 ns 106.53 ns]
                        change: [-51.522% -51.477% -51.428%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

-123.1                  time:   [101.36 ns 101.52 ns 101.70 ns]
                        change: [-44.430% -44.366% -44.301%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low severe
  14 (14.00%) low mild
  1 (1.00%) high mild

-123                    time:   [93.470 ns 93.481 ns 93.493 ns]
                        change: [-38.446% -38.425% -38.404%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

0.0000123               time:   [76.150 ns 76.335 ns 76.565 ns]
                        change: [-50.658% -50.451% -50.165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

12.                     time:   [75.427 ns 75.511 ns 75.606 ns]
                        change: [-38.211% -38.128% -38.050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

-12.                    time:   [77.339 ns 77.507 ns 77.684 ns]
                        change: [-36.242% -36.182% -36.113%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

00.1                    time:   [84.054 ns 84.180 ns 84.291 ns]
                        change: [-42.095% -42.014% -41.947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  12 (12.00%) low severe
  4 (4.00%) low mild
  7 (7.00%) high mild

-00.1                   time:   [85.438 ns 85.652 ns 85.869 ns]
                        change: [-43.774% -43.677% -43.580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 32 outliers among 100 measurements (32.00%)
  8 (8.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  14 (14.00%) high severe

12345678912345678.1234  time:   [320.02 ns 320.41 ns 320.75 ns]
                        change: [-52.900% -52.848% -52.797%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild

-12345678912345678.1234 time:   [321.51 ns 321.92 ns 322.31 ns]
                        change: [-52.910% -52.852% -52.796%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

99999999999999999.999   time:   [321.10 ns 321.52 ns 321.86 ns]
                        change: [-53.848% -53.781% -53.718%] (p = 0.00 < 0.05)
                        Performance has improved.

-99999999999999999.999  time:   [321.45 ns 321.78 ns 322.10 ns]
                        change: [-54.293% -54.246% -54.201%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) low mild
  1 (1.00%) high mild

.123                    time:   [59.960 ns 60.020 ns 60.086 ns]
                        change: [-50.726% -50.672% -50.624%] (p = 0.00 < 0.05)
                        Performance has improved.

-.123                   time:   [60.894 ns 60.979 ns 61.080 ns]
                        change: [-50.659% -50.555% -50.477%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

123.                    time:   [90.291 ns 90.504 ns 90.719 ns]
                        change: [-41.874% -41.739% -41.605%] (p = 0.00 < 0.05)
                        Performance has improved.

-123.                   time:   [92.033 ns 92.224 ns 92.414 ns]
                        change: [-40.311% -40.201% -40.085%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/parse_timestamp.rs (/home/ben/workspace/arrow-rs/target/release/deps/parse_timestamp-4e4733f613306eeb)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
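
(Note on reading these numbers: they are Criterion output, and a negative "change" percentage means this branch is faster than the baseline it was compared against. The exact invocation is not shown above, but something like `cargo bench --bench parse_decimal` in arrow-cast would be the usual way to reproduce them.)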

@spebern (Contributor, Author) commented Mar 17, 2023

I had a look at what @tustvold suggested, and in some cases we'd get another ~40% performance improvement. However, I only tested this for the 128-bit types, because lexical doesn't support 256 bit. I haven't fully grasped how to implement lexical support for the 256-bit types here yet, but basic things such as shift operations are missing, so it will be a little more work.

What do you think of replacing the usage of BigInt with https://docs.rs/ethnum/latest/ethnum/? ethnum is also used by arrow2 and has a more complete implementation of basic integer operations such as bit shifting.

@tustvold (Contributor) commented Mar 17, 2023

> basic integer operations such as bit shifting

These should be relatively straightforward to implement, I will see what I can do.

> we'd get another ~40% of performance

❤️
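
One of the missing pieces is plain bit shifting on a 256-bit type. For illustration, a hedged sketch of what a shift-left might look like for a 256-bit integer stored as two 128-bit limbs; `I256` here is an illustrative stand-in, not arrow's actual definition:

```rust
/// Illustrative two-limb 256-bit integer (unsigned low limb, signed high limb).
#[derive(Clone, Copy, Debug)]
struct I256 {
    low: u128,
    high: i128,
}

impl I256 {
    /// Shift left by `rhs` bits (taken modulo 256, like primitive shifts).
    fn shl(self, rhs: u32) -> Self {
        let rhs = rhs % 256;
        if rhs == 0 {
            self
        } else if rhs < 128 {
            Self {
                low: self.low << rhs,
                // bits shifted out of the low limb carry into the high limb
                high: (self.high << rhs) | ((self.low >> (128 - rhs)) as i128),
            }
        } else {
            // the whole low limb moves into the high limb
            Self {
                low: 0,
                high: (self.low << (rhs - 128)) as i128,
            }
        }
    }
}
```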

@spebern (Contributor, Author) commented Mar 17, 2023

Does it make sense to have a look at ethnum as a replacement? arrow2's i256 is a simple wrapper around it: https://github.com/jorgecarleitao/arrow2/blob/db87f71f3f673b240de80d5a7592d45c0beae805/src/types/native.rs#LL519-L519C35

It would probably save some work, since all the ops from the Rust std lib seem to be implemented there.

I will have a look at lexical support this weekend; not quite sure for now how much is missing.

@tustvold (Contributor):

> Does it make sense to have a look at ethnum as a replacement?

I think if we can avoid a dependency, especially one that doesn't have a clear maintenance story (it is a personal crate), that is better. Let me see what it looks like; I think it should be straightforward.

@alamb (Contributor) commented Mar 17, 2023

> I think if we can avoid a dependency, especially one that doesn't have a clear maintenance story (it is a personal crate), that is better. Let me see what it looks like; I think it should be straightforward.

Related #3884

@alamb (Contributor) commented Mar 20, 2023

@spebern would it be possible to update this PR to use the operations added in #3884 rather than a new crate?

@spebern (Contributor, Author) commented Mar 20, 2023

I had a closer look at lexical, and it seems that not enough of its functions are public to use the optimized methods. Sorry, I was a little too optimistic that implementing a couple of trait methods would make it possible to use lexical's underlying machinery. The macros that derive the impls are almost all private.

I opened Alexhuszagh/rust-lexical#93, because as it stands it doesn't seem feasible for me to implement it.

For now, I see three options for continuing here:

  1. Merge as is

  2. Implement FromLexical for i256 without the actual optimizations from lexical

     Here are some benchmark results for Decimal128 when using lexical. Note that
     not everything gets faster, but inputs with more digits clearly benefit:

123.123                 time:   [17.647 ns 17.713 ns 17.814 ns]
                        change: [+11.220% +11.737% +12.425%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

123.1234                time:   [17.927 ns 17.977 ns 18.048 ns]
                        change: [+8.2108% +8.5561% +8.8512%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

123.1                   time:   [23.494 ns 23.526 ns 23.560 ns]
                        change: [+26.116% +26.332% +26.549%] (p = 0.00 < 0.05)
                        Performance has regressed.

123                     time:   [12.598 ns 12.611 ns 12.624 ns]
                        change: [-19.372% -19.212% -19.064%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

-123.123                time:   [17.898 ns 17.917 ns 17.936 ns]
                        change: [+2.3288% +2.4573% +2.6023%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-123.1234               time:   [18.256 ns 18.277 ns 18.299 ns]
                        change: [+0.2064% +0.5108% +0.7597%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

-123.1                  time:   [23.471 ns 23.511 ns 23.552 ns]
                        change: [+16.213% +16.474% +16.731%] (p = 0.00 < 0.05)
                        Performance has regressed.

-123                    time:   [13.065 ns 13.080 ns 13.095 ns]
                        change: [-26.644% -26.545% -26.438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

0.0000123               time:   [14.956 ns 14.975 ns 14.994 ns]
                        change: [+4.5860% +4.7599% +4.9359%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

12.                     time:   [13.072 ns 13.090 ns 13.109 ns]
                        change: [-13.966% -13.509% -13.053%] (p = 0.00 < 0.05)
                        Performance has improved.

-12.                    time:   [13.506 ns 13.525 ns 13.545 ns]
                        change: [-14.013% -13.878% -13.746%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

00.1                    time:   [20.984 ns 21.023 ns 21.062 ns]
                        change: [+26.903% +27.158% +27.407%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-00.1                   time:   [21.674 ns 21.692 ns 21.712 ns]
                        change: [+21.888% +22.153% +22.414%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild

12345678912345678.1234  time:   [25.908 ns 25.932 ns 25.957 ns]
                        change: [-42.285% -42.218% -42.149%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

-12345678912345678.1234 time:   [25.995 ns 26.030 ns 26.064 ns]
                        change: [-44.608% -44.413% -44.258%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

99999999999999999.999   time:   [24.779 ns 24.802 ns 24.828 ns]
                        change: [-44.160% -44.036% -43.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe

-99999999999999999.999  time:   [21.675 ns 21.707 ns 21.741 ns]
                        change: [-52.810% -52.745% -52.682%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

.123                    time:   [13.940 ns 13.956 ns 13.973 ns]
                        change: [+19.897% +20.075% +20.245%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

-.123                   time:   [14.584 ns 14.598 ns 14.613 ns]
                        change: [+16.487% +16.619% +16.761%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

123.                    time:   [13.525 ns 13.546 ns 13.569 ns]
                        change: [-17.893% -17.772% -17.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-123.                   time:   [14.290 ns 14.319 ns 14.346 ns]
                        change: [-17.726% -17.552% -17.384%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild

@tustvold (Contributor) left a comment

I think, as is, this PR represents a significant performance improvement; we can always improve performance further in subsequent PRs. Thank you for this!

"parse decimal overflow".to_string(),
));
}
result = result.mul_checked(base)?;
Copy link
Contributor

@tustvold tustvold Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that the precision check above should mean neither of these can overflow
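
A hedged sketch of that observation (hypothetical code, not the PR's; `accumulate_digits` and its signature are mine): if the digit count is validated against the target precision up front, the i128 accumulator can never overflow, so the checked multiply/add could become plain arithmetic.

```rust
// Hypothetical sketch, not the PR's code. Assumes precision <= 38, the
// maximum digit count for Decimal128, so the result stays below i128::MAX.
fn accumulate_digits(digits: &[u8], precision: usize) -> Option<i128> {
    if digits.len() > precision {
        return None; // the precision check rejects overlong inputs first
    }
    let mut result: i128 = 0;
    for &d in digits {
        debug_assert!(d.is_ascii_digit());
        // plain arithmetic is safe: the length bound above caps the result
        result = result * 10 + (d - b'0') as i128;
    }
    Some(result)
}
```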
