
Improve decimal parsing performance #3854

Merged (2 commits, Mar 21, 2023)

Conversation

@spebern (Contributor) commented Mar 13, 2023

Which issue does this PR close?

Rationale for this change

Improve decimal parsing performance (~50% faster)

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the arrow (Changes to the arrow crate) label on Mar 13, 2023
@spebern force-pushed the faster-decimal-parsing branch from f094bb5 to e377d8b on March 13, 2023 20:14
@tustvold (Contributor):

Thank you for this, will review fully tomorrow. Just some ideas that might make this even faster:

  • Handle the sign, if any, outside the loop
  • Split the string on a decimal point, if any, so the two parts can be processed using the optimised lexical_core integer parsing routines
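
A minimal sketch of those two ideas combined (hypothetical code, not this PR's implementation; the function name and its validation assumptions are mine):

```rust
// Hypothetical sketch, not this PR's code. Assumes `s` was already
// validated: optional leading '-', ASCII digits, at most one '.'.
fn parse_decimal_i128(s: &str, scale: u32) -> Option<i128> {
    // 1. Handle the sign outside the loop.
    let (negative, rest) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };

    // 2. Split on the decimal point so each half can be parsed with
    //    lexical_core's optimised integer routines.
    let (int_part, frac_part) = rest.split_once('.').unwrap_or((rest, ""));

    let mut value: i128 = if int_part.is_empty() {
        0
    } else {
        lexical_core::parse(int_part.as_bytes()).ok()?
    };
    value = value.checked_mul(10_i128.checked_pow(scale)?)?;

    // Fold in the fractional digits, truncated to the target scale.
    let frac = &frac_part[..frac_part.len().min(scale as usize)];
    if !frac.is_empty() {
        let f: i128 = lexical_core::parse(frac.as_bytes()).ok()?;
        value = value.checked_add(f.checked_mul(10_i128.pow(scale - frac.len() as u32))?)?;
    }

    Some(if negative { -value } else { value })
}
```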

@spebern (Contributor, Author) commented Mar 14, 2023

Thanks for the suggestions. I will have a look in the next few days!

@alamb requested a review from liukun4515 on March 17, 2023 14:15
@alamb (Contributor) commented Mar 17, 2023

Thank you @spebern -- this looks quite exciting. Do you have any benchmarks you can share for this branch?

@spebern (Contributor, Author) commented Mar 17, 2023

   Compiling arrow-cast v35.0.0 (/home/ben/workspace/arrow-rs/arrow-cast)
    Finished bench [optimized] target(s) in 12.64s
     Running benches/parse_decimal.rs (/home/ben/workspace/arrow-rs/target/release/deps/parse_decimal-ed9b69f81fe2bf18)
123.123                 time:   [104.52 ns 104.72 ns 104.95 ns]
                        change: [-52.561% -52.471% -52.383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

123.1234                time:   [104.07 ns 104.20 ns 104.36 ns]
                        change: [-53.401% -53.291% -53.181%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

123.1                   time:   [98.928 ns 99.199 ns 99.470 ns]
                        change: [-45.234% -45.059% -44.861%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

123                     time:   [91.060 ns 91.155 ns 91.230 ns]
                        change: [-39.558% -39.512% -39.470%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild

-123.123                time:   [105.71 ns 105.81 ns 105.91 ns]
                        change: [-52.052% -51.944% -51.796%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

-123.1234               time:   [106.38 ns 106.46 ns 106.53 ns]
                        change: [-51.522% -51.477% -51.428%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

-123.1                  time:   [101.36 ns 101.52 ns 101.70 ns]
                        change: [-44.430% -44.366% -44.301%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low severe
  14 (14.00%) low mild
  1 (1.00%) high mild

-123                    time:   [93.470 ns 93.481 ns 93.493 ns]
                        change: [-38.446% -38.425% -38.404%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

0.0000123               time:   [76.150 ns 76.335 ns 76.565 ns]
                        change: [-50.658% -50.451% -50.165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

12.                     time:   [75.427 ns 75.511 ns 75.606 ns]
                        change: [-38.211% -38.128% -38.050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

-12.                    time:   [77.339 ns 77.507 ns 77.684 ns]
                        change: [-36.242% -36.182% -36.113%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

00.1                    time:   [84.054 ns 84.180 ns 84.291 ns]
                        change: [-42.095% -42.014% -41.947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  12 (12.00%) low severe
  4 (4.00%) low mild
  7 (7.00%) high mild

-00.1                   time:   [85.438 ns 85.652 ns 85.869 ns]
                        change: [-43.774% -43.677% -43.580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 32 outliers among 100 measurements (32.00%)
  8 (8.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  14 (14.00%) high severe

12345678912345678.1234  time:   [320.02 ns 320.41 ns 320.75 ns]
                        change: [-52.900% -52.848% -52.797%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild

-12345678912345678.1234 time:   [321.51 ns 321.92 ns 322.31 ns]
                        change: [-52.910% -52.852% -52.796%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

99999999999999999.999   time:   [321.10 ns 321.52 ns 321.86 ns]
                        change: [-53.848% -53.781% -53.718%] (p = 0.00 < 0.05)
                        Performance has improved.

-99999999999999999.999  time:   [321.45 ns 321.78 ns 322.10 ns]
                        change: [-54.293% -54.246% -54.201%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) low mild
  1 (1.00%) high mild

.123                    time:   [59.960 ns 60.020 ns 60.086 ns]
                        change: [-50.726% -50.672% -50.624%] (p = 0.00 < 0.05)
                        Performance has improved.

-.123                   time:   [60.894 ns 60.979 ns 61.080 ns]
                        change: [-50.659% -50.555% -50.477%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

123.                    time:   [90.291 ns 90.504 ns 90.719 ns]
                        change: [-41.874% -41.739% -41.605%] (p = 0.00 < 0.05)
                        Performance has improved.

-123.                   time:   [92.033 ns 92.224 ns 92.414 ns]
                        change: [-40.311% -40.201% -40.085%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/parse_timestamp.rs (/home/ben/workspace/arrow-rs/target/release/deps/parse_timestamp-4e4733f613306eeb)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
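
(Note on reading these numbers: they are Criterion output, and a negative "change" percentage means this branch is faster than the baseline it was compared against. The exact invocation is not shown above, but something like `cargo bench --bench parse_decimal` in arrow-cast would be the usual way to reproduce them.)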

@spebern (Contributor, Author) commented Mar 17, 2023

I had a look at what @tustvold suggested, and in some cases we'd get another ~40% performance improvement. However, I only tested this for the 128-bit types, because lexical doesn't support 256 bit. I haven't fully grasped how to implement lexical support for the 256-bit types here yet, but basic things such as shift operations are missing, so it will be a little more work.

What do you think of replacing the usage of BigInt with https://docs.rs/ethnum/latest/ethnum/? ethnum is also used by arrow2 and has a more complete implementation of basic integer operations such as bit shifting.

@tustvold (Contributor) commented Mar 17, 2023

> basic integer operations such as bit shifting

These should be relatively straightforward to implement, I will see what I can do.

> we'd get another ~40% of performance

❤️
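
One of the missing pieces is plain bit shifting on a 256-bit type. For illustration, a hedged sketch of what a shift-left might look like for a 256-bit integer stored as two 128-bit limbs; `I256` here is an illustrative stand-in, not arrow's actual definition:

```rust
/// Illustrative two-limb 256-bit integer (unsigned low limb, signed high limb).
#[derive(Clone, Copy, Debug)]
struct I256 {
    low: u128,
    high: i128,
}

impl I256 {
    /// Shift left by `rhs` bits (taken modulo 256, like primitive shifts).
    fn shl(self, rhs: u32) -> Self {
        let rhs = rhs % 256;
        if rhs == 0 {
            self
        } else if rhs < 128 {
            Self {
                low: self.low << rhs,
                // bits shifted out of the low limb carry into the high limb
                high: (self.high << rhs) | ((self.low >> (128 - rhs)) as i128),
            }
        } else {
            // the whole low limb moves into the high limb
            Self {
                low: 0,
                high: (self.low << (rhs - 128)) as i128,
            }
        }
    }
}
```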

@spebern (Contributor, Author) commented Mar 17, 2023

Does it make sense to have a look at ethnum as a replacement? arrow2's i256 is a simple wrapper around it: https://github.com/jorgecarleitao/arrow2/blob/db87f71f3f673b240de80d5a7592d45c0beae805/src/types/native.rs#LL519-L519C35

It would probably save some work, since all the ops from the Rust std lib seem to be implemented there.

I will have a look at lexical support this weekend; not quite sure for now how much is missing.

@tustvold (Contributor):

> Does it make sense to have a look at ethnum as a replacement?

I think if we can avoid a dependency, especially one that doesn't have a clear maintenance story (it is a personal crate), that is better. Let me see what it looks like; I think it should be straightforward.

@alamb (Contributor) commented Mar 17, 2023

> I think if we can avoid a dependency, especially one that doesn't have a clear maintenance story (it is a personal crate), that is better. Let me see what it looks like; I think it should be straightforward.

Related #3884

@alamb (Contributor) commented Mar 20, 2023

@spebern would it be possible to update this PR to use the operations added in #3884 rather than a new crate?

@spebern (Contributor, Author) commented Mar 20, 2023

I had a closer look at lexical, and it seems that not enough of its functions are public to use the optimized methods. Sorry, I was a little too optimistic that implementing a couple of trait methods would make it possible to use lexical's underlying machinery. The macros that derive the impls are almost all private.

I opened Alexhuszagh/rust-lexical#93, because as it stands it doesn't seem feasible for me to implement it.

For now, I see three options for continuing here:

  1. Merge as is

  2. Implement FromLexical for i256 without the actual optimizations from lexical

     Here are some benchmark results for Decimal128 when using lexical. Note that
     not everything gets faster, but inputs with more digits clearly benefit:

123.123                 time:   [17.647 ns 17.713 ns 17.814 ns]
                        change: [+11.220% +11.737% +12.425%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

123.1234                time:   [17.927 ns 17.977 ns 18.048 ns]
                        change: [+8.2108% +8.5561% +8.8512%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

123.1                   time:   [23.494 ns 23.526 ns 23.560 ns]
                        change: [+26.116% +26.332% +26.549%] (p = 0.00 < 0.05)
                        Performance has regressed.

123                     time:   [12.598 ns 12.611 ns 12.624 ns]
                        change: [-19.372% -19.212% -19.064%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

-123.123                time:   [17.898 ns 17.917 ns 17.936 ns]
                        change: [+2.3288% +2.4573% +2.6023%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-123.1234               time:   [18.256 ns 18.277 ns 18.299 ns]
                        change: [+0.2064% +0.5108% +0.7597%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

-123.1                  time:   [23.471 ns 23.511 ns 23.552 ns]
                        change: [+16.213% +16.474% +16.731%] (p = 0.00 < 0.05)
                        Performance has regressed.

-123                    time:   [13.065 ns 13.080 ns 13.095 ns]
                        change: [-26.644% -26.545% -26.438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

0.0000123               time:   [14.956 ns 14.975 ns 14.994 ns]
                        change: [+4.5860% +4.7599% +4.9359%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

12.                     time:   [13.072 ns 13.090 ns 13.109 ns]
                        change: [-13.966% -13.509% -13.053%] (p = 0.00 < 0.05)
                        Performance has improved.

-12.                    time:   [13.506 ns 13.525 ns 13.545 ns]
                        change: [-14.013% -13.878% -13.746%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

00.1                    time:   [20.984 ns 21.023 ns 21.062 ns]
                        change: [+26.903% +27.158% +27.407%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-00.1                   time:   [21.674 ns 21.692 ns 21.712 ns]
                        change: [+21.888% +22.153% +22.414%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild

12345678912345678.1234  time:   [25.908 ns 25.932 ns 25.957 ns]
                        change: [-42.285% -42.218% -42.149%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

-12345678912345678.1234 time:   [25.995 ns 26.030 ns 26.064 ns]
                        change: [-44.608% -44.413% -44.258%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

99999999999999999.999   time:   [24.779 ns 24.802 ns 24.828 ns]
                        change: [-44.160% -44.036% -43.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe

-99999999999999999.999  time:   [21.675 ns 21.707 ns 21.741 ns]
                        change: [-52.810% -52.745% -52.682%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

.123                    time:   [13.940 ns 13.956 ns 13.973 ns]
                        change: [+19.897% +20.075% +20.245%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

-.123                   time:   [14.584 ns 14.598 ns 14.613 ns]
                        change: [+16.487% +16.619% +16.761%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

123.                    time:   [13.525 ns 13.546 ns 13.569 ns]
                        change: [-17.893% -17.772% -17.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

-123.                   time:   [14.290 ns 14.319 ns 14.346 ns]
                        change: [-17.726% -17.552% -17.384%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild

@tustvold (Contributor) left a comment

I think, as is, this PR represents a significant performance improvement; we can always improve performance further in subsequent PRs. Thank you for this!

"parse decimal overflow".to_string(),
));
}
result = result.mul_checked(base)?;
Copy link
Contributor

@tustvold tustvold Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that the precision check above should mean neither of these can overflow
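
A hedged sketch of that observation (hypothetical code, not the PR's; `accumulate_digits` and its signature are mine): if the digit count is validated against the target precision up front, the i128 accumulator can never overflow, so the checked multiply/add could become plain arithmetic.

```rust
// Hypothetical sketch, not the PR's code. Assumes precision <= 38, the
// maximum digit count for Decimal128, so the result stays below i128::MAX.
fn accumulate_digits(digits: &[u8], precision: usize) -> Option<i128> {
    if digits.len() > precision {
        return None; // the precision check rejects overlong inputs first
    }
    let mut result: i128 = 0;
    for &d in digits {
        debug_assert!(d.is_ascii_digit());
        // plain arithmetic is safe: the length bound above caps the result
        result = result * 10 + (d - b'0') as i128;
    }
    Some(result)
}
```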
