Skip to content

Commit

Permalink
Fix integer parsing correctness issues.
Browse files Browse the repository at this point in the history
This avoids using an intermediary unsigned type and then post-overflow validation, which avoids issues with overflowing checks wrapping to above our min value. This does not meaningfully affect performance, despite simplifying the type checks dramatically. The float and integer parsing benchmarks are comparable to before, with the integer parsing being slower than core for simple integers but much faster for larger/random integers due to multi-digit parsing.

- Closes #91
  • Loading branch information
Alexhuszagh committed Sep 9, 2024
1 parent 05cc142 commit fb3e059
Show file tree
Hide file tree
Showing 12 changed files with 165 additions and 346 deletions.
182 changes: 0 additions & 182 deletions lexical-parse-integer/docs/Algorithm.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,185 +17,3 @@ Therefore, only 1 optimization for parsing multiple digits was used for each typ
Finally, for 32-bit and 64-bit signed integers, we use no multiple-digit optimizations, since they provide **no** benefit for 32-bit integers in any cases, and only ~23% benefit for large 64-bit integers. However, for simple integers, due to the increased branching, they induce a performance penalty of ~50%.

In addition, rather than have separate branches for positive and negative numbers, both are parsed as unsigned integers, and then converted to the signed variant, after overflowing checking.

**Overflow Checking**

Rather than do checked multiplication and additions in each loop, which increases the amount of branching, we can check after if numeric overflow has occurred by checking the number of parsed digits, and the resulting value.

Given the following unoptimized code:

```rust,ignore
pub fn parse(bytes: &[u8]) -> Result<u64, ()> {
let mut value: u64 = 0;
let mut iter = bytes.iter();
while let Some(&c) = iter.next() {
let digit = match (c as char).to_digit(10) {
Some(v) => v,
None => return Err(()),
};
value = match value.checked_mul(10) {
Some(v) => v,
None => return Err(()),
};
value = match value.checked_add(digit as u64) {
Some(v) => v,
None => return Err(()),
};
}
Ok(value)
}
```

This translates to the following assembly:

```asm
example::parse:
xor r11d, r11d
mov r10d, 10
xor eax, eax
mov r8d, 1
.LBB0_1:
mov r9, rax
cmp rsi, r11
je .LBB0_2
movzx ecx, byte ptr [rdi + r11]
add ecx, -48
cmp ecx, 10
jae .LBB0_6
mov rax, r9
mul r10
jo .LBB0_6
mov ecx, ecx
add r11, 1
add rax, rcx
jae .LBB0_1
.LBB0_6:
mov rax, r8
mov rdx, r9
ret
.LBB0_2:
xor r8d, r8d
mov rax, r8
mov rdx, r9
ret
```

We optimize it to the following code:

```rust,ignore
pub fn parse(bytes: &[u8]) -> Result<u64, ()> {
let mut value: u64 = 0;
let mut iter = bytes.iter();
while let Some(&c) = iter.next() {
let digit = match (c as char).to_digit(10) {
Some(v) => v,
None => return Err(()),
};
value = value.wrapping_mul(10);
value = value.wrapping_mul(digit as u64);
}
Ok(value)
}
```

Which produces the following assembly:

```asm
example::parse:
xor eax, eax
xor ecx, ecx
.LBB0_1:
cmp rsi, rcx
je .LBB0_4
movzx edx, byte ptr [rdi + rcx]
add edx, -48
add rcx, 1
cmp edx, 10
jb .LBB0_1
mov eax, 1
.LBB0_4:
xor edx, edx
ret
```

This is much more efficient, however, there is one major limitation: we cannot know if numerical overflow has occurred, and must do it after the fact. We have numerical overflow on two cases: we parsed more digits than we theoretically could, or we parsed the same number as the maximum, but the number wrapped. Since the number wrapping will always produce a smaller value than the minimum value for that number of digits, this is a simple comparison.

For unsigned integers, this is quite easy: we merely need to know the maximum number of digits that can be parsed without guaranteeing numerical overflow, and the number of digits that were parsed.

```rust,ignore
// For example, max could be 20 for u64.
let count = ...; // Actual number of digits parsed.
let max = ...; // Maximum number of digits that could be parsed.
// Calculate the minimum value from processing `max` digits.
let min_value = 10u64.pow(max as u32 - 1);
// If we've processed more than the max digits, or if the value wrapped,
// we have overflow.
let is_overflow = count > max || (count == max && value < min_value);
```

For signed integers, it's slightly more complicated, but still quite easy:

```rust,ignore
// For example, max could be 18 for i64.
let count = ...; // Actual number of digits parsed.
let max = ...; // Maximum number of digits that could be parsed.
let is_negative = ...; // If the value is less than 0.
// Calculate the minimum value from processing `max` digits.
let min_value = 10u64.pow(max as u32 - 1);
let max_value = i64::MAX as u64 + 1;
let is_overflow = count > max
|| (count == max && (
value < min_value
|| value > max_value
|| (!is_negative && value == max_value)
));
```

All of the powers and constant generation is resolved at compile-time, producing efficient routines. For example, for `u64`, the following rust code:

```rust,ignore
pub fn is_overflow(value: u64, count: usize) -> bool {
let max: usize = 20;
let min_value = 10u64.pow(max as u32 - 1);
count > max || (count == max && value < min_value)
}
```

... produces the following assembly:

```asm
example::is_overflow:
cmp rsi, 20
seta cl
sete dl
movabs rax, -8446744073709551616
cmp rdi, rax
setb al
and al, dl
or al, cl
ret
```

Not bad at all.

**Compact**

For our compact implementation, prioritizing code size at the cost of performance, we use a naive algorithm that parses 1 digit at a time, without any additional optimizations. This algorithm is trivial to verify, and is effectively analogous to the following code:

```rust,ignore
let mut value = 0;
while let Some(&c) = iter.next() {
let digit = match (c as char).to_digit(radix) {
Some(v) => v,
None => return Err(...),
};
value = match value.checked_mul(radix) {
Some(v) => v,
None => return Err(...),
};
value = match value.checked_add(digit) {
Some(v) => v,
None => return Err(...),
};
}
```
59 changes: 34 additions & 25 deletions lexical-parse-integer/src/algorithm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,10 @@
#![cfg(not(feature = "compact"))]
#![doc(hidden)]

use crate::shared::is_overflow;
use lexical_util::digit::char_to_digit_const;
use lexical_util::format::NumberFormat;
use lexical_util::iterator::{AsBytes, BytesIter};
use lexical_util::num::{as_cast, Integer, UnsignedInteger};
use lexical_util::num::{as_cast, Integer};
use lexical_util::result::Result;
use lexical_util::step::min_step;

Expand All @@ -39,7 +38,9 @@ macro_rules! parse_8digits {
$value:ident,
$iter:ident,
$format:ident,
$t:ident
$t:ident,
$overflow:ident,
$op:ident
) => {{
let radix: $t = as_cast(NumberFormat::<{ $format }>::MANTISSA_RADIX);
let radix2: $t = radix.wrapping_mul(radix);
Expand All @@ -48,8 +49,13 @@ macro_rules! parse_8digits {

// Try our fast, 8-digit at a time optimizations.
while let Some(val8) = try_parse_8digits::<$t, _, $format>(&mut $iter) {
$value = $value.wrapping_mul(radix8);
$value = $value.wrapping_add(val8);
let optvalue = $value.checked_mul(radix8).and_then(|x| x.$op(val8));
if let Some(unwrapped) = optvalue {
$value = unwrapped;
} else {
$overflow = true;
break;
}
}
}};
}
Expand All @@ -65,16 +71,23 @@ macro_rules! parse_4digits {
$value:ident,
$iter:ident,
$format:ident,
$t:ident
$t:ident,
$overflow:ident,
$op:ident
) => {{
let radix: $t = as_cast(NumberFormat::<{ $format }>::MANTISSA_RADIX);
let radix2: $t = radix.wrapping_mul(radix);
let radix4: $t = radix2.wrapping_mul(radix2);

// Try our fast, 4-digit at a time optimizations.
while let Some(val4) = try_parse_4digits::<$t, _, $format>(&mut $iter) {
$value = $value.wrapping_mul(radix4);
$value = $value.wrapping_add(val4);
let optvalue = $value.checked_mul(radix4).and_then(|x| x.$op(val4));
if let Some(unwrapped) = optvalue {
$value = unwrapped;
} else {
$overflow = true;
break;
}
}
}};
}
Expand All @@ -90,8 +103,9 @@ macro_rules! parse_digits {
$is_negative:ident,
$start_index:ident,
$t:ident,
$u:ident,
$invalid_digit:ident
$invalid_digit:ident,
$overflow:ident,
$op:ident
) => {{
// WARNING:
// Performance is heavily dependent on the amount of branching.
Expand Down Expand Up @@ -120,42 +134,37 @@ macro_rules! parse_digits {
// Optimizations for reading 8-digits at a time.
// Makes no sense to do 8 digits at a time for 32-bit values,
// since it can only hold 8 digits for base 10.
if <$t>::BITS == 128 && can_try_parse_multidigits!($iter, radix) {
parse_8digits!($value, $iter, $format, $u);
}
if <$t>::BITS == 64 && can_try_parse_multidigits!($iter, radix) && !<$t>::IS_SIGNED {
parse_8digits!($value, $iter, $format, $u);
// NOTE: These values were determined as optimization
if (<$t>::BITS == 128 || <$t>::BITS == 64) && can_try_parse_multidigits!($iter, radix) && $iter.length() >= 8 {
parse_8digits!($value, $iter, $format, $t, $overflow, $op);
}

// Optimizations for reading 4-digits at a time.
// 36^4 is larger than a 16-bit integer. Likewise, 10^4 is almost
// the limit of u16, so it's not worth it.
if <$t>::BITS == 32 && can_try_parse_multidigits!($iter, radix) && !<$t>::IS_SIGNED {
parse_4digits!($value, $iter, $format, $u);
if <$t>::BITS == 32 && can_try_parse_multidigits!($iter, radix) && $iter.length() >= 4 && !$overflow {
parse_4digits!($value, $iter, $format, $t, $overflow, $op);
}

parse_1digit!($value, $iter, $format, $is_negative, $start_index, $t, $u, $invalid_digit)
parse_1digit!($value, $iter, $format, $is_negative, $start_index, $t, $invalid_digit, $overflow, $op);
}};
}

/// Algorithm for the complete parser.
#[inline(always)]
pub fn algorithm_complete<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
pub fn algorithm_complete<T, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
where
T: Integer,
Unsigned: UnsignedInteger,
{
algorithm!(bytes, FORMAT, T, Unsigned, parse_digits, invalid_digit_complete, into_ok_complete)
algorithm!(bytes, FORMAT, T, parse_digits, invalid_digit_complete, into_ok_complete)
}

/// Algorithm for the partial parser.
#[inline(always)]
pub fn algorithm_partial<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
pub fn algorithm_partial<T, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
where
T: Integer,
Unsigned: UnsignedInteger,
{
algorithm!(bytes, FORMAT, T, Unsigned, parse_digits, invalid_digit_partial, into_ok_partial)
algorithm!(bytes, FORMAT, T, parse_digits, invalid_digit_partial, into_ok_partial)
}

// DIGIT OPTIMIZATIONS
Expand Down
8 changes: 4 additions & 4 deletions lexical-parse-integer/src/api.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ macro_rules! integer_from_lexical {
#[cfg_attr(not(feature = "compact"), inline)]
fn from_lexical(bytes: &[u8]) -> lexical_util::result::Result<Self>
{
Self::parse_complete::<$unsigned, STANDARD>(bytes)
Self::parse_complete::<STANDARD>(bytes)
}

$(#[$meta:meta])?
Expand All @@ -29,7 +29,7 @@ macro_rules! integer_from_lexical {
bytes: &[u8],
) -> lexical_util::result::Result<(Self, usize)>
{
Self::parse_partial::<$unsigned, STANDARD>(bytes)
Self::parse_partial::<STANDARD>(bytes)
}
}

Expand All @@ -47,7 +47,7 @@ macro_rules! integer_from_lexical {
if !format.is_valid() {
return Err(format.error());
}
Self::parse_complete::<$unsigned, FORMAT>(bytes)
Self::parse_complete::<FORMAT>(bytes)
}

$(#[$meta:meta])?
Expand All @@ -61,7 +61,7 @@ macro_rules! integer_from_lexical {
if !format.is_valid() {
return Err(format.error());
}
Self::parse_partial::<$unsigned, FORMAT>(bytes)
Self::parse_partial::<FORMAT>(bytes)
}
}
)*)
Expand Down
13 changes: 5 additions & 8 deletions lexical-parse-integer/src/compact.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,28 +5,25 @@
#![cfg(feature = "compact")]
#![doc(hidden)]

use crate::shared::is_overflow;
use lexical_util::digit::char_to_digit_const;
use lexical_util::format::NumberFormat;
use lexical_util::iterator::{AsBytes, BytesIter};
use lexical_util::num::{as_cast, Integer, UnsignedInteger};
use lexical_util::num::{as_cast, Integer};
use lexical_util::result::Result;
use lexical_util::step::min_step;

/// Algorithm for the complete parser.
pub fn algorithm_complete<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
pub fn algorithm_complete<T, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
where
T: Integer,
Unsigned: UnsignedInteger,
{
algorithm!(bytes, FORMAT, T, Unsigned, parse_1digit, invalid_digit_complete, into_ok_complete)
algorithm!(bytes, FORMAT, T, parse_1digit, invalid_digit_complete, into_ok_complete)
}

/// Algorithm for the partial parser.
pub fn algorithm_partial<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
pub fn algorithm_partial<T, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
where
T: Integer,
Unsigned: UnsignedInteger,
{
algorithm!(bytes, FORMAT, T, Unsigned, parse_1digit, invalid_digit_partial, into_ok_partial)
algorithm!(bytes, FORMAT, T, parse_1digit, invalid_digit_partial, into_ok_partial)
}
Loading

0 comments on commit fb3e059

Please sign in to comment.