Fix integer parsing correctness issues.

This avoids using an intermediary unsigned type and then post-overflow validation, which avoids issues with overflowing checks wrapping to above our min value. This does not meaningfully affect performance, despite simplifying the type checks dramatically. The float and integer parsing benchmarks are comparable to before, with the integer parsing being slower than core for simple integers but much faster for larger/random integers due to multi-digit parsing. - Closes #91
Alexhuszagh · Sep 9, 2024 · fb3e059 · fb3e059
1 parent 05cc142
commit fb3e059
Show file tree

Hide file tree

Showing 12 changed files with 165 additions and 346 deletions.
diff --git a/lexical-parse-integer/docs/Algorithm.md b/lexical-parse-integer/docs/Algorithm.md
@@ -17,185 +17,3 @@ Therefore, only 1 optimization for parsing multiple digits was used for each typ
 Finally, for 32-bit and 64-bit signed integers, we use no multiple-digit optimizations, since they provide **no** benefit for 32-bit integers in any cases, and only ~23% benefit for large 64-bit integers. However, for simple integers, due to the increased branching, they induce a performance penalty of ~50%.
 
 In addition, rather than have separate branches for positive and negative numbers, both are parsed as unsigned integers, and then converted to the signed variant, after overflowing checking.
-
-**Overflow Checking**
-
-Rather than do checked multiplication and additions in each loop, which increases the amount of branching, we can check after if numeric overflow has occurred by checking the number of parsed digits, and the resulting value.
-
-Given the following unoptimized code:
-
-```rust,ignore
-pub fn parse(bytes: &[u8]) -> Result<u64, ()> {
-    let mut value: u64 = 0;
-    let mut iter = bytes.iter();
-    while let Some(&c) = iter.next() {
-        let digit = match (c as char).to_digit(10) {
-            Some(v) => v,
-            None => return Err(()),
-        };
-        value = match value.checked_mul(10) {
-            Some(v) => v,
-            None => return Err(()),
-        };
-        value = match value.checked_add(digit as u64) {
-            Some(v) => v,
-            None => return Err(()),
-        };
-    }
-    Ok(value)
-}
-```
-
-This translates to the following assembly:
-
-```asm
-example::parse:
-        xor     r11d, r11d
-        mov     r10d, 10
-        xor     eax, eax
-        mov     r8d, 1
-.LBB0_1:
-        mov     r9, rax
-        cmp     rsi, r11
-        je      .LBB0_2
-        movzx   ecx, byte ptr [rdi + r11]
-        add     ecx, -48
-        cmp     ecx, 10
-        jae     .LBB0_6
-        mov     rax, r9
-        mul     r10
-        jo      .LBB0_6
-        mov     ecx, ecx
-        add     r11, 1
-        add     rax, rcx
-        jae     .LBB0_1
-.LBB0_6:
-        mov     rax, r8
-        mov     rdx, r9
-        ret
-.LBB0_2:
-        xor     r8d, r8d
-        mov     rax, r8
-        mov     rdx, r9
-        ret
-```
-
-We optimize it to the following code:
-
-```rust,ignore
-pub fn parse(bytes: &[u8]) -> Result<u64, ()> {
-    let mut value: u64 = 0;
-    let mut iter = bytes.iter();
-    while let Some(&c) = iter.next() {
-        let digit = match (c as char).to_digit(10) {
-            Some(v) => v,
-            None => return Err(()),
-        };
-        value = value.wrapping_mul(10);
-        value = value.wrapping_mul(digit as u64);
-    }
-    Ok(value)
-}
-```
-
-Which produces the following assembly:
-
-```asm
-example::parse:
-        xor     eax, eax
-        xor     ecx, ecx
-.LBB0_1:
-        cmp     rsi, rcx
-        je      .LBB0_4
-        movzx   edx, byte ptr [rdi + rcx]
-        add     edx, -48
-        add     rcx, 1
-        cmp     edx, 10
-        jb      .LBB0_1
-        mov     eax, 1
-.LBB0_4:
-        xor     edx, edx
-        ret
-```
-
-This is much more efficient, however, there is one major limitation: we cannot know if numerical overflow has occurred, and must do it after the fact. We have numerical overflow on two cases: we parsed more digits than we theoretically could, or we parsed the same number as the maximum, but the number wrapped. Since the number wrapping will always produce a smaller value than the minimum value for that number of digits, this is a simple comparison.
-
-For unsigned integers, this is quite easy: we merely need to know the maximum number of digits that can be parsed without guaranteeing numerical overflow, and the number of digits that were parsed.
-
-```rust,ignore
-// For example, max could be 20 for u64.
-let count = ...;    // Actual number of digits parsed.
-let max = ...;      // Maximum number of digits that could be parsed.
-// Calculate the minimum value from processing `max` digits.
-let min_value = 10u64.pow(max as u32 - 1);
-// If we've processed more than the max digits, or if the value wrapped,
-// we have overflow.
-let is_overflow = count > max || (count == max && value < min_value);
-```
-
-For signed integers, it's slightly more complicated, but still quite easy:
-
-```rust,ignore
-// For example, max could be 18 for i64.
-let count = ...;        // Actual number of digits parsed.
-let max = ...;          // Maximum number of digits that could be parsed.
-let is_negative = ...;  // If the value is less than 0.
-// Calculate the minimum value from processing `max` digits.
-let min_value = 10u64.pow(max as u32 - 1);
-let max_value = i64::MAX as u64 + 1;
-let is_overflow = count > max
-    || (count == max && (
-        value < min_value
-        || value > max_value
-        || (!is_negative && value == max_value)
-    ));
-```
-
-All of the powers and constant generation is resolved at compile-time, producing efficient routines. For example, for `u64`, the following rust code:
-
-```rust,ignore
-pub fn is_overflow(value: u64, count: usize) -> bool {
-    let max: usize = 20;
-    let min_value = 10u64.pow(max as u32 - 1);
-    count > max || (count == max && value < min_value)
-}
-```
-
-... produces the following assembly:
-
-```asm
-example::is_overflow:
-        cmp     rsi, 20
-        seta    cl
-        sete    dl
-        movabs  rax, -8446744073709551616
-        cmp     rdi, rax
-        setb    al
-        and     al, dl
-        or      al, cl
-        ret
-```
-
-Not bad at all.
-
-**Compact**
-
-For our compact implementation, prioritizing code size at the cost of performance, we use a naive algorithm that parses 1 digit at a time, without any additional optimizations. This algorithm is trivial to verify, and is effectively analogous to the following code:
-
-```rust,ignore
-let mut value = 0;
-while let Some(&c) = iter.next() {
-    let digit = match (c as char).to_digit(radix) {
-        Some(v) => v,
-        None => return Err(...),
-    };
-    value = match value.checked_mul(radix) {
-        Some(v) => v,
-        None => return Err(...),
-    };
-    value = match value.checked_add(digit) {
-        Some(v) => v,
-        None => return Err(...),
-    };
-}
-```
diff --git a/lexical-parse-integer/src/algorithm.rs b/lexical-parse-integer/src/algorithm.rs
@@ -11,11 +11,10 @@
 #![cfg(not(feature = "compact"))]
 #![doc(hidden)]
 
-use crate::shared::is_overflow;
 use lexical_util::digit::char_to_digit_const;
 use lexical_util::format::NumberFormat;
 use lexical_util::iterator::{AsBytes, BytesIter};
-use lexical_util::num::{as_cast, Integer, UnsignedInteger};
+use lexical_util::num::{as_cast, Integer};
 use lexical_util::result::Result;
 use lexical_util::step::min_step;
 
@@ -39,7 +38,9 @@ macro_rules! parse_8digits {
         $value:ident,
         $iter:ident,
         $format:ident,
-        $t:ident
+        $t:ident,
+        $overflow:ident,
+        $op:ident
     ) => {{
         let radix: $t = as_cast(NumberFormat::<{ $format }>::MANTISSA_RADIX);
         let radix2: $t = radix.wrapping_mul(radix);
@@ -48,8 +49,13 @@ macro_rules! parse_8digits {
 
         // Try our fast, 8-digit at a time optimizations.
         while let Some(val8) = try_parse_8digits::<$t, _, $format>(&mut $iter) {
-            $value = $value.wrapping_mul(radix8);
-            $value = $value.wrapping_add(val8);
+            let optvalue = $value.checked_mul(radix8).and_then(|x| x.$op(val8));
+            if let Some(unwrapped) = optvalue {
+                $value = unwrapped;
+            } else {
+                $overflow = true;
+                break;
+            }
         }
     }};
 }
@@ -65,16 +71,23 @@ macro_rules! parse_4digits {
         $value:ident,
         $iter:ident,
         $format:ident,
-        $t:ident
+        $t:ident,
+        $overflow:ident,
+        $op:ident
     ) => {{
         let radix: $t = as_cast(NumberFormat::<{ $format }>::MANTISSA_RADIX);
         let radix2: $t = radix.wrapping_mul(radix);
         let radix4: $t = radix2.wrapping_mul(radix2);
 
         // Try our fast, 4-digit at a time optimizations.
         while let Some(val4) = try_parse_4digits::<$t, _, $format>(&mut $iter) {
-            $value = $value.wrapping_mul(radix4);
-            $value = $value.wrapping_add(val4);
+            let optvalue = $value.checked_mul(radix4).and_then(|x| x.$op(val4));
+            if let Some(unwrapped) = optvalue {
+                $value = unwrapped;
+            } else {
+                $overflow = true;
+                break;
+            }
         }
     }};
 }
@@ -90,8 +103,9 @@ macro_rules! parse_digits {
         $is_negative:ident,
         $start_index:ident,
         $t:ident,
-        $u:ident,
-        $invalid_digit:ident
+        $invalid_digit:ident,
+        $overflow:ident,
+        $op:ident
     ) => {{
         //  WARNING:
         //      Performance is heavily dependent on the amount of branching.
@@ -120,42 +134,37 @@ macro_rules! parse_digits {
         // Optimizations for reading 8-digits at a time.
         // Makes no sense to do 8 digits at a time for 32-bit values,
         // since it can only hold 8 digits for base 10.
-        if <$t>::BITS == 128 && can_try_parse_multidigits!($iter, radix) {
-            parse_8digits!($value, $iter, $format, $u);
-        }
-        if <$t>::BITS == 64 && can_try_parse_multidigits!($iter, radix) && !<$t>::IS_SIGNED {
-            parse_8digits!($value, $iter, $format, $u);
+        // NOTE: These values were determined as optimization
+        if (<$t>::BITS == 128 || <$t>::BITS == 64) && can_try_parse_multidigits!($iter, radix) && $iter.length() >= 8 {
+            parse_8digits!($value, $iter, $format, $t, $overflow, $op);
         }
 
         // Optimizations for reading 4-digits at a time.
         // 36^4 is larger than a 16-bit integer. Likewise, 10^4 is almost
         // the limit of u16, so it's not worth it.
-        if <$t>::BITS == 32 && can_try_parse_multidigits!($iter, radix) && !<$t>::IS_SIGNED {
-            parse_4digits!($value, $iter, $format, $u);
+        if <$t>::BITS == 32 && can_try_parse_multidigits!($iter, radix)  && $iter.length() >= 4 && !$overflow {
+            parse_4digits!($value, $iter, $format, $t, $overflow, $op);
         }
-
-        parse_1digit!($value, $iter, $format, $is_negative, $start_index, $t, $u, $invalid_digit)
+        parse_1digit!($value, $iter, $format, $is_negative, $start_index, $t, $invalid_digit, $overflow, $op);
     }};
 }
 
 /// Algorithm for the complete parser.
 #[inline(always)]
-pub fn algorithm_complete<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
+pub fn algorithm_complete<T, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
 where
     T: Integer,
-    Unsigned: UnsignedInteger,
 {
-    algorithm!(bytes, FORMAT, T, Unsigned, parse_digits, invalid_digit_complete, into_ok_complete)
+    algorithm!(bytes, FORMAT, T, parse_digits, invalid_digit_complete, into_ok_complete)
 }
 
 /// Algorithm for the partial parser.
 #[inline(always)]
-pub fn algorithm_partial<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
+pub fn algorithm_partial<T, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
 where
     T: Integer,
-    Unsigned: UnsignedInteger,
 {
-    algorithm!(bytes, FORMAT, T, Unsigned, parse_digits, invalid_digit_partial, into_ok_partial)
+    algorithm!(bytes, FORMAT, T, parse_digits, invalid_digit_partial, into_ok_partial)
 }
 
 // DIGIT OPTIMIZATIONS

diff --git a/lexical-parse-integer/src/api.rs b/lexical-parse-integer/src/api.rs
@@ -20,7 +20,7 @@ macro_rules! integer_from_lexical {
             #[cfg_attr(not(feature = "compact"), inline)]
             fn from_lexical(bytes: &[u8]) -> lexical_util::result::Result<Self>
             {
-                Self::parse_complete::<$unsigned, STANDARD>(bytes)
+                Self::parse_complete::<STANDARD>(bytes)
             }
 
             $(#[$meta:meta])?
@@ -29,7 +29,7 @@ macro_rules! integer_from_lexical {
                 bytes: &[u8],
             ) -> lexical_util::result::Result<(Self, usize)>
             {
-                Self::parse_partial::<$unsigned, STANDARD>(bytes)
+                Self::parse_partial::<STANDARD>(bytes)
             }
         }
 
@@ -47,7 +47,7 @@ macro_rules! integer_from_lexical {
                 if !format.is_valid() {
                     return Err(format.error());
                 }
-                Self::parse_complete::<$unsigned, FORMAT>(bytes)
+                Self::parse_complete::<FORMAT>(bytes)
             }
 
             $(#[$meta:meta])?
@@ -61,7 +61,7 @@ macro_rules! integer_from_lexical {
                 if !format.is_valid() {
                     return Err(format.error());
                 }
-                Self::parse_partial::<$unsigned, FORMAT>(bytes)
+                Self::parse_partial::<FORMAT>(bytes)
             }
         }
     )*)

diff --git a/lexical-parse-integer/src/compact.rs b/lexical-parse-integer/src/compact.rs
@@ -5,28 +5,25 @@
 #![cfg(feature = "compact")]
 #![doc(hidden)]
 
-use crate::shared::is_overflow;
 use lexical_util::digit::char_to_digit_const;
 use lexical_util::format::NumberFormat;
 use lexical_util::iterator::{AsBytes, BytesIter};
-use lexical_util::num::{as_cast, Integer, UnsignedInteger};
+use lexical_util::num::{as_cast, Integer};
 use lexical_util::result::Result;
 use lexical_util::step::min_step;
 
 /// Algorithm for the complete parser.
-pub fn algorithm_complete<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
+pub fn algorithm_complete<T, const FORMAT: u128>(bytes: &[u8]) -> Result<T>
 where
     T: Integer,
-    Unsigned: UnsignedInteger,
 {
-    algorithm!(bytes, FORMAT, T, Unsigned, parse_1digit, invalid_digit_complete, into_ok_complete)
+    algorithm!(bytes, FORMAT, T, parse_1digit, invalid_digit_complete, into_ok_complete)
 }
 
 /// Algorithm for the partial parser.
-pub fn algorithm_partial<T, Unsigned, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
+pub fn algorithm_partial<T, const FORMAT: u128>(bytes: &[u8]) -> Result<(T, usize)>
 where
     T: Integer,
-    Unsigned: UnsignedInteger,
 {
-    algorithm!(bytes, FORMAT, T, Unsigned, parse_1digit, invalid_digit_partial, into_ok_partial)
+    algorithm!(bytes, FORMAT, T, parse_1digit, invalid_digit_partial, into_ok_partial)
 }