Remove unnecessary uses of `unsafe` #11

nakedible · 2023-10-29T20:37:41Z

The compiler is smart in removing comparisons that it already knows are not true. This allows us to reduce the use of unsafe in safe functions without affecting the produced code.

Each one of these code changes has been tested to not produce any difference in the generated assembly in a release build (with the input given as std::hint::black_box so the compiler won't optimize on the static value given in code). Of course no guarantees can be given that every Rust compiler on every platform always will make the same optimizations.

I consider new_saturating to be a fundamental and useful method to expose publicly, but I left it here as pub(crate) as to not change the public API of the library.

jhpratt

The first commit is a no-brainer, and something I didn't think about when I added new_static. The second commit I'm more skeptical of. You say that it's been tested; can you share those? I suspect that there are edge cases that haven't been considered.

jhpratt · 2023-10-29T23:45:38Z

src/lib.rs

+                #[inline(always)]
+                pub(crate) const fn new_saturating(value: $internal) -> Self {
+                    <Self as $crate::traits::RangeIsValid>::ASSERT;
+                    Self(if value < MIN {
+                        MIN
+                    } else if value > MAX {
+                        MAX
+                    } else {
+                        value
+                    })
+                }


This removes the call to assume, which could reasonably be expected to result in worse optimization for downstream crates. That's why new_unchecked was used in the first place.

I'm not sure how to prove to you that if you do this if comparison, it is "as good" as your assume thing.

Here's a trivial godbolt showing that the compiler can elide bounds checking (as evidenced by not having panic as an option for the second functions) if there is an if where it can deduce that the value must be within bounds: https://godbolt.org/z/nK1s7zTGd

Again, I make no guarantees that every version of every Rust compiler on every platform in every situation makes this optimization – but it's pretty basic and necessary optimization for the elimination of array bounds checking in many cases.

#[inline(never)] fn try_bounds(a: i32) -> i32 { let arr: [i32; 61] = core::array::from_fn(|i| i as i32); let ind = RangedI32::<3, 63>::new_saturating(a); arr[(ind.get()-3) as usize] }

results in:

deranged::try_bounds: sub rsp, 120 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_0] movaps xmmword, ptr, [rsp, -, 128], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_1] movaps xmmword, ptr, [rsp, -, 112], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_2] movaps xmmword, ptr, [rsp, -, 96], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_3] movaps xmmword, ptr, [rsp, -, 80], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_4] movaps xmmword, ptr, [rsp, -, 64], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_5] movaps xmmword, ptr, [rsp, -, 48], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_6] movaps xmmword, ptr, [rsp, -, 32], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_7] movaps xmmword, ptr, [rsp, -, 16], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_8] movaps xmmword, ptr, [rsp], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_9] movaps xmmword, ptr, [rsp, +, 16], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_10] movaps xmmword, ptr, [rsp, +, 32], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_11] movaps xmmword, ptr, [rsp, +, 48], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_12] movaps xmmword, ptr, [rsp, +, 64], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_13] movaps xmmword, ptr, [rsp, +, 80], xmm0 movaps xmm0, xmmword, ptr, [rip, +, .LCPI20_14] movaps xmmword, ptr, [rsp, +, 96], xmm0 mov dword, ptr, [rsp, +, 112], 60 cmp edi, 3 jge .LBB20_2 xor eax, eax .LBB20_4: mov eax, dword, ptr, [rsp, +, 4*rax, -, 128] add rsp, 120 ret .LBB20_2: mov eax, 60 cmp edi, 63 ja .LBB20_4 add edi, -3 mov rax, rdi mov eax, dword, ptr, [rsp, +, 4*rax, -, 128] add rsp, 120 ret

As you can see, no jumps to panic, so the compiler has decided that the array access is in bounds. If the array is changed to be 60 elements long, the jump to panic handler is generated by the compiler.

new_saturating does not use assume directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of get, not new_saturating. That is where the bounds elision is coming from. As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.

The reason new_saturating uses the unsafe constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate. Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require unsafe for technical reasons.

new_saturating does not use assume directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of get, not new_saturating. That is where the bounds elision is coming from.

You are right, the get() call is included, I missed that when writing that small example. However, the assembly is 100% the same if I comment out the unsafe assume part in get(). The bounds elision is also coming directly from new_saturating.

As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.

The godbolt was meant to mimic the code written in this library for the "new saturating" constructor and then accessing the value. You should be able to see that the usage is analogous.

The reason new_saturating uses the unsafe constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate.

Yet your new constructor does exactly the same value initialization:

/// Creates a ranged integer if the given value is in the range /// `MIN..=MAX`. #[inline(always)] pub const fn new(value: $internal) -> Option<Self> { <Self as $crate::traits::RangeIsValid>::ASSERT; if value < MIN || value > MAX { None } else { Some(Self(value)) } }

There's no mention of unsafe there. If you'd want to have the assume be a part of that too, you'd write:

/// Creates a ranged integer if the given value is in the range /// `MIN..=MAX`. #[inline(always)] pub const fn new(value: $internal) -> Option<Self> { <Self as $crate::traits::RangeIsValid>::ASSERT; if value < MIN || value > MAX { None } else { unsafe { $crate::assume(MIN <= value && value <= MAX) }; Some(Self(value)) } }

But, like already said, I'm convinced that the compiler will deduce the exact same bounds from the if clause even without the assume line. I'm not sure if there's some IR that LLVM can output to show that.

Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require unsafe for technical reasons.

Great work, hope it will land soon. And I'm biting my lip that bounded integers in some format land some day as well.

Yet your new constructor does exactly the same value initialization:

That is an oversight due to how I originally wrote the code. It absolutely should call new_unchecked.

I have also confirmed that new_saturating is not a drop-in replacement for new_unchecked:

use deranged::RangedUsize; /// Safety: `x` must be in the range `0..=3`. pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) { // let r = unsafe { RangedUsize::new_unchecked(x) }; let r = RangedUsize::new_saturating(x); let arr = [205, 37, 173, 9]; let el = arr[x]; (el, r) }

Changing how r is defined, as I expected, allows the compiler to remove the potential panic (with the result being UB due to violating safety invariants). As with most things research, it's necessary to try to prove yourself wrong rather than trying to confirm your beliefs. That generally goes quite a bit farther in my experience.

That is an oversight due to how I originally wrote the code. It absolutely should call new_unchecked.

Awesome that I was able to spot a bug, then :-) I assumed that that code was exactly how you wanted it, so the new_saturating code wasn't significantly different.

I have also confirmed that new_saturating is not a drop-in replacement for new_unchecked:

Yeah, I never claimed that it would have the same effect – I only claimed that in the usages I put it in, there is no difference. Explaining in accurate details so we may find where the difference in opinion is...

use deranged::RangedUsize; /// Safety: `x` must be in the range `0..=3`. pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) { // let r = unsafe { RangedUsize::new_unchecked(x) }; let r = RangedUsize::new_saturating(x); let arr = [205, 37, 173, 9]; let el = arr[x]; (el, r) }

This function set r to be the new_saturating version of x. This means that if x is for example 16, then r is 3. The code then indexes the array with x, which means that it will panic with that given value of x.

If we change it to your unsafe version, then the compiler will assume, based on the assume call in new_unchecked that x is in fact in the range 0..=3. This means that given a value of 16, this call will be unsound, as it indexes past the end of the array.

This is working exactly as it should be – you need the unsafe assume in order to make the compiler create unsound code.

However, what I'm saying is that the compiler can deduce that the value r is in fact 0..=3, so if we use the value r instead of x, the generated assembly code will not have a panic handler.

Now, in order demonstrate this, I made the internal type of RangedUsize be public – so we can bypass the get() method entirely.

Example one:

#[inline(never)] /// Safety: `x` must be in the range `0..=3`. fn foo(x: usize) -> (usize, RangedUsize<0, 3>) { // let r = unsafe { RangedUsize::new_unchecked(x) }; let r = RangedUsize::new_saturating(x); let arr = [205, 37, 173, 9]; let el = arr[x]; (el, r) }

deranged::foo: sub rsp, 40 mov qword, ptr, [rsp, +, 8], 205 mov qword, ptr, [rsp, +, 16], 37 mov qword, ptr, [rsp, +, 24], 173 mov qword, ptr, [rsp, +, 32], 9 cmp rdi, 3 ja .LBB21_2 mov rax, qword, ptr, [rsp, +, 8*rdi, +, 8] mov rdx, rdi add rsp, 40 ret .LBB21_2: lea rdx, [rip, +, .L__unnamed_10] mov esi, 4 call qword, ptr, [rip, +, _ZN4core9panicking18panic_bounds_check17hb0d791a602836d16E@GOTPCREL] ud2

We can see that the panic handler is included.

Example two:

#[inline(never)] /// Safety: `x` must be in the range `0..=3`. fn foo(x: usize) -> (usize, RangedUsize<0, 3>) { let r = unsafe { RangedUsize::new_unchecked(x) }; // let r = RangedUsize::new_saturating(x); let arr = [205, 37, 173, 9]; let el = arr[x]; (el, r) }

deranged::foo: mov qword, ptr, [rsp, -, 32], 205 mov qword, ptr, [rsp, -, 24], 37 mov qword, ptr, [rsp, -, 16], 173 mov qword, ptr, [rsp, -, 8], 9 mov rax, qword, ptr, [rsp, +, 8*rdi, -, 32] mov rdx, rdi ret

We can see that the panic check is no longer included, but the function is unsound if given values outside 0..=3. So now the new example.

Example three:

#[inline(never)] fn foo(x: usize) -> (usize, RangedUsize<0, 3>) { // let r = unsafe { RangedUsize::new_unchecked(x) }; let r = RangedUsize::new_saturating(x); let arr = [205, 37, 173, 9]; let el = arr[r.0]; (el, r) }

deranged::foo: cmp rdi, 3 mov edx, 3 cmovb rdx, rdi mov qword, ptr, [rsp, -, 32], 205 mov qword, ptr, [rsp, -, 24], 37 mov qword, ptr, [rsp, -, 16], 173 mov qword, ptr, [rsp, -, 8], 9 mov rax, qword, ptr, [rsp, +, 8*rdx, -, 32] ret

So now we used r.0 instead of x to index the array. The resulting code includes the comparison caused by new_saturating, just like example 1, but it doesn't contain a panic handler, because the compiler knows the value is in the range 0..=3. And this code will not be unsound.

To further illustrate how the reasoning of the compiler is exactly the same, let me undress the macros in new_unchecked and write it out:

pub const unsafe fn new_unchecked(value: $internal) -> Self { assert!(MIN <= MAX); if !(MIN <= value && value <= MAX) { core::hint::unreachable_unchecked(); } Self(value) }

From this code, the compiler can deduce that value is between MIN and MAX.

Now given the following code:

pub const fn new_safe(value: $internal) -> Self { assert!(MIN <= MAX); if !(MIN <= value && value <= MAX) { return Self(MIN); } Self(value) }

The compiler just as well deduces that the returned value is between MIN and MAX as with the assume hint above.

I am quite convinced that adding the assume line (unsafe { $crate::assume(MIN <= value && value <= MAX) };) to either new_saturating or new will not enable any new compiler optimizations (at least not in release mode).

And to your final comment – I have repeatedly been trying to find faults in my experiments and this isn't the first time I've dabbled with how LLVM handles value range analysis. If you can find any corner case where the code changes I proposed would not generate exactly the same assembly as your original code, I would be extremely interested in seeing it. But I haven't been able to find one, even though I've tried.

But finally once again the disclaimer – the requirement for the compiler to optimize away the pointless checks from new_saturating depends on the compiler to be able to do value range analysis on this case. Which LLVM can do. But as with all optimizations, there is no hard guarantee that every rust compiler (some of which might not use LLVM) will do it, or do it the same way. So there is no hard guarantee that the checks will not be there – but if value range analysis doesn't work, then your assume will also not allow optimizations as it depends on the exact same mechanism.

src/lib.rs

nakedible · 2023-10-30T18:30:34Z

Basically the method I used was that I made a bunch of simple wrapper functions like this:

#[inline(never)]
fn try_expand(v: RangedI32<3, 63>) -> RangedI32<-2, 64> {
    v.expand()
}

And then used cargo asm to dump the function before and after removing the unsafe bit, and ensuring that there was no change in the assembly output:

deranged::try_expand:
 mov     eax, edi
 ret

I might be able to wrap these in some testing setup where it would be possible to automatically dump the assembly of all relevant functions to a file. It's often beneficial to be able to see how the assembly changes when changing the implementation – for example if suddenly there's the addition of a panic handler or something that is unexpected, or a zero runtime cost abstraction ends up having runtime cost.

jhpratt · 2023-10-31T05:32:32Z

Doing that would help to an extent, but ultimately I've got to be as certain as possible it's a zero-cost abstraction. deranged is depended upon by time, which gets hundreds of thousands of downloads every day. I am putting quite a bit of effort into optimization nowadays for a few different reasons. Even a small change will have a notable impact.

nakedible · 2023-10-31T10:10:37Z

If you ultimately don't want to trust the compiler that it is able to optimize away cases like this, then that's a clear position to take – me attempting to show that the compiler can in fact deduce the checks away is not going to change that.

I might take one more stab at this by formulating the change in a different way, but this second commit will need to be fully rewritten.

jhpratt · 2024-02-10T22:26:00Z

Closing per my comment above. I created a trivial case with a demonstrable difference in generated assembly, which is not an acceptable tradeoff.

nakedible added 2 commits October 29, 2023 13:39

Use new_static to create MIN and MAX

3f6f4a6

Remove many unnecessary uses of unsafe

ebc53f8

nakedible mentioned this pull request Oct 29, 2023

Consider creating a safe version of this library #10

Closed

jhpratt reviewed Oct 29, 2023

View reviewed changes

Fixup: Revert accidental inline hint change

a430d15

jhpratt closed this Feb 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unnecessary uses of `unsafe` #11

Remove unnecessary uses of `unsafe` #11

nakedible commented Oct 29, 2023

jhpratt left a comment

jhpratt Oct 29, 2023

nakedible Oct 30, 2023

nakedible Oct 30, 2023

jhpratt Oct 31, 2023 •

edited

Loading

nakedible Oct 31, 2023

jhpratt Nov 1, 2023

nakedible Nov 1, 2023

nakedible commented Oct 30, 2023

jhpratt commented Oct 31, 2023

nakedible commented Oct 31, 2023

jhpratt commented Feb 10, 2024

Remove unnecessary uses of unsafe #11

Remove unnecessary uses of unsafe #11

Conversation

nakedible commented Oct 29, 2023

jhpratt left a comment

Choose a reason for hiding this comment

jhpratt Oct 29, 2023

Choose a reason for hiding this comment

nakedible Oct 30, 2023

Choose a reason for hiding this comment

nakedible Oct 30, 2023

Choose a reason for hiding this comment

jhpratt Oct 31, 2023 • edited Loading

Choose a reason for hiding this comment

nakedible Oct 31, 2023

Choose a reason for hiding this comment

jhpratt Nov 1, 2023

Choose a reason for hiding this comment

nakedible Nov 1, 2023

Choose a reason for hiding this comment

nakedible commented Oct 30, 2023

jhpratt commented Oct 31, 2023

nakedible commented Oct 31, 2023

jhpratt commented Feb 10, 2024

Remove unnecessary uses of `unsafe` #11

Remove unnecessary uses of `unsafe` #11

jhpratt Oct 31, 2023 •

edited

Loading