Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary uses of unsafe #11

Closed
wants to merge 3 commits into from

Conversation

nakedible
Copy link

The compiler is smart in removing comparisons that it already knows are not true. This allows us to reduce the use of unsafe in safe functions without affecting the produced code.

Each one of these code changes has been tested to not produce any difference in the generated assembly in a release build (with the input given as std::hint::black_box so the compiler won't optimize on the static value given in code). Of course no guarantees can be given that every Rust compiler on every platform always will make the same optimizations.

I consider new_saturating to be a fundamental and useful method to expose publicly, but I left it here as pub(crate) as to not change the public API of the library.

Copy link
Owner

@jhpratt jhpratt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit is a no-brainer, and something I didn't think about when I added new_static. The second commit I'm more skeptical of. You say that it's been tested; can you share those? I suspect that there are edge cases that haven't been considered.

src/lib.rs Outdated
Comment on lines 236 to 246
#[inline(always)]
pub(crate) const fn new_saturating(value: $internal) -> Self {
<Self as $crate::traits::RangeIsValid>::ASSERT;
Self(if value < MIN {
MIN
} else if value > MAX {
MAX
} else {
value
})
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the call to assume, which could reasonably be expected to result in worse optimization for downstream crates. That's why new_unchecked was used in the first place.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to prove to you that if you do this if comparison, it is "as good" as your assume thing.

Here's a trivial godbolt showing that the compiler can elide bounds checking (as evidenced by not having panic as an option for the second functions) if there is an if where it can deduce that the value must be within bounds: https://godbolt.org/z/nK1s7zTGd

Again, I make no guarantees that every version of every Rust compiler on every platform in every situation makes this optimization – but it's pretty basic and necessary optimization for the elimination of array bounds checking in many cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline(never)]
fn try_bounds(a: i32) -> i32 {
    let arr: [i32; 61] = core::array::from_fn(|i| i as i32);
    let ind = RangedI32::<3, 63>::new_saturating(a);
    arr[(ind.get()-3) as usize]
}

results in:

deranged::try_bounds:
 sub     rsp, 120
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_0]
 movaps  xmmword, ptr, [rsp, -, 128], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_1]
 movaps  xmmword, ptr, [rsp, -, 112], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_2]
 movaps  xmmword, ptr, [rsp, -, 96], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_3]
 movaps  xmmword, ptr, [rsp, -, 80], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_4]
 movaps  xmmword, ptr, [rsp, -, 64], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_5]
 movaps  xmmword, ptr, [rsp, -, 48], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_6]
 movaps  xmmword, ptr, [rsp, -, 32], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_7]
 movaps  xmmword, ptr, [rsp, -, 16], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_8]
 movaps  xmmword, ptr, [rsp], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_9]
 movaps  xmmword, ptr, [rsp, +, 16], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_10]
 movaps  xmmword, ptr, [rsp, +, 32], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_11]
 movaps  xmmword, ptr, [rsp, +, 48], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_12]
 movaps  xmmword, ptr, [rsp, +, 64], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_13]
 movaps  xmmword, ptr, [rsp, +, 80], xmm0
 movaps  xmm0, xmmword, ptr, [rip, +, .LCPI20_14]
 movaps  xmmword, ptr, [rsp, +, 96], xmm0
 mov     dword, ptr, [rsp, +, 112], 60
 cmp     edi, 3
 jge     .LBB20_2
 xor     eax, eax
.LBB20_4:
 mov     eax, dword, ptr, [rsp, +, 4*rax, -, 128]
 add     rsp, 120
 ret
.LBB20_2:
 mov     eax, 60
 cmp     edi, 63
 ja      .LBB20_4
 add     edi, -3
 mov     rax, rdi
 mov     eax, dword, ptr, [rsp, +, 4*rax, -, 128]
 add     rsp, 120
 ret

As you can see, no jumps to panic, so the compiler has decided that the array access is in bounds. If the array is changed to be 60 elements long, the jump to panic handler is generated by the compiler.

Copy link
Owner

@jhpratt jhpratt Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_saturating does not use assume directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of get, not new_saturating. That is where the bounds elision is coming from. As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.

The reason new_saturating uses the unsafe constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate. Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require unsafe for technical reasons.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_saturating does not use assume directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of get, not new_saturating. That is where the bounds elision is coming from.

You are right, the get() call is included, I missed that when writing that small example. However, the assembly is 100% the same if I comment out the unsafe assume part in get(). The bounds elision is also coming directly from new_saturating.

As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.

The godbolt was meant to mimic the code written in this library for the "new saturating" constructor and then accessing the value. You should be able to see that the usage is analogous.

The reason new_saturating uses the unsafe constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate.

Yet your new constructor does exactly the same value initialization:

                /// Creates a ranged integer if the given value is in the range
                /// `MIN..=MAX`.
                #[inline(always)]
                pub const fn new(value: $internal) -> Option<Self> {
                    <Self as $crate::traits::RangeIsValid>::ASSERT;
                    if value < MIN || value > MAX {
                        None
                    } else {
                        Some(Self(value))
                    }
                }

There's no mention of unsafe there. If you'd want to have the assume be a part of that too, you'd write:

                /// Creates a ranged integer if the given value is in the range
                /// `MIN..=MAX`.
                #[inline(always)]
                pub const fn new(value: $internal) -> Option<Self> {
                    <Self as $crate::traits::RangeIsValid>::ASSERT;
                    if value < MIN || value > MAX {
                        None
                    } else {
                        unsafe { $crate::assume(MIN <= value && value <= MAX) };
                        Some(Self(value))
                    }
                }

But, like already said, I'm convinced that the compiler will deduce the exact same bounds from the if clause even without the assume line. I'm not sure if there's some IR that LLVM can output to show that.

Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require unsafe for technical reasons.

Great work, hope it will land soon. And I'm biting my lip that bounded integers in some format land some day as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet your new constructor does exactly the same value initialization:

That is an oversight due to how I originally wrote the code. It absolutely should call new_unchecked.

I have also confirmed that new_saturating is not a drop-in replacement for new_unchecked:

use deranged::RangedUsize;

/// Safety: `x` must be in the range `0..=3`.
pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);

    let arr = [205, 37, 173, 9];
    let el = arr[x];

    (el, r)
}

Changing how r is defined, as I expected, allows the compiler to remove the potential panic (with the result being UB due to violating safety invariants). As with most things research, it's necessary to try to prove yourself wrong rather than trying to confirm your beliefs. That generally goes quite a bit farther in my experience.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an oversight due to how I originally wrote the code. It absolutely should call new_unchecked.

Awesome that I was able to spot a bug, then :-) I assumed that that code was exactly how you wanted it, so the new_saturating code wasn't significantly different.

I have also confirmed that new_saturating is not a drop-in replacement for new_unchecked:

Yeah, I never claimed that it would have the same effect – I only claimed that in the usages I put it in, there is no difference. Explaining in accurate details so we may find where the difference in opinion is...

use deranged::RangedUsize;

/// Safety: `x` must be in the range `0..=3`.
pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);

    let arr = [205, 37, 173, 9];
    let el = arr[x];

    (el, r)
}

This function set r to be the new_saturating version of x. This means that if x is for example 16, then r is 3. The code then indexes the array with x, which means that it will panic with that given value of x.

If we change it to your unsafe version, then the compiler will assume, based on the assume call in new_unchecked that x is in fact in the range 0..=3. This means that given a value of 16, this call will be unsound, as it indexes past the end of the array.

This is working exactly as it should be – you need the unsafe assume in order to make the compiler create unsound code.

However, what I'm saying is that the compiler can deduce that the value r is in fact 0..=3, so if we use the value r instead of x, the generated assembly code will not have a panic handler.

Now, in order demonstrate this, I made the internal type of RangedUsize be public – so we can bypass the get() method entirely.

Example one:

#[inline(never)]
/// Safety: `x` must be in the range `0..=3`.
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);

    let arr = [205, 37, 173, 9];
    let el = arr[x];

    (el, r)
}
deranged::foo:
 sub     rsp, 40
 mov     qword, ptr, [rsp, +, 8], 205
 mov     qword, ptr, [rsp, +, 16], 37
 mov     qword, ptr, [rsp, +, 24], 173
 mov     qword, ptr, [rsp, +, 32], 9
 cmp     rdi, 3
 ja      .LBB21_2
 mov     rax, qword, ptr, [rsp, +, 8*rdi, +, 8]
 mov     rdx, rdi
 add     rsp, 40
 ret
.LBB21_2:
 lea     rdx, [rip, +, .L__unnamed_10]
 mov     esi, 4
 call    qword, ptr, [rip, +, _ZN4core9panicking18panic_bounds_check17hb0d791a602836d16E@GOTPCREL]
 ud2

We can see that the panic handler is included.

Example two:

#[inline(never)]
/// Safety: `x` must be in the range `0..=3`.
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    let r = unsafe { RangedUsize::new_unchecked(x) };
    // let r = RangedUsize::new_saturating(x);

    let arr = [205, 37, 173, 9];
    let el = arr[x];

    (el, r)
}
deranged::foo:
 mov     qword, ptr, [rsp, -, 32], 205
 mov     qword, ptr, [rsp, -, 24], 37
 mov     qword, ptr, [rsp, -, 16], 173
 mov     qword, ptr, [rsp, -, 8], 9
 mov     rax, qword, ptr, [rsp, +, 8*rdi, -, 32]
 mov     rdx, rdi
 ret

We can see that the panic check is no longer included, but the function is unsound if given values outside 0..=3. So now the new example.

Example three:

#[inline(never)]
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);

    let arr = [205, 37, 173, 9];
    let el = arr[r.0];

    (el, r)
}
deranged::foo:
 cmp     rdi, 3
 mov     edx, 3
 cmovb   rdx, rdi
 mov     qword, ptr, [rsp, -, 32], 205
 mov     qword, ptr, [rsp, -, 24], 37
 mov     qword, ptr, [rsp, -, 16], 173
 mov     qword, ptr, [rsp, -, 8], 9
 mov     rax, qword, ptr, [rsp, +, 8*rdx, -, 32]
 ret

So now we used r.0 instead of x to index the array. The resulting code includes the comparison caused by new_saturating, just like example 1, but it doesn't contain a panic handler, because the compiler knows the value is in the range 0..=3. And this code will not be unsound.

To further illustrate how the reasoning of the compiler is exactly the same, let me undress the macros in new_unchecked and write it out:

pub const unsafe fn new_unchecked(value: $internal) -> Self {
    assert!(MIN <= MAX);
    if !(MIN <= value && value <= MAX) {
        core::hint::unreachable_unchecked();
    }
    Self(value)
}

From this code, the compiler can deduce that value is between MIN and MAX.

Now given the following code:

pub const fn new_safe(value: $internal) -> Self {
    assert!(MIN <= MAX);
    if !(MIN <= value && value <= MAX) {
        return Self(MIN);
    }
    Self(value)
}

The compiler just as well deduces that the returned value is between MIN and MAX as with the assume hint above.

I am quite convinced that adding the assume line (unsafe { $crate::assume(MIN <= value && value <= MAX) };) to either new_saturating or new will not enable any new compiler optimizations (at least not in release mode).

And to your final comment – I have repeatedly been trying to find faults in my experiments and this isn't the first time I've dabbled with how LLVM handles value range analysis. If you can find any corner case where the code changes I proposed would not generate exactly the same assembly as your original code, I would be extremely interested in seeing it. But I haven't been able to find one, even though I've tried.

But finally once again the disclaimer – the requirement for the compiler to optimize away the pointless checks from new_saturating depends on the compiler to be able to do value range analysis on this case. Which LLVM can do. But as with all optimizations, there is no hard guarantee that every rust compiler (some of which might not use LLVM) will do it, or do it the same way. So there is no hard guarantee that the checks will not be there – but if value range analysis doesn't work, then your assume will also not allow optimizations as it depends on the exact same mechanism.

src/lib.rs Outdated Show resolved Hide resolved
@nakedible
Copy link
Author

Basically the method I used was that I made a bunch of simple wrapper functions like this:

#[inline(never)]
fn try_expand(v: RangedI32<3, 63>) -> RangedI32<-2, 64> {
    v.expand()
}

And then used cargo asm to dump the function before and after removing the unsafe bit, and ensuring that there was no change in the assembly output:

deranged::try_expand:
 mov     eax, edi
 ret

I might be able to wrap these in some testing setup where it would be possible to automatically dump the assembly of all relevant functions to a file. It's often beneficial to be able to see how the assembly changes when changing the implementation – for example if suddenly there's the addition of a panic handler or something that is unexpected, or a zero runtime cost abstraction ends up having runtime cost.

@jhpratt
Copy link
Owner

jhpratt commented Oct 31, 2023

Doing that would help to an extent, but ultimately I've got to be as certain as possible it's a zero-cost abstraction. deranged is depended upon by time, which gets hundreds of thousands of downloads every day. I am putting quite a bit of effort into optimization nowadays for a few different reasons. Even a small change will have a notable impact.

@nakedible
Copy link
Author

If you ultimately don't want to trust the compiler that it is able to optimize away cases like this, then that's a clear position to take – me attempting to show that the compiler can in fact deduce the checks away is not going to change that.

I might take one more stab at this by formulating the change in a different way, but this second commit will need to be fully rewritten.

@jhpratt
Copy link
Owner

jhpratt commented Feb 10, 2024

Closing per my comment above. I created a trivial case with a demonstrable difference in generated assembly, which is not an acceptable tradeoff.

@jhpratt jhpratt closed this Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants