Remove unnecessary uses of unsafe
#11
Conversation
The first commit is a no-brainer, and something I didn't think about when I added `new_static`. The second commit I'm more skeptical of. You say that it's been tested; can you share those tests? I suspect that there are edge cases that haven't been considered.
src/lib.rs
Outdated
#[inline(always)]
pub(crate) const fn new_saturating(value: $internal) -> Self {
    <Self as $crate::traits::RangeIsValid>::ASSERT;
    Self(if value < MIN {
        MIN
    } else if value > MAX {
        MAX
    } else {
        value
    })
}
This removes the call to `assume`, which could reasonably be expected to result in worse optimization for downstream crates. That's why `new_unchecked` was used in the first place.
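For context, an `assume`-style hint of the kind referenced here typically boils down to something like the sketch below. This mirrors the expanded `new_unchecked` shown later in the thread and is not necessarily the exact helper used in this crate.

// A minimal sketch of an `assume`-style hint, not necessarily the exact
// helper used in this crate. If `condition` is false, the behavior is
// undefined, so the optimizer is free to treat the condition as always true.
#[inline(always)]
pub(crate) const unsafe fn assume(condition: bool) {
    if !condition {
        // SAFETY: the caller guarantees that `condition` holds.
        unsafe { core::hint::unreachable_unchecked() };
    }
}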
I'm not sure how to prove to you that this `if` comparison is "as good" as your `assume`.
Here's a trivial godbolt showing that the compiler can elide bounds checking (as evidenced by the second function having no panic path) when there is an `if` from which it can deduce that the value must be within bounds: https://godbolt.org/z/nK1s7zTGd
Again, I make no guarantees that every version of every Rust compiler on every platform in every situation performs this optimization – but it is a fairly basic optimization, and a necessary one for eliminating array bounds checks in many cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[inline(never)]
fn try_bounds(a: i32) -> i32 {
    let arr: [i32; 61] = core::array::from_fn(|i| i as i32);
    let ind = RangedI32::<3, 63>::new_saturating(a);
    arr[(ind.get() - 3) as usize]
}
results in:
deranged::try_bounds:
sub rsp, 120
movaps xmm0, xmmword ptr [rip + .LCPI20_0]
movaps xmmword ptr [rsp - 128], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_1]
movaps xmmword ptr [rsp - 112], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_2]
movaps xmmword ptr [rsp - 96], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_3]
movaps xmmword ptr [rsp - 80], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_4]
movaps xmmword ptr [rsp - 64], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_5]
movaps xmmword ptr [rsp - 48], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_6]
movaps xmmword ptr [rsp - 32], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_7]
movaps xmmword ptr [rsp - 16], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_8]
movaps xmmword ptr [rsp], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_9]
movaps xmmword ptr [rsp + 16], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_10]
movaps xmmword ptr [rsp + 32], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_11]
movaps xmmword ptr [rsp + 48], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_12]
movaps xmmword ptr [rsp + 64], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_13]
movaps xmmword ptr [rsp + 80], xmm0
movaps xmm0, xmmword ptr [rip + .LCPI20_14]
movaps xmmword ptr [rsp + 96], xmm0
mov dword ptr [rsp + 112], 60
cmp edi, 3
jge .LBB20_2
xor eax, eax
.LBB20_4:
mov eax, dword ptr [rsp + 4*rax - 128]
add rsp, 120
ret
.LBB20_2:
mov eax, 60
cmp edi, 63
ja .LBB20_4
add edi, -3
mov rax, rdi
mov eax, dword ptr [rsp + 4*rax - 128]
add rsp, 120
ret
As you can see, there are no jumps to a panic handler, so the compiler has decided that the array access is in bounds. If the array is changed to be 60 elements long, the jump to the panic handler is generated by the compiler.
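For completeness, that 60-element counter-check would look something like the sketch below (same setup as `try_bounds` above, only the array length changed). Because the saturated index can be 60, which is out of bounds for a 60-element array, the compiler has to keep the panic path.

// Counter-check sketch: `RangedI32::<3, 63>` saturates to indices 0..=60,
// so a 60-element array can be indexed out of bounds and the panic path
// must remain in the generated assembly.
#[inline(never)]
fn try_bounds_short(a: i32) -> i32 {
    let arr: [i32; 60] = core::array::from_fn(|i| i as i32);
    let ind = RangedI32::<3, 63>::new_saturating(a);
    arr[(ind.get() - 3) as usize]
}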
`new_saturating` does not use `assume` directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of `get`, not `new_saturating`. That is where the bounds elision is coming from. As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.

The reason `new_saturating` uses the `unsafe` constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate. Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require `unsafe` for technical reasons.
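For readers unfamiliar with that pattern, a rough sketch of the idea is below. This is an illustration only, not the crate's actual module layout: the inner value is reachable only through a small private module whose constructor is `unsafe`, so every initialization is forced through an explicit `unsafe` block.

// Sketch of the "localize unsafe in a submodule" idea described above;
// the actual layout in this crate may differ.
mod unsafe_wrapper {
    pub(crate) struct Unsafe<T>(T);

    impl<T> Unsafe<T> {
        /// # Safety
        /// The caller must uphold whatever invariant the wrapped value
        /// protects (here: the value is within `MIN..=MAX`).
        pub(crate) const unsafe fn new(value: T) -> Self {
            Self(value)
        }

        pub(crate) const fn get(&self) -> &T {
            &self.0
        }
    }
}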
> `new_saturating` does not use `assume` directly, but it does within the internal constructor. Testing that is likely not straightforward. I will note that the example you gave tests the behavior of `get`, not `new_saturating`. That is where the bounds elision is coming from.
You are right, the `get()` call is included; I missed that when writing that small example. However, the assembly is 100% the same if I comment out the unsafe `assume` part in `get()`. The bounds elision is also coming directly from `new_saturating`.
> As to the godbolt, I am well aware that the compiler is able to remove checks if it's statically known. That is basic optimization.
The godbolt was meant to mimic the code written in this library for the "new saturating" constructor and then accessing the value. You should be able to see that the usage is analogous.
> The reason `new_saturating` uses the `unsafe` constructor is because it is, in fact, unsafe. A value is being assigned to the inner field, which must be unsafe to initialize or mutate.
Yet your `new` constructor does exactly the same value initialization:
/// Creates a ranged integer if the given value is in the range
/// `MIN..=MAX`.
#[inline(always)]
pub const fn new(value: $internal) -> Option<Self> {
    <Self as $crate::traits::RangeIsValid>::ASSERT;
    if value < MIN || value > MAX {
        None
    } else {
        Some(Self(value))
    }
}
There's no mention of `unsafe` there. If you wanted the `assume` to be part of that too, you'd write:
/// Creates a ranged integer if the given value is in the range
/// `MIN..=MAX`.
#[inline(always)]
pub const fn new(value: $internal) -> Option<Self> {
    <Self as $crate::traits::RangeIsValid>::ASSERT;
    if value < MIN || value > MAX {
        None
    } else {
        unsafe { $crate::assume(MIN <= value && value <= MAX) };
        Some(Self(value))
    }
}
But, as I already said, I'm convinced that the compiler will deduce the exact same bounds from the `if` clause even without the `assume` line. I'm not sure if there's some IR that LLVM can output to show that.
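One way to check that, for what it's worth, would be to emit optimized LLVM IR for two standalone versions of the clamp and diff them, e.g. with `rustc -O --crate-type=lib --emit=llvm-ir compare.rs`. A sketch is below; the function names are hypothetical, and the hint is written out the way `new_unchecked` expands later in this thread.

// Hypothetical comparison harness: compile with
// `rustc -O --crate-type=lib --emit=llvm-ir compare.rs` and diff the
// optimized IR (or assembly) of the two functions to see whether the
// explicit hint changes anything.
#[inline(never)]
pub fn clamp_plain(value: i32) -> i32 {
    if value < 3 {
        3
    } else if value > 63 {
        63
    } else {
        value
    }
}

#[inline(never)]
pub fn clamp_with_hint(value: i32) -> i32 {
    let v = if value < 3 {
        3
    } else if value > 63 {
        63
    } else {
        value
    };
    if !(3 <= v && v <= 63) {
        // SAFETY: `v` was clamped to 3..=63 just above.
        unsafe { core::hint::unreachable_unchecked() };
    }
    v
}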
> Until my unsafe fields RFC lands, I created the submodule specifically to localize any code that doesn't require `unsafe` for technical reasons.
Great work, I hope it will land soon. And I'm keeping my fingers crossed that bounded integers in some form land someday as well.
> Yet your `new` constructor does exactly the same value initialization:

That is an oversight due to how I originally wrote the code. It absolutely should call `new_unchecked`.
I have also confirmed that `new_saturating` is not a drop-in replacement for `new_unchecked`:
use deranged::RangedUsize;

/// Safety: `x` must be in the range `0..=3`.
pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);
    let arr = [205, 37, 173, 9];
    let el = arr[x];
    (el, r)
}
Changing how `r` is defined, as I expected, allows the compiler to remove the potential panic (with the result being UB due to violating safety invariants). As with most things in research, it's necessary to try to prove yourself wrong rather than trying to confirm your beliefs. That generally goes quite a bit further in my experience.
> That is an oversight due to how I originally wrote the code. It absolutely should call `new_unchecked`.
Awesome that I was able to spot a bug, then :-) I assumed that code was exactly how you wanted it, so the `new_saturating` code wasn't significantly different.
> I have also confirmed that `new_saturating` is not a drop-in replacement for `new_unchecked`:

Yeah, I never claimed that it would have the same effect – I only claimed that in the usages where I put it, there is no difference. Let me explain in more detail so we can find where the difference of opinion is...
use deranged::RangedUsize;

/// Safety: `x` must be in the range `0..=3`.
pub unsafe fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);
    let arr = [205, 37, 173, 9];
    let el = arr[x];
    (el, r)
}
This function sets `r` to the `new_saturating` version of `x`. This means that if `x` is, for example, 16, then `r` is 3. The code then indexes the array with `x`, which means that it will panic with that given value of `x`.
If we change it to your `unsafe` version, then the compiler will assume, based on the `assume` call in `new_unchecked`, that `x` is in fact in the range `0..=3`. This means that given a value of 16, this call will be unsound, as it indexes past the end of the array.

This is working exactly as it should – you need the unsafe `assume` in order to make the compiler generate unsound code.
However, what I'm saying is that the compiler can deduce that the value `r` is in fact in `0..=3`, so if we use the value `r` instead of `x`, the generated assembly will not have a panic handler.

Now, in order to demonstrate this, I made the internal field of `RangedUsize` public – so we can bypass the `get()` method entirely.
Example one:
#[inline(never)]
/// Safety: `x` must be in the range `0..=3`.
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);
    let arr = [205, 37, 173, 9];
    let el = arr[x];
    (el, r)
}
deranged::foo:
sub rsp, 40
mov qword ptr [rsp + 8], 205
mov qword ptr [rsp + 16], 37
mov qword ptr [rsp + 24], 173
mov qword ptr [rsp + 32], 9
cmp rdi, 3
ja .LBB21_2
mov rax, qword ptr [rsp + 8*rdi + 8]
mov rdx, rdi
add rsp, 40
ret
.LBB21_2:
lea rdx, [rip + .L__unnamed_10]
mov esi, 4
call qword ptr [rip + _ZN4core9panicking18panic_bounds_check17hb0d791a602836d16E@GOTPCREL]
ud2
We can see that the panic handler is included.
Example two:
#[inline(never)]
/// Safety: `x` must be in the range `0..=3`.
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    let r = unsafe { RangedUsize::new_unchecked(x) };
    // let r = RangedUsize::new_saturating(x);
    let arr = [205, 37, 173, 9];
    let el = arr[x];
    (el, r)
}
deranged::foo:
mov qword ptr [rsp - 32], 205
mov qword ptr [rsp - 24], 37
mov qword ptr [rsp - 16], 173
mov qword ptr [rsp - 8], 9
mov rax, qword ptr [rsp + 8*rdi - 32]
mov rdx, rdi
ret
We can see that the panic check is no longer included, but the function is unsound if given values outside `0..=3`. So now the new example.
Example three:
#[inline(never)]
fn foo(x: usize) -> (usize, RangedUsize<0, 3>) {
    // let r = unsafe { RangedUsize::new_unchecked(x) };
    let r = RangedUsize::new_saturating(x);
    let arr = [205, 37, 173, 9];
    let el = arr[r.0];
    (el, r)
}
deranged::foo:
cmp rdi, 3
mov edx, 3
cmovb rdx, rdi
mov qword ptr [rsp - 32], 205
mov qword ptr [rsp - 24], 37
mov qword ptr [rsp - 16], 173
mov qword ptr [rsp - 8], 9
mov rax, qword ptr [rsp + 8*rdx - 32]
ret
So now we used `r.0` instead of `x` to index the array. The resulting code includes the comparison caused by `new_saturating`, just like example one, but it doesn't contain a panic handler, because the compiler knows the value is in the range `0..=3`. And this code will not be unsound.
To further illustrate how the reasoning of the compiler is exactly the same, let me expand the macros in `new_unchecked` and write it out:
pub const unsafe fn new_unchecked(value: $internal) -> Self {
    assert!(MIN <= MAX);
    if !(MIN <= value && value <= MAX) {
        core::hint::unreachable_unchecked();
    }
    Self(value)
}
From this code, the compiler can deduce that `value` is between `MIN` and `MAX`.

Now, given the following code:
pub const fn new_safe(value: $internal) -> Self {
    assert!(MIN <= MAX);
    if !(MIN <= value && value <= MAX) {
        return Self(MIN);
    }
    Self(value)
}
The compiler just as well deduces that the returned value is between `MIN` and `MAX`, the same as with the `assume` hint above.
I am quite convinced that adding the `assume` line (`unsafe { $crate::assume(MIN <= value && value <= MAX) };`) to either `new_saturating` or `new` will not enable any new compiler optimizations (at least not in release mode).
And to your final comment – I have repeatedly been trying to find faults in my experiments and this isn't the first time I've dabbled with how LLVM handles value range analysis. If you can find any corner case where the code changes I proposed would not generate exactly the same assembly as your original code, I would be extremely interested in seeing it. But I haven't been able to find one, even though I've tried.
But finally, once again, the disclaimer – for the compiler to optimize away the pointless checks from `new_saturating`, it has to be able to do value range analysis on this case, which LLVM can do. But as with all optimizations, there is no hard guarantee that every Rust compiler (some of which might not use LLVM) will do it, or do it the same way. So there is no hard guarantee that the checks will not be there – but if value range analysis doesn't work, then your `assume` will also not enable optimizations, as it depends on the exact same mechanism.
Basically, the method I used was that I made a bunch of simple wrapper functions like this:

#[inline(never)]
fn try_expand(v: RangedI32<3, 63>) -> RangedI32<-2, 64> {
    v.expand()
}

and then checked the generated assembly, which in this case is:

deranged::try_expand:
mov eax, edi
ret

I might be able to wrap these in some testing setup where it would be possible to automatically dump the assembly of all relevant functions to a file. It's often beneficial to be able to see how the assembly changes when the implementation changes – for example, if a panic handler suddenly shows up unexpectedly, or a zero-runtime-cost abstraction ends up having a runtime cost.

Doing that would help to an extent, but ultimately I've got to be as certain as possible that it's a zero-cost abstraction.

If you ultimately don't want to trust the compiler to optimize away cases like this, then that's a clear position to take – my attempting to show that the compiler can in fact deduce the checks away is not going to change that. I might take one more stab at this by formulating the change in a different way, but this second commit will need to be fully rewritten.

Closing per my comment above. I created a trivial case with a demonstrable difference in generated assembly, which is not an acceptable tradeoff.
The compiler is smart about removing comparisons that it already knows are not true. This allows us to reduce the use of `unsafe` in safe functions without affecting the produced code.

Each one of these code changes has been tested to not produce any difference in the generated assembly in a release build (with the input passed through `std::hint::black_box` so the compiler won't optimize based on the static value given in the code). Of course, no guarantees can be given that every Rust compiler on every platform will always make the same optimizations.

I consider `new_saturating` to be a fundamental and useful method to expose publicly, but I left it as `pub(crate)` here so as not to change the public API of the library.
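To illustrate the testing approach described above, a harness might look like the following sketch, reusing `try_bounds` from the discussion (`RangedI32` in scope as in the earlier snippets). Since `new_saturating` is `pub(crate)` in this PR, such a wrapper would have to live inside the crate itself, or the method would need to be made temporarily public.

use std::hint::black_box;

// Wrapper reused from the earlier comment; `#[inline(never)]` keeps it as a
// distinct symbol whose release-mode assembly can be inspected (for example
// on godbolt) before and after the change.
#[inline(never)]
fn try_bounds(a: i32) -> i32 {
    let arr: [i32; 61] = core::array::from_fn(|i| i as i32);
    let ind = RangedI32::<3, 63>::new_saturating(a);
    arr[(ind.get() - 3) as usize]
}

fn main() {
    // `black_box` hides the concrete value from the optimizer, so the
    // generated code has to handle an arbitrary runtime input instead of
    // constant-folding the call.
    println!("{}", try_bounds(black_box(40)));
}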