-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966
Conversation
…t do not permit raw init" In rust-lang#94158, we started emitting `noundef`, which means that functions returning uninitialized references emit IR with is both `ret void` and also `noundef`: https://godbolt.org/z/hbjsKKfc3 More generally, this change makes `mem::uninitialized` dangerous in a way that it wasn't before. This `noundef` change was shipped in the past 2 stable releases, 1.61.0 and 1.62.0. This concerns me because in rust-lang#66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB. The pattern of creating uninit references in arrays shows up real code because the 0.11, 0.12, and 0.13 release series for `hyper` all use `mem::uninitialized` to construct arrays of `httparse::Header`, which contains a `&str` and `&[u8]`. There is no patch available within these release series, so a `cargo update` is not a viable way to escape the UB here. Also note that the 0.11 release series of `hyper` predates `MaybeUninit`, so any source-level patch for that will incur runtime overhead. Which would be unfortunate, but preferable to UB. I discussed this with @thomcc on the community Discord, and we think that it would be prudent to revert this introduction of `noundef` until we have a mitigation in place for the UB that this may unleash. We haven't been able to cook up any examples of surprising optimizations due to this pattern, but the whole point of `noundef` is to enable optimizations, and we know that there is code which uses it in a way which is definitely instant UB and which we have declined to inform users of. If possible, we would like to see `freeze` applied to the return value of `mem::uninitialized` as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit. @rustbot labels add +T-compiler +I-compiler-nominated
(rust-highfive has picked a reviewer for you, use r? to override) |
define void @example_oof([2 x i8*]* noalias nocapture noundef readnone sret([2 x i8*]) dereferenceable(16) %0) {
ret void
} That example doesn't contain UB-- This example does contain UB, because of the load (https://godbolt.org/z/P5cn4434T): pub fn oof() -> &'static u8 {
let refs: [&'static u8; 2] = unsafe { core::mem::uninitialized() };
refs[0]
} define noundef nonnull align 1 dereferenceable(1) i8* @oof() {
ret i8* undef
} In the optimized IR, it's UB for 2 reasons:
(you can play with this in Alive2, e.g. https://alive2.llvm.org/ce/z/rwmt_r -- if the function can be replaced with The original load in the unoptimized IR looks like (https://godbolt.org/z/c3bnPfa97): %2 = load i8*, i8** %1, align 8, !nonnull !2, !align !3, !noundef !2 This is UB for 1 reason:
So then the question is--does The usual pattern of: let foo = mem::uninitialized();
// initialize undef fields
foo.x = 1;
foo.y = 2;
// use foo
// ... should not have any problematic loads, even if the uninitialized value is returned through multiple other functions first. ( If there is a load, doing almost anything with such a loaded reference except storing it (i.e. passing, returning, taking a reference to inner fields, god forbid actually dereferencing, etc.), has been UB since Rust 1.0. I agree that we should implement (Not all of it though-- |
In short: We don't apply This may be (likely is?) still a concern if libraries are loading such *except those which inherit the ABI from their single field, e.g. https://godbolt.org/z/7e3vs3Wx1, but |
It is good to hear this scenario is less dangerous than I feared.
No, but it creates references to them, which we apply
To be clear, my concern is much more the inconsistency in the decision to not turn this pattern into a panic because of the breakage in crater, but to walk this pattern towards being dangerous UB. I do not mind being corrected that I'm just misunderstanding the situation. |
r? @nikic |
The
FWIW in practice using freeze in this context is equivalent to using zero initialization. That would effectively be #87032. Haven't reread the discussion there, so not sure why it did not land. (Edit: And to be clear, I would be very much in favor of a change along those lines.) Don't really have anything more to add here, I think @erikdesjardins already explained very well why the premise of this PR is to the most part incorrect. |
|
I mean, this is possible in the sense that any change can perturb optimization behavior in unexpected ways. But we don't intentionally optimize undef values worse than zero values -- if such cases exist, we would classify that as a missed optimization opportunity and be happy to fix it (fix = exploit more UB in this context). |
To echo @nikic--this seems to be making an assumption about the way LLVM works that is not really accurate. Optimizations are generally written as if (As nikic says, it's always possible that an optimization can behave differently for As a trivial example of this: division by zero is UB in LLVM IR. Both |
I have certainly seen examples in the past where replacing |
If you have an example I'll fix it upstream ;) I should've been clearer though. I want to put much more weight on "staying with undef is a bad solution" than "zero in particular is a better solution". (I personally like My point is that upstream philosophy says undef/poison should be optimized at least as much as any constant. If that's not the case, it's an accident, so we should not be depending on a lack of optimizations for undef. In particular I want to push back against the following:
It is exploited in many ways! And more are coming with LLVM 15, with improvements to branch-on-undef UB, etc. |
I doubt it still reproduces, but #52898 says the very first example used to SIGILL with |
@RalfJung Thanks, this still reproduces on the IR level: https://llvm.godbolt.org/z/dK9rz73ad There's multiple ways in which this could be optimized here, but I believe the main culprit here is mem2reg: https://llvm.godbolt.org/z/eK9Mdvx3T It places a nonnull assumption for the zero load, but not the undef load. From a brief look at the code, addAssumeNonNull (https://github.com/llvm/llvm-project/blob/f44d28f840c0b0877b09d5547fd09e191bbdc90e/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp#L301) is not consistently called for replacements with undef values. Should be easy to fix. |
After looking a bit closer, the generic mem2reg code was already handling this correctly, it's just the optimized single-block implementation that failed to preserve the nonnull assume. Fixed in llvm/llvm-project@3d475df. Thanks for sharing the example! |
Does LLVM say it's instant UB to do a If so, can that be checked in memory sanitizer or similar? msan only seems to check branches on undef, but if llvm is going to use "this cannot be null, it's uninit, therefore it's UB" to optimize on, then that should abort under msan as well. |
@nikic thanks for the quick fix. :) |
I would propose the following alternative to this PR: for the cases where |
MSan has an "eager checks" option which checks This option currently doesn't handle noundef loads, but could be extended in that direction. |
Yeah, I think that'd be great. I did look online for if such a thing existed, but couldn't find anything. I'd be happy to try and help get that enabled by default in rustc (but more discussion of that should probably be in its own issue). |
I opened #99179 to track. |
PR for that is up at #99182. |
Discussed in T-compiler meeting. I think the questions raised here should be resolved in the T-lang meeting, so I'm dropping the compiler nomination and replacing it with a T-lang nomination. (Also, any discussion of the questions posed in this PR should include note of PR #99182 as one (partial, IIUC) mitigation of the problem described here.) @rustbot label: +I-lang-nominated -I-compiler-nominated |
We discussed this in today's @rust-lang/lang meeting. We felt that the proposed mitigation in With that change made, we're likely fine with not doing this revert, assuming other issues don't arise. However, one non-consensus to explicitly note: there was no consensus to treat this as precedent to reject potential future changes, such as potentially using |
mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang#66151 Fixes rust-lang#87675
#99182 has landed, so this one should probably be closed? |
👍 Sounds good to me |
mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang/rust#66151 Fixes rust-lang/rust#87675
In #94158, we started emitting
noundef
, which means that functions returning uninitialized references emit IR with is bothret void
and alsonoundef
: https://godbolt.org/z/hbjsKKfc3More generally, this change makes
mem::uninitialized
dangerous in a way that it wasn't before. Thisnoundef
change was shipped in the past 2 stable releases, 1.61.0 and 1.62.0.This concerns me because in #66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB.
The pattern of creating uninit references in arrays shows up real code because the 0.11, 0.12, and 0.13 release series for
hyper
all usemem::uninitialized
to construct arrays ofhttparse::Header
, which contains a&str
and&[u8]
. There is no patch available within these release series, so acargo update
is not a viable way to escape the UB here. Also note that the 0.11 release series ofhyper
predatesMaybeUninit
, so any source-level patch for that will incur runtime overhead. Which would be unfortunate, but preferable to UB.I discussed this with @thomcc on the community Discord, and we think that it would be prudent to revert this introduction of
noundef
until we have a mitigation in place for the UB that this may unleash. We haven't been able to cook up any examples of surprising optimizations due to this pattern, but the whole point ofnoundef
is to enable optimizations, and we know that there is code which uses it in a way which is definitely instant UB and which we have declined to inform users of.If possible, we would like to see
freeze
applied to the return value ofmem::uninitialized
as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit.@rustbot labels add +T-compiler +I-compiler-nominated