-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating a reference with std::mem::zeroed
generates a SIGILL
#52898
Comments
As soon as you instantiate a type (including references) with a stale value it will immediately cause UB. This is not really actionable (mem::uninitialized itself is not very sound, there are proposals to use unions instead). |
Yes, this code is UB - references must be non-null. |
I can see how things like |
You can use If you want to, you can also wrap them in a struct to provide safe access, perhaps panicking if the value hasn't yet been initialized. If you have suggestions as to how to improve ergonomics or clarity here (docs or through language changes) we'd be happy to hear them. |
What about when working with C function pointers? I expected these to behave more like raw pointers, but it's causing this same
|
You should be using The nomicon notes:
|
At the time that I wrote the |
One extra thing which I'd like to emphasize here is that this is an incredibly silly footgun when dealing with generic code. Yes, accessing something that is In fact, if the official stance is that initializing use std::collections::HashMap;
fn main() {
let mut map: HashMap< (), &u32 > = HashMap::new();
map.insert( (), &123 );
for (_, v) in map.drain() {
println!( "{}", v );
}
} Here's how the impl<'a, K, V> Iterator for Drain<'a, K, V> {
// ...
#[inline]
fn next(&mut self) -> Option<(SafeHash, K, V)> {
self.iter.next().map(|raw| {
unsafe {
self.table.as_mut().size -= 1;
let (k, v) = ptr::read(raw.pair());
(SafeHash { hash: ptr::replace(&mut *raw.hash(), EMPTY_BUCKET) }, k, v)
}
})
}
// ...
} And here's #[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub unsafe fn read<T>(src: *const T) -> T {
let mut tmp: T = mem::uninitialized();
copy_nonoverlapping(src, &mut tmp, 1);
tmp
} So what's effectively doing is this: let (k, v): ((), &'static u32) = mem::uninitialized(); In general my assumptions were always:
The docs for The docs for |
I'll reopen this and cc people who are working on uninitalized/zeroed. cc @RalfJung @rust-lang/wg-unsafe-code-guidelines |
uninitialized is very different from zeroed in this context. A null reference is explicitly a value that doesn't exist, and creating one is undefined behavior. |
@sfackler Uninitialized memory is also assumed to be a value that doesn't exist, and both are equally undefined behavior. If anything, zeroed memory is more likely to trigger a More importantly, as @koute mentioned, this zeroed memory is not at any point being accessed in these use cases. It's being overwritten before any use, which should pretty much eliminate any undefined behavior. Regardless, there would be no point in providing Why not just rewrite these functions to
|
No, that is not correct. Uninitialized memory exists all over the place! The memory returned by malloc is uninitialized. Uninitialized memory has an undefined value.
All such code is not forbidden. |
But |
There is no tricking going on. The compiler is well aware that the value is uninitialized. |
@sfackler, @daggerbot - while the argument is sound, at the moment we have a significant amount of projects for which nightly is unusable - not mentioning that there might be pressure to push this change to stable. Would it be reasonable to revert the change that caused this behaviour and then continue the discussion until a proper solution is found? |
If this were the case, then the following code would fail to build:
But it doesn't, because the compiler thinks
This does not build because of an error stating |
The documentation for mem::uninitialized has some extensive discussion about what you can and can not do with a value returned by it. The explicit purpose of mem::uninitialized is to bypass that check. It is an unsafe function, and the entire purpose of |
All you need to do to make this code valid is to use mem::uninitialized rather than mem::zeroed: |
Did you not read the above comments? The value is being replaced with a valid value before it's ever accessed. Roughly equivalent example.
Edit: The docs say that |
That example is totally fine and well defined. This, on the other hand, is still UB: let mut x: &'static u8 = mem::zeroed();
x = &0;
do_a_thing(&x); mem::zeroed is not the same thing as mem::uninitialized. |
What implementation detail makes |
@kevinmehall pointed out an explanation:
So it looks like Edit: It sounds like this optimization may have been triggered by LLVM if rustc recently switched to a later version. Either the docs should be updated to indicate what |
References are defined to always point to a valid location, but in particular that location is never null. @rkruppe mentioned on the lang channel of the rust Discord that I may be wrong about uninitialized being valid in this case - can an optimizer select a value of 0 for an undefined value that's tagged as nonzero? |
Why can't it be equivalent to that? memset is understood by LLVM IIRC and it would realize that the value has been zeroed. |
In most cases, it would appear on the surface that it is equivalent, but in some cases (such as in x11-dl), it gets optimized down to an instruction that raises |
Why would it not have been raised? We instruct LLVM to generate an abort inside of unreachable code, and code that invokes UB (like zeroing a non-zeroable value) is assumed to be unreachable. |
It was because of the implied As you said above, One of these functions is only required because constructing these enormous structs normally (e.g. Before Rust I primarily used C where it's common to |
Were those function pointers wrapped in |
With `uninitialized`, it won't matter if `Option` is used. Function
pointers are similar to reference types and to the best of my knowledge
subject to the same requirements, in particular that they must be aligned.
I don't think it's valid to create an uninitialized one---you'd need to
store them as raw pointers (`*const ()` on platforms that support it as
there are apparently no raw function pointers?) to avoid these issues.
The compiler changing how it compiles code with UB isn't considered a
breaking change: there are no guarantees whatsoever about how code with UB
will behave. It's possibly a mistake that it's poorly documented that
misaligned function pointers are UB (even the current version of the UCG
doesn't seem to address it), but that seems to me a docs bug, not a
compiler bug, especially given that it is documented that they cannot be
null.
…On Thu., Jul. 18, 2019, 13:42 Peter Atashian, ***@***.***> wrote:
just pointers to C functions
Were those function pointers wrapped in Option? If not, then the value
zero is a trap representation for them.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#52898?email_source=notifications&email_token=AE7AOVLBMNWNS4UFYYRA5DDQACTPVA5CNFSM4FM7AGW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2JHZNY#issuecomment-512916663>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AE7AOVOC2QOBLEOOEX5AEPTQACTPVANCNFSM4FM7AGWQ>
.
|
Actually, I just thought of something else: even if misaligned function
pointers aren't well documented as UB, replacing `zeroed` with
`uninitialized` is never going to be correct---the compiler can happily
assume that the value returned from `uninitialized` is going to be
all-zeroes, which is enough to trigger UB on its own.
…On Thu., Jul. 18, 2019, 14:27 Alexis Hunt, ***@***.***> wrote:
With `uninitialized`, it won't matter if `Option` is used. Function
pointers are similar to reference types and to the best of my knowledge
subject to the same requirements, in particular that they must be aligned.
I don't think it's valid to create an uninitialized one---you'd need to
store them as raw pointers (`*const ()` on platforms that support it as
there are apparently no raw function pointers?) to avoid these issues.
The compiler changing how it compiles code with UB isn't considered a
breaking change: there are no guarantees whatsoever about how code with UB
will behave. It's possibly a mistake that it's poorly documented that
misaligned function pointers are UB (even the current version of the UCG
doesn't seem to address it), but that seems to me a docs bug, not a
compiler bug, especially given that it is documented that they cannot be
null.
On Thu., Jul. 18, 2019, 13:42 Peter Atashian, ***@***.***>
wrote:
> just pointers to C functions
>
> Were those function pointers wrapped in Option? If not, then the value
> zero is a trap representation for them.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#52898?email_source=notifications&email_token=AE7AOVLBMNWNS4UFYYRA5DDQACTPVA5CNFSM4FM7AGW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2JHZNY#issuecomment-512916663>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AE7AOVOC2QOBLEOOEX5AEPTQACTPVANCNFSM4FM7AGWQ>
> .
>
|
While I agree that If I had written this library today, I would have known to look for a better solution, but unfortunately I wrote it ~5 years ago or so just after Rust 1.0 was released. What can possibly be done from my end? Yank the library and tell the users they'll have to write a new one? Redesign and rewrite it and force everyone to rewrite their usage anyway? What will happen next time there's an unnecessary change that breaks people's code other than wish I had used a stable language, rather than some experimental new language that gets falsely touted as being reliable? |
You were doing something you weren't allowed to do. Previously, the compiler didn't notice, but now it does. You're only getting caught now, but your code was incorrect all the way back to Rust 1.0. That the bug was only encountered now is unfortunate, but compiler changes aren't the only thing that could break this code. It might break if you run it on a different platform. It might break if you compile with different flags, even in an older version of Rust. It might break if you use the code in just the wrong way that nobody has yet, again even in an older version of Rust. The compiler changed in some way which broke the specific examples complained of, but there is not and never was a guarantee that your code was operating correctly. Indeed, it may well only have appeared to be operating correctly but actually had more subtle bugs. It may have been corrupting memory in a way which you didn't notice. It may have led the compiler to, in some cases, optimize away a null pointer check that ought to have been there, causing subtle misbehaviour. Even on versions of the compiler where this "worked", there is no guarantee that it was in fact doing what you thought it was. You say you wish you had used a "stable language". Would you consider C to be stable? You'd run into the same issues there; nothing to do with language stability. Chris Lattner (who wrote LLVM) wrote a series of blog posts about UB back in 2011 (I linked the second post, because it's the most relevant here). I don't know of any C or C++ compiler that guarantees that code performing undefined behaviour will not change how it behaves: the major ones (clang, GCC, MSVC) certainly do exactly the opposite (except where they have extensions defining behaviour in some cases where the standards would call it UB). What should you do? You should stop shipping a library that is insecure and buggy. I don't know anything about your code so I can't tell you how to accomplish that. Ideally you can change the types. If they're in the public interface, and you can make appropriate changes without changing your public interface, go do that. Failing all else, make dummy functions that panic when called and initialize to point to them---presumably you don't care about what the actual values were, so using trapping dummies is fine. And you'll get an extra benefit of improved security in case something does try to call these values: trapping rather than setting the instruction pointer to an effectively random address. Edit: and if for some reason that doesn't work, do the only thing you can do: make a breaking change and tell all your users that they have a security bug and need to adjust. It's the right thing to do---so much so that security bugs are an accepted reason to break backwards compatibility by many major libraries and compilers. |
Nothing that this library does would be an issue in C or any other systems language that I'm aware of. In fact, I've worked with code many times before in C and C++ that does almost exactly the same thing without issue (like SDL does after it decides whether to load the X11 or Wayland libraries). If Rust is intended to interface with C code, it's doesn't seem right that it's even more unpredictable than C when doing so. Unsafe code should mean "you have the power to shoot yourself in the foot," not "we'll go ahead and shoot your foot for you." All the library is trying to do is initialize a big struct, and the compiler keeps coming up with new ways to make this impossible every 6 months. |
Ahh. I can see how you arrived at this. There's currently no good language facility for doing this, other than on a field-by-field basis, especially for function pointers. But there are other workarounds which don't rely on UB. For instance, you could add another struct where each field is just a
I think you're conflating two things here, namely "what is UB" and "what are the consequences of UB". You're complaining that the compiler has gone and shot your foot, but that's not true. You already shot yourself in the foot. It just took a long time for you to feel the pain. The equivalent situation in C to the one at hand, as far as breaking changes goes, is not having a pointer to function with an uninitialized value. C permits that; Rust does not. An equivalent in C might be signed integer overflow: it's not obviously UB, it often happens to work, and most of the bugs that arise are due to the compiler optimizing around it rather than an explicit trap or obvious misbehaviour. Additionally, it's defined behaviour in Rust (although it depends on debug/release mode), so a Rust program can use signed overflow predictably and reliably, while a C program that has it is inherently buggy, regardless of whether it happens to work sometimes. |
You are making several unfair and unfounded assumptions here. First of all you seem to think this change was unnecessary. I am not sure why you think that; the change that likely triggered this is #62150 in which we were able to remove a very crazy intrinsic and define it in plain Rust instead. That's a great improvement in terms of maintainability of the compiler and also the language spec (once such a thing exists). Second you seem to think that we knew the change in question would have any kind of fall-out. Again that's not correct, we had do idea. This PR does not change what any of the involved functions actually does: And finally, it is also not correct that we don't care about when our changes do break libraries, even if those libraries have undefined behavior. But if programmers response to undefined behavior is "lol not a bug", there is no basis for a constructive discussion. UB is a contract between the programmer and the compiler (and also, somehow, a contract between the programmer and the compiler developers), and if the programmer does not fulfill their side of the contract, the other party will not fulfill their side of the contract either. So, with that out of the way, maybe we can have a constructive discussion? What happens here is the equivalent of a program doing an out-of-bounds memory access. With one compiler version that might "work" and just return some garbage value, with the next compiler version it might lead to a SEGFAULT because now the access points to an unmapped page. It is basically impossible for a compiler to guarantee that incorrect programs (i.e., programs having UB) keep "working". Any time anything is changed in the optimizer or the implementation of some method used by unsafe code, that can lead to unsafe code with UB "breaking" (I put this in quotes because the code was, in some sense, already broken -- it had UB). I invite you to look at #62150 and tell me what's "wrong" about it and how we could even have foreseen that this would have any kind of fall-out. Note that this kind of thing happens in C all the time. Some program has UB, a compiler update breaks it. In a language where programmers can cause UB, it is unfortunately practically impossible to guarantee any stability for programs with UB. We would have to basically stop developing the language as any seemingly harmless change can have pretty much arbitrary effects.
We could try to work together to find out if and where your code has undefined behavior. If it has no undefined behavior, there is definitely a compiler bug here. If it has undefined behavior, we can try to find out if that can be avoided. If it cannot, that's in some sense a "language bug" -- there probably should be a way to do what you want to do without causing UB, but current Rust just does not support that. Then we can work on trying to fix that language bug. But given that you seem to be willing to use
That's not entirely correct; More importantly though, |
I'm going to take you up on this invitation and say that doing It's still UB on the Rust language level in either case, but it seems like this change goes against how these intrinsics have been used in practice for a long time and it might make sense to delay this kind of change until they have been deprecated for a while. |
Are you saying it would help if we, e.g. changed |
Seems like we are having something of a parallel discussion at amethyst/amethyst#1808 (comment). |
I am going to close this:
|
implement zeroed and uninitialized with MaybeUninit This is the second attempt of doing such a change (first PR: rust-lang#62150). The last change [got reverted](rust-lang#63343) because it [caused](rust-lang#62825) some [issues](rust-lang#52898 (comment)) in [code that incorrectly used these functions](AltF02/x11-rs#99). Since then, the [problematic code has been fixed](AltF02/x11-rs#101), and rustc [gained a lint](rust-lang#63346) that is able to detect many misuses of these functions statically and a [dynamic check that panics](rust-lang#66059) instead of causing UB for some incorrect uses. Fixes rust-lang#62825
implement zeroed and uninitialized with MaybeUninit This is the second attempt of doing such a change (first PR: rust-lang#62150). The last change [got reverted](rust-lang#63343) because it [caused](rust-lang#62825) some [issues](rust-lang#52898 (comment)) in [code that incorrectly used these functions](AltF02/x11-rs#99). Since then, the [problematic code has been fixed](AltF02/x11-rs#101), and rustc [gained a lint](rust-lang#63346) that is able to detect many misuses of these functions statically and a [dynamic check that panics](rust-lang#66059) instead of causing UB for some incorrect uses. Fixes rust-lang#62825
implement zeroed and uninitialized with MaybeUninit This is the second attempt of doing such a change (first PR: rust-lang#62150). The last change [got reverted](rust-lang#63343) because it [caused](rust-lang#62825) some [issues](rust-lang#52898 (comment)) in [code that incorrectly used these functions](AltF02/x11-rs#99). Since then, the [problematic code has been fixed](AltF02/x11-rs#101), and rustc [gained a lint](rust-lang#63346) that is able to detect many misuses of these functions statically and a [dynamic check that panics](rust-lang#66059) instead of causing UB for some incorrect uses. Fixes rust-lang#62825
This program:
when compiled with
opt-level
higher than zero generates a SIGILL. Switching tostd::mem::uninitialized
works around the issue.I've been bitten by this in generic code where I have a field with
ManuallyDrop<T>
(basically I use it as anOption
with manually managed null state).This is possibly related to #52875
Affected versions:
rustc 1.29.0-nightly (874dec2 2018-07-21)
rustc 1.29.0-nightly (54628c8 2018-07-30)
rustc 1.27.2 (58cc626 2018-07-18)
Not affected versions:
rustc 1.26.0 (a775680 2018-05-07)
This doesn't look architecture specific as
x86_64-unknown-linux-gnu
,mips64-unknown-linux-gnuabi64
andarmv7-unknown-linux-musleabihf
are all affected.The text was updated successfully, but these errors were encountered: