Can references to uninhabited types ever be valid? #413

RalfJung · 2023-06-06T20:00:22Z

This discussion presupposes that "Do we have full recursive validity for references" (#412) is answered "no". This issue is only about references to uninhabited types; references to any other type are not relevant here.

So, in general we allow references to point to invalid data. Is the following UB?

let x: &! = transmute(&());

Since there is no recursive validity for references, the answer might be "no". However, the reason we could consider making this UB is that all of the arguments in #412 do not apply: to determine whether an &! is valid, we do not have to check memory, do any potentially racy reads, or anything like that. We can just answer the question with "no". Recursive validity is tricky in part because references can "bounce" between being valid and invalid as the contents of memory change; but for references to uninhabited types, this concern does not apply.

Basically, we can think of ! having an impossible-to-satisfy alignment requirement, and therefore &! is invalid since it is never properly aligned. (I don't really want to fold this into the alignment check, but the point is that we already have the validity of &T depend on T, namely via alignment -- I see no fundamental reason why we couldn't do the same with inhabitedness.)

The text was updated successfully, but these errors were encountered:

digama0 · 2023-06-06T20:45:50Z

Does this issue also cover &Void and &(!, T)? The latter is problematic since I think there are situations where you can have partially uninit references (and this also partially overlaps with #346). But maybe it suffices to say that an initialized &T (i.e. a value returned from a function, like your transmute example) requires that T be inhabited, even though uninitialized or partially initialized places of type &! or &(!, T) can exist.

RalfJung · 2023-06-06T20:53:01Z

Does this issue also cover &Void and &(!, T)?

Yes.

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

chorman0773 · 2023-06-06T20:55:10Z

Is there a reason why they should be invalid unless &T is invalid if the T is?

RalfJung · 2023-06-06T21:00:37Z

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

digama0 · 2023-06-06T21:24:51Z

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

digama0 · 2023-06-06T21:34:46Z

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

I don't, at least not without writing an unsafe block that would be UB under the OP question (more or less by definition). A variation that may or may not count is &Box<!>; I'm not sure whether this counts as inhabited or not, and it might actually thread the needle between competing notions of inhabitedness since you can empty-match on it and move out a ! (although maybe you want to call this a safety violation), even though by structural analysis of Box<!> you get a Unique which could just be a dangling pointer.

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states, and future lang proposals might reify these as types (e.g. view types), in which case you might have a function that takes an argument of type &{only T is init} (!, T) and you would not be able to assume that this type is uninhabited.

Diggsey · 2023-06-06T21:40:21Z

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

saethlin · 2023-06-06T21:41:26Z

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

I feel like we could do an experiment here to figure out if this is actually worth anything?

digama0 · 2023-06-06T21:42:04Z

If it's unreachable, then you wouldn't even need an unreachable_unchecked, the whole function could be deleted. But it could be a field in a trait object, for example, in which case you do need the function (but it could be a dangling pointer? 🤔 )

RalfJung · 2023-06-07T06:44:21Z

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

I'm trying to remember the old discussion. The concern might have been that these are trait impls generated automatically / via proc macros for tons of types. So we can't just add custom code into the body.

I think it was @cramertj who raised this wish back then, not sure if they still remember any details though.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

If it's a public function in a lib crate, how would that be trivial?

saethlin · 2023-06-07T17:54:17Z

The perf run for this optimization implemented as a MIR pass indicates no benefit in terms of code size: https://perf.rust-lang.org/compare.html?start=afab3662eb066f05fcdb43c421b72dd19472e752&end=0f4641ea16e650c77a03aa649330702722a5f66f&stat=size%3Alinked_artifact

RalfJung · 2023-06-08T05:56:38Z

I guess the benchmarks don't contain auto-derived code for uninhabited types. Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

RalfJung · 2023-06-08T05:59:45Z

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states

I have never seen this happen, so this is news to me. What is an example of that?

digama0 · 2023-06-08T06:58:55Z

I take it back, the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U). However I can come up with examples that would plausibly benefit from it:

#[derive(Default)]
struct Foo {
  a: String,
  ...
  z: String,
}
let mut x = Foo::default();
drop(x.z);
|| println!("{}, ..., {}", x.a, ..., x.y);

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

RalfJung · 2023-06-08T09:36:54Z

the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U).

Ah, yes that makes sense -- Box<T> is treated like T in many regards. I don't see us doing that for references though since it is valid in much fewer cases. And if we ever want to do that, we can just take back the UB on &Uninhabited.

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

With per-field closure capture, don't we actually do capture just a single reference and use it for all fields? That was considered as an implementation strategy at some point.

But anyway if we do that, we just have to pick a suitable type for the closure capture environment, one that is not uninhabited. Arguably actually using &(!, i32) for that capture would be a blatant lie and could confuse various parts of the compiler.

chorman0773 · 2023-06-09T02:11:30Z

The concern might have been that these are trait impls generated automatically / via proc macros for tons of types.

I'd expect that a lot of those trait impls will ultimately contain match *self{} as the first item.

use serde::Serialize;
#[derive(Serialize)]
pub enum Void{}

expands to

pub enum Void {}
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () =
    {
        #[allow(unused_extern_crates, clippy :: useless_attribute)]
        extern crate serde as _serde;
        #[allow(unused_macros)]
        macro_rules! try {
            ($__expr : expr) =>
            {
                match $__expr
                {
                    _serde :: __private :: Ok(__val) => __val, _serde ::
                    __private :: Err(__err) =>
                    { return _serde :: __private :: Err(__err) ; }
                }
            }
        }
        #[automatically_derived]
        impl _serde::Serialize for Void {
            fn serialize<__S>(&self, __serializer: __S)
                -> _serde::__private::Result<__S::Ok, __S::Error> where
                __S: _serde::Serializer {
                match *self {}
            }
        }
    };

(I tried with builtin derives, and it just generates core::intrinsics::unreachable(), so apparently we can insert custom code into them in particular.

RalfJung · 2023-06-09T06:13:03Z

Now consider a struct with many fields, one if which is Void.

saethlin · 2023-06-09T15:25:03Z

I guess the benchmarks don't contain auto-derived code for uninhabited types.

No, they do.

Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

I moved the transform to codegen where we have monomoprhized MIR. It still has no effect, and in a stage2 build of rustc it only fires one additional time.
https://perf.rust-lang.org/compare.html?start=8b35c0bb0f833c0077dc57006eb317edde2a2d1e&end=b7380c57d311609c0fd4f128fbe8adbbe0f96a24&stat=size%3Alinked_artifact

Many of the times this optimization fires, it transforms this:

bb0 {
    _0 = const _;
    return;
}

into

bb0 {
    unreachable;
}

Hardly anything to celebrate.

What we don't have is a highly artificial crate that just derives a bunch of traits on a bunch of types which are all uninhabited. Maybe then this would show up. But that form of argument can justify any optimization so I don't think it is worth considering.

I do not think an argument about the value of this optimization holds up. But the try toolchain(s) are up there, so if someone has a codebase where they can demonstrate it helping, please do so.

RalfJung · 2023-06-10T11:01:30Z

Given a large tuple type with an uninhabited field, we currently do generate a whole bunch of MIR / LLVM IR that is all completely pointless, right? I was told that in the Fuchsia codebase, being able to optimize this is relevant. That's all I know, and it seems plausible. You are saying in our benchmarks we have no such types. Fair, but that doesn't really say much about the code in the wild.

I'm not trusting our ability to come up with real-world benchmarks and predict future coding patterns enough to conclusively say "this optimization is useless and will also never be useful in the future". (I will also note that this kind of argument hasn't been used in other UB discussions before, despite other forms of UB being a lot more subtle, having non-local effects, and having no real-world evidence of being helpful either... are we going to demand real-world optimization evidence for every single bit of UB, and make everything else defined? I'm fine with "here is an optimization that demonstrates conclusively that his UB is useful" as an argument, but I find the negative version of this a lot less convincing. It's hard to come up with good evidence for the absence of a usecase.)

It seems to me to be cheap and easy to do this (have this UB), since there really isn't any reason to allow such references to exist and the extra complexity incurred on the spec is minimal. It's also easy to take back the UB and make it a mere safety invariant, if this ever causes a problem somewhere. It's impossible to go in the other direction, if we discover in the future that we really want this optimization.

saethlin · 2023-06-10T18:32:39Z

I don't mean to argue that "this optimization is useless and will also never be useful in the future". But I do think that if we are going to claim that some optimization is valuable currently, we should have data to back up that claim. I thought that claim was being made up-thread (by you someone at an all-hands you are echoing).

I'll see if I can take a look at Fuchsia later, if I can figure out how to interact with it.

RalfJung · 2023-06-10T21:13:16Z

Sorry, seems like I was misinterpreting your intent. It's also possible I am misrepresenting what they said, it's been a few years. :/

RalfJung · 2023-10-25T15:37:54Z

One other reason it might make sense to consider &! and &Void validity-uninhabited is coherence with match rules. I assume with x: &Void we want to accept match *x {}, and in fact we do accept it today. However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Arguably, fixing this by making more code UB is a bad fix, but the alternative of rejecting the empty match does not seem appealing either.

digama0 · 2023-10-25T16:43:48Z

However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Just as a single data point, this behavior doesn't seem that weird to me. I am used to pattern matching that goes looking for exhaustive matches, and using _ means that it doesn't do that. I would rather optimize for the case of safe code here, because there is nothing about this example that involves unsafe code and x: &Void is already outside its safety invariant. This example is no more concerning to me than the fact that writing let foo = *x; is UB if x: &u32 is a dangling pointer (substitute &&u32, &Vec<u32> or similar depending on where we land on shallow validity). You shouldn't allow these time bombs in safe code in the first place.

ijackson · 2024-02-13T18:37:05Z

Currently, with nightly, this program:

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { _ => {} }
}

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8cfc2b548c44d8f0b00b1dca258f7e43

runs, but complains about an unreachable match arm.

If you remove the supposedy unreachable arm,

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { }
}

it dies with SIGILL.

I think this is rather strange, unless both programs have UB.

ijackson · 2024-02-13T18:47:43Z

As a possible argument in favour of &! being UB to construct:

In derive-adhoc I have a large enum whose variants contain "markers" that are sometimes units and sometimes uninhabited: See https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/framework.rs?ref_type=heads#L115 and https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/syntax.rs?ref_type=heads#L67

This allows the reuse of the same enum (source code) for similar value sets that occur in different contexts. Call sites that receive one of these values can call trait methods like this: https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/paste.rs?ref_type=heads#L816

This allows me as the author to statically demonstrate that the situation is impossible, and therefore I don't have to write code to handle it.

It would be nice if the compiler could remove as much of this as possible. Ideally, the uninhabited enum variants, and all the code that handles uninhabited values, would completely vanish as soon as possible after monomorphisation.

Nadrieril · 2024-02-13T19:46:42Z

Your example is an instance of rust-lang/rust#117288, which the lang team has agreed was something they wanted to fix (the arm is deemed unreachable because the *x expression is incorrectly believed to diverge).

ijackson · 2024-02-13T21:14:32Z

Here's another example where disallowing &! improves things:

enum MockableRef<'m> {
    Production,
    Mocked(&'m Mocks),
}

#[cfg(not(test))]
enum Mocks {}

We hope MockableRef is a unit in production code, since only one of its variants ought ever to exist. But it's not: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cb3a9289904acb3112d76b7a9284ea9f

ijackson · 2024-02-13T21:19:39Z

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

RalfJung · 2024-02-13T21:41:12Z

runs, but complains about an unreachable match arm.

Yeah that should be fixed... I thought @Nadrieril fixed this recently but I guess that was another match lint issue that got fixed?

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

I wouldn't want to make function calls special this way. I'd prefer making &! validity-uninhabited. (It obviously is safety-uninhabited so the validity invariant is the only open question.)

Nadrieril · 2024-02-13T23:48:28Z

Yeah what I fixed is the exhaustiveness code; that example triggers a warning because of divergence tracking instead (i.e. it triggers the unreachable_code lint, and I only fixed the unreachable_pattern lint)

Nadrieril · 2024-10-08T15:33:09Z

Do you want full recursive inhabitedness? E.g. should &&! and &(T, &!) be uninhabited too?

RalfJung · 2024-10-09T16:55:49Z

No worries, and thanks for opening that issue with a very clear motivating use-case. :)

RalfJung · 2024-10-18T16:19:56Z

Do you want full recursive inhabitedness? E.g. should &&! and &(T, &!) be uninhabited too?

I realized I missed this in the sequence of off-topic posts about match.

Yes, this is recursive. If we say that references to uninhabited types are invalid, that means &T is uninhabited if T is uninhabited. Then, by definition &! is an uninhabited type, which makes &&! a reference to an uninhabited type, and thus itself uninhabited. Tuples are uninhabited if any field is uninhabited, so the logic also applies to &(T, &!).

None of this requires recursing over a value of &(T, &!), so I do not consider this "recursive validity checking". We are only recursing over the type, an entirely static concept -- this is just part of computing the layout of the type. Types have a size, alignment, and inhabited: bool flag.

ijackson · 2024-11-12T14:10:05Z

I'm not trusting our ability to come up with real-world benchmarks and predict future coding patterns enough to conclusively say "this optimization is useless and will also never be useful in the future".

I agree with this. However, in case it's helpful anyway:

I have a crate which makes extensive use of uninhabited types for conditional exclusion of unneeded generic code, especially enum variants. derive-deftly-macros. As it's a proc-macro crate, having it compile quickly would be nice. Some deeplinks: Comment explaining the inhabitedness of trait associated types approach; the biggest enum with "filtered" variants; example of how this statically enforces a correctness property; statically unreachable match arms via known-uninhabited static type and via trait method.

I'm not familiar with the compiler, and generally not much of a compiler engineer, so IDK if this would make a good test case for any of these optimisations. But I thought I should mention it.

This was referenced Jun 6, 2023

Validity of references: Bit-related properties #76

Closed

Do we have full recursive validity for references? #412

Open

saethlin mentioned this issue Jun 7, 2023

Make function bodies with &! arguments unreachable rust-lang/rust#112372

Closed

RalfJung mentioned this issue Aug 2, 2023

Validity of Box<T> #145

Open

RalfJung mentioned this issue Oct 25, 2023

Tracking issue for RFC 1872: exhaustive_patterns feature rust-lang/rust#51085

Open

3 tasks

RalfJung mentioned this issue Aug 18, 2024

box syntax creates Box<T> where T is not yet initialized rust-lang/rust#97270

Open

jonhoo mentioned this issue Aug 28, 2024

Non-exhaustive patterns error when matching reference to empty enum rust-lang/rust#78123

Closed

This comment has been minimized.

Sign in to view

dhedey mentioned this issue Oct 9, 2024

Compilation error when matching reference to empty enum rust-lang/rust#131452

Open

This comment has been minimized.

Sign in to view

RalfJung mentioned this issue Oct 21, 2024

Layout and behavior of enums with uninhabited fields in some variants #443

Open

Nadrieril mentioned this issue Nov 12, 2024

RFC: Never patterns rust-lang/rfcs#3719

Open

Can references to uninhabited types ever be valid? #413

Can references to uninhabited types ever be valid? #413

Comments

RalfJung commented Jun 6, 2023 • edited Loading

digama0 commented Jun 6, 2023 • edited Loading

RalfJung commented Jun 6, 2023

chorman0773 commented Jun 6, 2023

RalfJung commented Jun 6, 2023 • edited Loading

digama0 commented Jun 6, 2023 • edited Loading

digama0 commented Jun 6, 2023

Diggsey commented Jun 6, 2023

saethlin commented Jun 6, 2023

digama0 commented Jun 6, 2023

RalfJung commented Jun 7, 2023

saethlin commented Jun 7, 2023

RalfJung commented Jun 8, 2023

RalfJung commented Jun 8, 2023

digama0 commented Jun 8, 2023

RalfJung commented Jun 8, 2023

chorman0773 commented Jun 9, 2023

RalfJung commented Jun 9, 2023 via email

saethlin commented Jun 9, 2023

RalfJung commented Jun 10, 2023 • edited Loading

saethlin commented Jun 10, 2023

RalfJung commented Jun 10, 2023 via email • edited Loading

RalfJung commented Oct 25, 2023

digama0 commented Oct 25, 2023 • edited Loading

ijackson commented Feb 13, 2024

ijackson commented Feb 13, 2024

Nadrieril commented Feb 13, 2024

ijackson commented Feb 13, 2024

ijackson commented Feb 13, 2024

RalfJung commented Feb 13, 2024

Nadrieril commented Feb 13, 2024 • edited Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Nadrieril commented Oct 8, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

RalfJung commented Oct 9, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

RalfJung commented Oct 18, 2024 • edited Loading

ijackson commented Nov 12, 2024 • edited Loading

RalfJung commented Jun 6, 2023 •

edited

Loading

digama0 commented Jun 6, 2023 •

edited

Loading

RalfJung commented Jun 6, 2023 •

edited

Loading

digama0 commented Jun 6, 2023 •

edited

Loading

RalfJung commented Jun 10, 2023 •

edited

Loading

RalfJung commented Jun 10, 2023 via email •

edited

Loading

digama0 commented Oct 25, 2023 •

edited

Loading

Nadrieril commented Feb 13, 2024 •

edited

Loading

RalfJung commented Oct 18, 2024 •

edited

Loading

ijackson commented Nov 12, 2024 •

edited

Loading