Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can references to uninhabited types ever be valid? #413

Open
RalfJung opened this issue Jun 6, 2023 · 65 comments
Open

Can references to uninhabited types ever be valid? #413

RalfJung opened this issue Jun 6, 2023 · 65 comments

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2023

This discussion presupposes that "Do we have full recursive validity for references" (#412) is answered "no". This issue is only about references to uninhabited types; references to any other type are not relevant here.

So, in general we allow references to point to invalid data. Is the following UB?

let x: &! = transmute(&());

Since there is no recursive validity for references, the answer might be "no". However, the reason we could consider making this UB is that all of the arguments in #412 do not apply: to determine whether an &! is valid, we do not have to check memory, do any potentially racy reads, or anything like that. We can just answer the question with "no". Recursive validity is tricky in part because references can "bounce" between being valid and invalid as the contents of memory change; but for references to uninhabited types, this concern does not apply.

Basically, we can think of ! having an impossible-to-satisfy alignment requirement, and therefore &! is invalid since it is never properly aligned. (I don't really want to fold this into the alignment check, but the point is that we already have the validity of &T depend on T, namely via alignment -- I see no fundamental reason why we couldn't do the same with inhabitedness.)

@digama0
Copy link

digama0 commented Jun 6, 2023

Does this issue also cover &Void and &(!, T)? The latter is problematic since I think there are situations where you can have partially uninit references (and this also partially overlaps with #346). But maybe it suffices to say that an initialized &T (i.e. a value returned from a function, like your transmute example) requires that T be inhabited, even though uninitialized or partially initialized places of type &! or &(!, T) can exist.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2023

Does this issue also cover &Void and &(!, T)?

Yes.

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

@chorman0773
Copy link
Contributor

Is there a reason why they should be invalid unless &T is invalid if the T is?

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

@digama0
Copy link

digama0 commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

@digama0
Copy link

digama0 commented Jun 6, 2023

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

I don't, at least not without writing an unsafe block that would be UB under the OP question (more or less by definition). A variation that may or may not count is &Box<!>; I'm not sure whether this counts as inhabited or not, and it might actually thread the needle between competing notions of inhabitedness since you can empty-match on it and move out a ! (although maybe you want to call this a safety violation), even though by structural analysis of Box<!> you get a Unique which could just be a dangling pointer.

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states, and future lang proposals might reify these as types (e.g. view types), in which case you might have a function that takes an argument of type &{only T is init} (!, T) and you would not be able to assume that this type is uninhabited.

@Diggsey
Copy link

Diggsey commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

@saethlin
Copy link
Member

saethlin commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

I feel like we could do an experiment here to figure out if this is actually worth anything?

@digama0
Copy link

digama0 commented Jun 6, 2023

If it's unreachable, then you wouldn't even need an unreachable_unchecked, the whole function could be deleted. But it could be a field in a trait object, for example, in which case you do need the function (but it could be a dangling pointer? 🤔 )

@RalfJung
Copy link
Member Author

RalfJung commented Jun 7, 2023

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

I'm trying to remember the old discussion. The concern might have been that these are trait impls generated automatically / via proc macros for tons of types. So we can't just add custom code into the body.

I think it was @cramertj who raised this wish back then, not sure if they still remember any details though.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

If it's a public function in a lib crate, how would that be trivial?

@saethlin
Copy link
Member

saethlin commented Jun 7, 2023

The perf run for this optimization implemented as a MIR pass indicates no benefit in terms of code size: https://perf.rust-lang.org/compare.html?start=afab3662eb066f05fcdb43c421b72dd19472e752&end=0f4641ea16e650c77a03aa649330702722a5f66f&stat=size%3Alinked_artifact

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

I guess the benchmarks don't contain auto-derived code for uninhabited types. Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states

I have never seen this happen, so this is news to me. What is an example of that?

@digama0
Copy link

digama0 commented Jun 8, 2023

I take it back, the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U). However I can come up with examples that would plausibly benefit from it:

#[derive(Default)]
struct Foo {
  a: String,
  ...
  z: String,
}
let mut x = Foo::default();
drop(x.z);
|| println!("{}, ..., {}", x.a, ..., x.y);

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U).

Ah, yes that makes sense -- Box<T> is treated like T in many regards. I don't see us doing that for references though since it is valid in much fewer cases. And if we ever want to do that, we can just take back the UB on &Uninhabited.

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

With per-field closure capture, don't we actually do capture just a single reference and use it for all fields? That was considered as an implementation strategy at some point.

But anyway if we do that, we just have to pick a suitable type for the closure capture environment, one that is not uninhabited. Arguably actually using &(!, i32) for that capture would be a blatant lie and could confuse various parts of the compiler.

@chorman0773
Copy link
Contributor

The concern might have been that these are trait impls generated automatically / via proc macros for tons of types.

I'd expect that a lot of those trait impls will ultimately contain match *self{} as the first item.

use serde::Serialize;
#[derive(Serialize)]
pub enum Void{}

expands to

pub enum Void {}
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () =
    {
        #[allow(unused_extern_crates, clippy :: useless_attribute)]
        extern crate serde as _serde;
        #[allow(unused_macros)]
        macro_rules! try {
            ($__expr : expr) =>
            {
                match $__expr
                {
                    _serde :: __private :: Ok(__val) => __val, _serde ::
                    __private :: Err(__err) =>
                    { return _serde :: __private :: Err(__err) ; }
                }
            }
        }
        #[automatically_derived]
        impl _serde::Serialize for Void {
            fn serialize<__S>(&self, __serializer: __S)
                -> _serde::__private::Result<__S::Ok, __S::Error> where
                __S: _serde::Serializer {
                match *self {}
            }
        }
    };

(I tried with builtin derives, and it just generates core::intrinsics::unreachable(), so apparently we can insert custom code into them in particular.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2023 via email

@saethlin
Copy link
Member

saethlin commented Jun 9, 2023

I guess the benchmarks don't contain auto-derived code for uninhabited types.

No, they do.

Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

I moved the transform to codegen where we have monomoprhized MIR. It still has no effect, and in a stage2 build of rustc it only fires one additional time.
https://perf.rust-lang.org/compare.html?start=8b35c0bb0f833c0077dc57006eb317edde2a2d1e&end=b7380c57d311609c0fd4f128fbe8adbbe0f96a24&stat=size%3Alinked_artifact

Many of the times this optimization fires, it transforms this:

bb0 {
    _0 = const _;
    return;
}

into

bb0 {
    unreachable;
}

Hardly anything to celebrate.

What we don't have is a highly artificial crate that just derives a bunch of traits on a bunch of types which are all uninhabited. Maybe then this would show up. But that form of argument can justify any optimization so I don't think it is worth considering.

I do not think an argument about the value of this optimization holds up. But the try toolchain(s) are up there, so if someone has a codebase where they can demonstrate it helping, please do so.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2023

Given a large tuple type with an uninhabited field, we currently do generate a whole bunch of MIR / LLVM IR that is all completely pointless, right? I was told that in the Fuchsia codebase, being able to optimize this is relevant. That's all I know, and it seems plausible. You are saying in our benchmarks we have no such types. Fair, but that doesn't really say much about the code in the wild.

I'm not trusting our ability to come up with real-world benchmarks and predict future coding patterns enough to conclusively say "this optimization is useless and will also never be useful in the future". (I will also note that this kind of argument hasn't been used in other UB discussions before, despite other forms of UB being a lot more subtle, having non-local effects, and having no real-world evidence of being helpful either... are we going to demand real-world optimization evidence for every single bit of UB, and make everything else defined? I'm fine with "here is an optimization that demonstrates conclusively that his UB is useful" as an argument, but I find the negative version of this a lot less convincing. It's hard to come up with good evidence for the absence of a usecase.)

It seems to me to be cheap and easy to do this (have this UB), since there really isn't any reason to allow such references to exist and the extra complexity incurred on the spec is minimal. It's also easy to take back the UB and make it a mere safety invariant, if this ever causes a problem somewhere. It's impossible to go in the other direction, if we discover in the future that we really want this optimization.

@saethlin
Copy link
Member

I don't mean to argue that "this optimization is useless and will also never be useful in the future". But I do think that if we are going to claim that some optimization is valuable currently, we should have data to back up that claim. I thought that claim was being made up-thread (by you someone at an all-hands you are echoing).

I'll see if I can take a look at Fuchsia later, if I can figure out how to interact with it.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2023 via email

@RalfJung
Copy link
Member Author

One other reason it might make sense to consider &! and &Void validity-uninhabited is coherence with match rules. I assume with x: &Void we want to accept match *x {}, and in fact we do accept it today. However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Arguably, fixing this by making more code UB is a bad fix, but the alternative of rejecting the empty match does not seem appealing either.

@digama0
Copy link

digama0 commented Oct 25, 2023

However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Just as a single data point, this behavior doesn't seem that weird to me. I am used to pattern matching that goes looking for exhaustive matches, and using _ means that it doesn't do that. I would rather optimize for the case of safe code here, because there is nothing about this example that involves unsafe code and x: &Void is already outside its safety invariant. This example is no more concerning to me than the fact that writing let foo = *x; is UB if x: &u32 is a dangling pointer (substitute &&u32, &Vec<u32> or similar depending on where we land on shallow validity). You shouldn't allow these time bombs in safe code in the first place.

@ijackson
Copy link

Currently, with nightly, this program:

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { _ => {} }
}

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8cfc2b548c44d8f0b00b1dca258f7e43

runs, but complains about an unreachable match arm.

If you remove the supposedy unreachable arm,

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { }
}

it dies with SIGILL.

I think this is rather strange, unless both programs have UB.

@ijackson
Copy link

As a possible argument in favour of &! being UB to construct:

In derive-adhoc I have a large enum whose variants contain "markers" that are sometimes units and sometimes uninhabited: See https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/framework.rs?ref_type=heads#L115 and https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/syntax.rs?ref_type=heads#L67

This allows the reuse of the same enum (source code) for similar value sets that occur in different contexts. Call sites that receive one of these values can call trait methods like this: https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/paste.rs?ref_type=heads#L816

This allows me as the author to statically demonstrate that the situation is impossible, and therefore I don't have to write code to handle it.

It would be nice if the compiler could remove as much of this as possible. Ideally, the uninhabited enum variants, and all the code that handles uninhabited values, would completely vanish as soon as possible after monomorphisation.

@Nadrieril
Copy link
Member

Your example is an instance of rust-lang/rust#117288, which the lang team has agreed was something they wanted to fix (the arm is deemed unreachable because the *x expression is incorrectly believed to diverge).

@ijackson
Copy link

Here's another example where disallowing &! improves things:

enum MockableRef<'m> {
    Production,
    Mocked(&'m Mocks),
}

#[cfg(not(test))]
enum Mocks {}

We hope MockableRef is a unit in production code, since only one of its variants ought ever to exist. But it's not: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cb3a9289904acb3112d76b7a9284ea9f

@ijackson
Copy link

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

@RalfJung
Copy link
Member Author

runs, but complains about an unreachable match arm.

Yeah that should be fixed... I thought @Nadrieril fixed this recently but I guess that was another match lint issue that got fixed?

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

I wouldn't want to make function calls special this way. I'd prefer making &! validity-uninhabited. (It obviously is safety-uninhabited so the validity invariant is the only open question.)

@Nadrieril
Copy link
Member

Nadrieril commented Feb 13, 2024

Yeah what I fixed is the exhaustiveness code; that example triggers a warning because of divergence tracking instead (i.e. it triggers the unreachable_code lint, and I only fixed the unreachable_pattern lint)

@dhedey

This comment has been minimized.

@dhedey

This comment has been minimized.

@chorman0773

This comment has been minimized.

@dhedey

This comment has been minimized.

@Diggsey

This comment has been minimized.

@dhedey

This comment has been minimized.

@chorman0773

This comment has been minimized.

@Nadrieril
Copy link
Member

Do you want full recursive inhabitedness? E.g. should &&! and &(T, &!) be uninhabited too?

@RalfJung

This comment has been minimized.

@RalfJung

This comment has been minimized.

@dhedey

This comment has been minimized.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2024

No worries, and thanks for opening that issue with a very clear motivating use-case. :)

@SUPERCILEX

This comment has been minimized.

@RalfJung

This comment has been minimized.

@SUPERCILEX

This comment has been minimized.

@RalfJung

This comment has been minimized.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 18, 2024

Do you want full recursive inhabitedness? E.g. should &&! and &(T, &!) be uninhabited too?

I realized I missed this in the sequence of off-topic posts about match.

Yes, this is recursive. If we say that references to uninhabited types are invalid, that means &T is uninhabited if T is uninhabited. Then, by definition &! is an uninhabited type, which makes &&! a reference to an uninhabited type, and thus itself uninhabited. Tuples are uninhabited if any field is uninhabited, so the logic also applies to &(T, &!).

None of this requires recursing over a value of &(T, &!), so I do not consider this "recursive validity checking". We are only recursing over the type, an entirely static concept -- this is just part of computing the layout of the type. Types have a size, alignment, and inhabited: bool flag.

@ijackson
Copy link

ijackson commented Nov 12, 2024

I'm not trusting our ability to come up with real-world benchmarks and predict future coding patterns enough to conclusively say "this optimization is useless and will also never be useful in the future".

I agree with this. However, in case it's helpful anyway:

I have a crate which makes extensive use of uninhabited types for conditional exclusion of unneeded generic code, especially enum variants. derive-deftly-macros. As it's a proc-macro crate, having it compile quickly would be nice. Some deeplinks: Comment explaining the inhabitedness of trait associated types approach; the biggest enum with "filtered" variants; example of how this statically enforces a correctness property; statically unreachable match arms via known-uninhabited static type and via trait method.

I'm not familiar with the compiler, and generally not much of a compiler engineer, so IDK if this would make a good test case for any of these optimisations. But I thought I should mention it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests