Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we have full recursive validity for references? #412

Open
RalfJung opened this issue Jun 6, 2023 · 10 comments
Open

Do we have full recursive validity for references? #412

RalfJung opened this issue Jun 6, 2023 · 10 comments

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2023

This is one of the successors to #77. The question is: for a reference to be valid, do we require that all the data it points to is equally valid, fully recursively?

I think the answer is "no", and my impression is most people agree, so let me collect some arguments here (and I will open some other issues that only make sense discussing after the answer here is "no"):

  • Full recursive validity is super fragile: to reason about why the reference you are creating does not cause UB, you have to reason about all code that might mutate any memory transitively reachable (via references).
  • The benefit is questionable: having deeply nested references cause UB is very unlikely to ever benefit compiler optimizations.
  • Allowing mutable references to uninit data (between consenting APIs) is (unfortunately) how the io::Read trait often works, and while better solutions are being developed, there's a long tail of copies of this API and a lot of old code to port, that we should have good reasons to consider UB.
  • For shared references with interior mutability specifically, even determining whether an &Mutex<bool> is valid would cause a conceptual data race with another thread holding the lock and mutating that bool.
  • Some of the arguments in Document current justification for not requiring recursive reference validity (in particular, &mut uninit not being immediate UB) #346 also apply, though some are more specific and a lot of the discussion focuses not on "full recursive validity: yes or know" but various weaker validity variants.
  • Also see this post

Note that the Rust reference currently answers this question with "yes", but in my view this is mostly because we haven't yet figured out what exactly the weaker requirement is that we actually want to impose.

@RalfJung RalfJung changed the title Do we have full recursive validity for references Do we have full recursive validity for references? Jun 6, 2023
@digama0
Copy link

digama0 commented Jun 6, 2023

I think point 4 means that "full recursive validity" is not actually well-defined. A reference does not necessarily give access to the full pointed-to allocation, if parts of that allocation are currently loaned out, and I think one aspect of the memory model that we should all be able to agree on is that a evaluating a validity assertion should not require "reads" which are not legal by the memory model itself, because this leads to data-race-like issues which I'm not sure can be resolved in the presence of weak memory.

However, not all validity properties are incompatible with concurrent reads, because not everything is under an UnsafeCell. I would like us to separate the extremes of "full recursive validity" and "validity of references cannot depend on memory". I would say that "full recursive validity" is not a coherent position as currently understood, but there is still a possibility for a middle ground where some memory is involved in the validity of references, possibly even multiple levels of pointer indirection, as long as we do not need to look at memory which is currently loaned out or memory behind an UnsafeCell.

TL;DR: The answer to the title question should be "no", but the follow up questions might not be so simple.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2023

TL;DR: The answer to the title question should be "no", but the follow up questions might not be so simple.

Yes. The idea was to have this issue for documenting that consensus, so that we can have separate issues for the follow-up questions. That is my attempt to disentangle the various follow-up questions that arise here.

Let's collect follow-up questions we have issues for:

Please open a new issue if you think there is a separate proposal/discussion to be had.

@celinval
Copy link

celinval commented Jul 4, 2024

Is this issue proposing to change the UB defined in the reference?

A reference or Box that is dangling, misaligned, or points to an invalid value.

@chorman0773
Copy link
Contributor

chorman0773 commented Jul 4, 2024

It would still be an unsafe value (unsound to yield to arbitrary safe code), but yes, it would not be undefined behaviour to produce the reference.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 4, 2024

Yes, many of the issues in this repo are about finding out what UB we really want. The reference list is preliminary, as indicated by the big warning there.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2024

I added a flag to Miri that lets us experiment with recursive validity checking.

Turns out at least recursive validity of Box is violated all the time, since the MIR we generate for Box::new first creates a fresh initialized box and then stores a T in there -- so we have a Box<T> value pointing to uninitialized memory for a short time. See also rust-lang/rust#97270. So if we want recursive validity of Box, we need to change our MIR building for Box::new.

For non-Box, the first issue I am hitting is this code which blatantly tells BorrowedBuf that there are 8 initialized bytes when that's not actually true, and subsequently calls init_ref, creating a slice to uninit memory.

And then, the first issue that's not just in a test: this code here is unsound with recursive reference validity. *(*arr_ptr).get_unchecked_mut(i) will turn arr_ptr: *mut [&mut T; N] into a reference to pass it to get_unchecked_mut, but the array has not been fully initialized yet.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 4, 2024
…r=saethlin

Miri: add a flag to do recursive validity checking

The point of this flag is to allow gathering experimental data for rust-lang/unsafe-code-guidelines#412.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 4, 2024
…r=saethlin

Miri: add a flag to do recursive validity checking

The point of this flag is to allow gathering experimental data for rust-lang/unsafe-code-guidelines#412.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 4, 2024
…r=saethlin

Miri: add a flag to do recursive validity checking

The point of this flag is to allow gathering experimental data for rust-lang/unsafe-code-guidelines#412.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 4, 2024
Rollup merge of rust-lang#128531 - RalfJung:miri-recursive-validity, r=saethlin

Miri: add a flag to do recursive validity checking

The point of this flag is to allow gathering experimental data for rust-lang/unsafe-code-guidelines#412.
RalfJung pushed a commit to RalfJung/miri that referenced this issue Aug 5, 2024
Miri: add a flag to do recursive validity checking

The point of this flag is to allow gathering experimental data for rust-lang/unsafe-code-guidelines#412.
@RalfJung
Copy link
Member Author

RalfJung commented Sep 27, 2024

Full recursive validity is also fundamentally at odds with #[may_dangle]. This code is UB under full recursive validity:

#![feature(dropck_eyepatch)]

struct Foo<'a>(&'a u8);

unsafe impl<#[may_dangle] 'a> Drop for Foo<'a> {
    fn drop(&mut self) { }
}

fn main() {
    let foo;
    {
        let x = 0;
        foo = Foo(&x);
    }
}

(Example due to @JakobDegen, from a different discussion )

The self argument in drop will be &mut Newtype(&'a u8) , where 'a is dead and the inner reference truly dangles. So full recursive validity implies that

  • either #[may_dangle] cannot be used on lifetimes/types that "really occur" in the type (i.e., it can only be used like in Vec where the relevant T is behind a raw pointer and inside PhantomData, but not directly safely reachable)
  • or #[may_dangle] affects the validity requirements of the drop function it is attached to -- which is a ticking bomb since any function called by drop does not inherit this special treatment

@workingjubilee
Copy link
Member

The benefit is questionable: having deeply nested references cause UB is very unlikely to ever benefit compiler optimizations.

That part seems speculative. I have absolutely seen code that looks like

let iter = vec.iter().filter(pred);
call();
for item in iter.filter(overlapping_pred) {
    // code
}

in the wild... usually because the first part looks like calling this:

impl SomeType<T> {
    fn iter(&self) -> impl Iterator<Item = &T> {
         self.inner_slice.filter(valid_elements)
    }
}

And wouldn't that benefit from the optimization that's currently invalid in rust-lang/rust#130853 of a simple &&u8?

let val = **x;
call();
**x == val // optimize to true

Obviously this may not hold for all types.

@RalfJung
Copy link
Member Author

Aliasing for nested references and recursive validity do not have to be entangled, so please keep the former to #532.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 11, 2024

squints

Huuuh, I'm surprised by that, but okay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants