Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider creating a safe version of this library #10

Closed
nakedible opened this issue Oct 14, 2023 · 4 comments
Closed

Consider creating a safe version of this library #10

nakedible opened this issue Oct 14, 2023 · 4 comments

Comments

@nakedible
Copy link

Could you possibly consider making a safe version of this library which would not use any unsafe constructs internally?

It's obvious that some compiler optimizations will be lost and unneeded range checks will be added to some places. But in some cases it is also possible to prove that the value is in the range to the compiler, while still not having explicit panics generated. And in many cases compiler optimizations while inlining will remove any superfluous range checks.

In theory, it could even be a cargo feature to enable no_internal_unsafe while keeping the same API.

To be clear, I'm fine with having unsafe functions in the API as calling code needs to be also marked unsafe - but implicitly using unsafe blocks inside functions requires trusting the correct implementation of all of those functions, in this version and in any future versions.

@jhpratt
Copy link
Owner

jhpratt commented Oct 14, 2023

What advantage would this have over the status quo? Every use of unsafe is documented, and this is enforced by clippy. Nearly every situation has a debug assertion for the value as well.

Any option that is present would necessarily duplicate nearly the entire crate, as unsafe is prevalent. Most of the time I would not expect the compiler to be able to optimize away the checks, as any non-static value is unknowable to the compiler.

@nakedible
Copy link
Author

This is probably a philosophical argument where we just agree to disagree - but I'll answer shortly anyway.

Rust attempts to bring safety to programming, so undefined behaviour would be constrained. There are obviously still bugs in the Rust runtime which cause undefined behaviour, but those are rare and getting even more rare. Any library that uses unsafe obviously can cause undefined behaviour, which means that the users of that library have to trust the library author to not have made any mistakes in their code. I mean, maybe the author is a Rust dev, so you have to trust them anyway in the core Rust codebase - but there is usually a bit more eyeballs there. In a library, there's less eyeballs, and at any point the library author may make a "performance optimization" to some detail inside unsafe and suddenly the library that used to be safe causes undefined behaviour in a special case. And especially if it's the case of compiler optimizations, debugging those might be insanely hard. To combat this problem there's a ton of tools for Rust like cargo geiger, cackle, etc. That's why a number of people (https://github.com/rust-secure-code/safety-dance) are reducing the use of unsafe around the different Rust libraries and sometimes even managing to fix vulnerabilities in the process. But, I think already know all this, just have a different opinion on it.

I don't consider deranged so much a performance optimization crate than a usability and safety optimization crate by having ranged integers as types. There are of course alternatives but honestly this looks the nicest so far. I'd use this crate even if there's a performance hit for using it, because the performance hit is likely to be minimal. It's the same reason Rust has array bounds checking enabled, even though there is a performance hit.

On the code itself, I don't think the duplication is as big as you think it is, having already looked at all the unsafe usages. Especially with a couple helper methods, I believe the duplication could be just a few lines of code. I can even make the pull myself if you want – but I have no wish to force the issue if you don't think it is worthwhile.

And on the compiler optimizations, yes, any non-static value is unknowable to the compiler, but that doesn't mean the compiler can't reason about it. For example if you do (x as u32) % 20, the compiler knows the range is 0-19. That's why many array bounds checks can often be eliminated as the compiler is able to reason that they are not necessary if you write the code in a suitable way. This obviously doesn't work if the value lifetime is not local.

In any case, totally up to you – if you don't feel this is worthwhile especially on a proof-of-concept implementation such as this, then somebody else will fill that niche.

@jhpratt
Copy link
Owner

jhpratt commented Oct 16, 2023

I suspected you were going for a philosophical view of things going in, but I wasn't positive.

maybe the author is a Rust dev

At the risk of an argument from authority, I am. By even using the Rust compiler and standard library, you're using code that I've worked on.

suddenly the library that used to be safe causes undefined behaviour in a special case

By "safe", I presume you mean "sound". This is where debug assertions come in, which are widespread as I mentioned. Without looking over the full crate right now, I can't think of a case where there isn't a debug assertion present for something that could result in UB.

I don't consider deranged so much a performance optimization crate than a usability and safety optimization crate by having ranged integers as types.

It's both. I spun this out of my work on the time crate, where it is now used internally. If others find it useful, great! But so far, time is literally the only crate that uses it last I checked.

because the performance hit is likely to be minimal

For example if you do (x as u32) % 20, the compiler knows the range is 0-19

Perhaps within deranged itself the impact will be minimal, but there are effects on the crates that use deranged as well. The compiler is able to propagate the unreachable annotations which results in further optimizations in certain situations. Trivial cases aren't the best examples for optimization, as it's the more complex situations that truly benefit.

It's the same reason Rust has array bounds checking enabled, even though there is a performance hit.

Array indexing uses unsafe internally. The bounds checking is only to return Some/None, the same as any method that returns Option in deranged. There are unchecked methods in the same vein as .get_unchecked on arrays.

I don't think the duplication is as big as you think it is

Happy to be proven wrong! Every time a ranged integer is constructed, unsafe is involved, so I remain skeptical.

In any case, totally up to you – if you don't feel this is worthwhile especially on a proof-of-concept implementation such as this, then somebody else will fill that niche.

I'm not going to promise a PR gets merged, of course, but if it is a small diff and would require minimal effort to maintain, including for future code, then I am open to the idea. My ultimate goal is to get something like this as part of Rust itself, where unsafe would undoubtedly be used in a similar manner as it is today.

@nakedible
Copy link
Author

It's probably pointless for me to reiterate my position (and I don't mean that in a bad way!) as you know all my arguments better than I know them myself. However, I will try to summarize as clearly as I can:

  • I don't trust you to not ever make mistakes, so any use of unsafe is suspect to cause unsound behaviour
  • Use of unsafe inside the standard library gets much more review, so I treat that different from independently published libraries
  • I accept that there are many reasons why many libraries cannot avoid internal unsafe code (like implementing new data structures not supported by the borrow checker), which is a risk that has to be accepted while using those libraries
  • I dislike it when libraries use unsafe for micro-optimizations that might not even result in real world benefits, but still create a potential cause of unsoundness

If this library is meant to be included in the Rust standard library itself, I have no complaints about the use of unsafe. But if this library is meant to be used by other projects directly, I'd hope to reduce the amount of unsafe.


I wrote a pull request that basically changes most usages of new_unchecked in safe code to be new_saturating, because the compiler is smart enough to remove all the range checks anyway: #11

The pull isn't polished yet, as I don't want to work on something that will not get merged, so I wanted your opinion on it at this point.

As a follow up after that pull, there are exactly four remaining usages of "internal" unsafe (safe functions which contain an unsafe block):

  • get()
  • get_ref()
  • Option get()
  • wrapping_add()

To make those usages safe, the idea would be to add a feature flag no-internal-unsafe, which would replace those with versions that might contain two useless comparisons (that the compiler will elide if it can statically them know to be untrue) but they would not use unsafe at all.

Possibly wrapping_add can even be modified to produce as fast or faster assembly, while formulating it such that the compiler is able to eliminate the bounds checks from new_saturating, making it fully safe without having to have two versions of it. But no promises, I don't even have an idea for this yet.

@jhpratt jhpratt closed this as not planned Won't fix, can't repro, duplicate, stale Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants