Export cold_path() in std::hint #510

Closed
x17jiri opened this issue Dec 21, 2024 · 17 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@x17jiri

x17jiri commented Dec 21, 2024

Proposal

Problem statement

It is sometimes helpful to let the compiler know which code path is the fast path, so that it can be optimized at the expense of the slow path. This proposal suggests that the cold_path() intrinsic is a simple and reliable way to provide this information, and that it could be reexported in std::hint.

grep-ing the LLVM source code for BlockFrequencyInfo and BranchProbabilityInfo shows that this information is used in many places in the optimizer, such as:

  • block placement - improve locality by making the fast path compact and move everything else out of the way
  • inlining, loop unrolling - these optimizations can be less aggressive on the cold path therefore reducing code size
  • register allocation - preferably keep in registers the data needed on the fast path

Motivating examples or use cases

The cold_path() call can be simply placed on some code path marking it as cold.

    if condition {
        // this is the fast path
    } else {
        cold_path();
        // this path is unlikely
    }

    match a {
        1 => a,
        2 => b,
        3 => { cold_path(); c }, // this branch is unlikely
        _ => { cold_path(); d }, // this is also unlikely
    }

    let buf = Global.allocate(layout).map_err(|_| {
        // the error is unlikely
        cold_path();
        Error::new_alloc_failed("Cannot allocate memory.")
    })?;
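
The patterns above can be tried on stable Rust with a stand-in for the intrinsic. This is only a sketch: the local `cold_path` below is an empty `#[cold]` function (an assumption, not the real intrinsic), and `parse_port` and `ParseError` are hypothetical names made up for illustration. Whether the hint changes the generated code depends on the compiler version and optimization level.

```rust
/// Stand-in for the proposed `std::hint::cold_path` (assumption: an empty
/// `#[cold]` function; the real intrinsic is handled inside the compiler).
#[cold]
fn cold_path() {}

/// Hypothetical error type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Parses a port number and rejects privileged ports.
pub fn parse_port(s: &str) -> Result<u16, ParseError> {
    let port = s.parse::<u16>().map_err(|_| {
        // The error closure is the unlikely path.
        cold_path();
        ParseError(format!("invalid port: {}", s))
    })?;

    if port >= 1024 {
        // Fast path: unprivileged port.
        Ok(port)
    } else {
        // Unlikely path: privileged ports are rejected in this example.
        cold_path();
        Err(ParseError(format!("privileged port: {}", port)))
    }
}
```

The hint has no effect on the result: `parse_port` returns the same values with or without the `cold_path()` calls.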

Solution sketch

This is already implemented in intrinsics. All we have to do is create a wrapper in std::hint:

    #[inline(always)]
    pub fn cold_path() {
        std::intrinsics::cold_path()
    }

Alternatives

likely/unlikely

These are harder to use for idiomatic Rust. For example they don't work well with match arms. On the other hand, sometimes they can be more readable and may also be worth reexporting in std::hint.

For example, this would be harder to express using cold_path():

    if likely(x) && unlikely(y) {
        true_branch
    }

And this looks better without the extra branch (first snippet) than with the cold_path() equivalent (second snippet):

    if likely(cond) {
        true_branch
    }

    if cond {
        true_branch
    } else {
        cold_path()
    }

extending the functionality of #[cold] attribute

This attribute could be allowed on match arms or on closures. I'm not sure it's worth adding extra syntax if the functionality can be implemented by a library call.

Links and related work

rust-lang/rust#120370 - added cold_path() as part of fixing likely and unlikely

rust-lang/rust#133852 - improvements of the cold_path() implementation

rust-lang/rust#120193 - proposal for #[cold] on match arms

@saethlin
Member

I have a minor compiler concern for this proposal: The proposed library implementation of

    #[inline(always)]
    pub fn cold_path() {
        std::intrinsics::cold_path()
    }

given the current compiler implementation, relies on hint::cold_path being inlined in MIR in order to work.

While technically this is a hint so it's fine to do nothing, and branch hints are unlikely to be useful in compilations that disable the MIR inliner, it would be a bummer if this got stabilized, then later we figured out a better way to do branch hints and had to deprecate this one.

Perhaps the reliance on the MIR inliner will be alleviated by rust-lang/rust#134082 but at the time of writing that PR isn't even merged. It might be by the time this ACP gets discussed, who knows.

@programmerjake
Member

if we run into compiler issues, the cold_path wrapper could always end up becoming:

#[cold]
pub fn cold_path() {}

this should work since it is entirely stable code now, even if less optimal.

@hanna-kruppe

Hashbrown without nightly feature had such a formulation for a while but removed it because it didn’t work. I guess it has the opposite problem: rather than relying on MIR inlining, it’s too easily optimized out (by the inliner or other passes) before it can influence branch weights.

@x17jiri
Author

x17jiri commented Dec 23, 2024

@hanna-kruppe Before rust-lang/rust#120370 was merged, such a formulation would not work.

rustc doesn't inline a cold function regardless of how small it is. But LLVM does inline it, and doesn't set the branch weights.

Now, there is code that detects calls to cold functions and sets the weights.

@hanna-kruppe

I have not reviewed that PR but I wonder: if the formulation with a #[cold] function works now, why did that PR add a new intrinsic?

@x17jiri
Author

x17jiri commented Dec 23, 2024

@hanna-kruppe An intrinsic gives us more control. Without it, the functionality depends on two assumptions:

  1. the rustc inline pass doesn't inline a cold function, so the call can be detected later in codegen
  2. the LLVM inline pass will inline it, so the binary doesn't contain an actual call to cold_path()

At the moment both assumptions hold.

With an intrinsic, however, we can make sure that future updates will not break either of them:

  1. the rustc inline pass cannot inline an intrinsic because it doesn't know how it will be implemented in the backend
  2. the LLVM inline pass doesn't need to inline it because we can remove it during codegen

@hanna-kruppe

Thanks for explaining. Sounds like a slightly generalized version of @saethlin's concern, that hint::cold_path only works reliably by relying on implementation details of the optimization/codegen, also applies to the intrinsic-less approach?

@x17jiri
Author

x17jiri commented Dec 23, 2024

The intrinsic-less version depends on whether a tiny function marked as cold gets inlined. I would call this an implementation detail.

The export of the intrinsic in std::hint depends on whether a function marked #[inline(always)] gets inlined. I would NOT call this an implementation detail.

But yes, it's a valid concern and hopefully rust-lang/rust#134082 will fix it.

@mu001999

Can we have hot_path? Then we can have:

if cond {
    hot_path();
    true_branch
}

@x17jiri
Author

x17jiri commented Dec 30, 2024

@mu001999 My concern with hot_path() is this. Maybe someone with more knowledge of optimization passes can tell us it would be ok.

The current implementation of cold_path() leaves the cold_path instruction in MIR until all MIR passes finish. Then, in codegen, it is detected and removed. Could the presence of this simple no-op prevent some MIR optimizations? I don't really know the answer to that, but if so, it would be a bigger deal for the hot path than it is for the cold path.

@the8472
Member

the8472 commented Jan 7, 2025

Do unlikely, #[cold] and cold_path() all optimize the same and we can tell users that they're interchangeable or are there subtle differences?

@x17jiri
Author

x17jiri commented Jan 7, 2025

@the8472

unlikely is implemented like this:

#[inline(always)]
pub const fn unlikely(b: bool) -> bool {
    if b {
        cold_path();
        true
    } else {
        false
    }
}

So it gets rewritten to cold_path().
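
For comparison, a `likely` counterpart would presumably mirror this by marking the `false` branch as cold. The sketch below is an assumption-laden illustration for stable Rust: `cold_path` is a local stand-in defined as an empty `#[cold]` function, so the helpers cannot be `const fn` here, and `checked_div` is a hypothetical caller.

```rust
/// Stand-in for the intrinsic (assumption: an empty `#[cold]` function).
#[cold]
fn cold_path() {}

/// Marks the `true` outcome as unlikely.
#[inline(always)]
pub fn unlikely(b: bool) -> bool {
    if b {
        cold_path();
        true
    } else {
        false
    }
}

/// Mirror image: marks the `false` outcome as unlikely.
#[inline(always)]
pub fn likely(b: bool) -> bool {
    if b {
        true
    } else {
        cold_path();
        false
    }
}

/// Hypothetical caller: division by zero is treated as the rare case.
pub fn checked_div(a: u32, b: u32) -> Option<u32> {
    if unlikely(b == 0) {
        return None;
    }
    Some(a / b)
}
```

Both helpers return their argument unchanged, so they only ever affect optimization, never program behavior.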

#[cold] on match arms is not implemented at the moment and no decision has been made whether it should be added to the language. But if it is added in the future, it would also be rewritten to cold_path().

match x {
    1 => a,
    #[cold] 2 => b,
}

would be rewritten by the compiler to:

match x {
    1 => a,
    2 => { cold_path(); b },
}
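
Since `#[cold]` on match arms does not exist, the rewritten form is what one would write by hand today. A minimal sketch, assuming a local stand-in `cold_path` (an empty `#[cold]` function, not the real intrinsic) and a hypothetical `describe` function:

```rust
/// Stand-in for the intrinsic (assumption: an empty `#[cold]` function).
#[cold]
fn cold_path() {}

/// Hypothetical status decoder where only code 1 is expected to be common.
pub fn describe(code: u8) -> &'static str {
    match code {
        1 => "ok",
        // Hand-written equivalent of a hypothetical `#[cold] 2 => ...` arm.
        2 => {
            cold_path();
            "retry"
        }
        // The catch-all arm is also marked as unlikely.
        _ => {
            cold_path();
            "unknown"
        }
    }
}
```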

@the8472
Member

the8472 commented Jan 7, 2025

Ok, one of my concerns was that we have a hodgepodge of different ways of doing this, which includes #[cold] on functions and, once statement attributes are stabilized, also on closures. If they all optimize the same and we can tell users that they're interchangeable, that is mitigated.

@tgross35

tgross35 commented Jan 7, 2025

I think that if we accept cold_path, we shouldn't add more uses of #[cold]. Saying unambiguously "#[cold] works only on functions, use cold_path() within control flow or closures" sounds better than providing both ways, and it is less surface area.

Of course it is too late to change

#[cold]
fn foo() {
    cold_path();
    /* ... */
}

but that seems fine.

On that note, cc @rust-lang/lang since, as mentioned, this interacts with the #[cold] lang features.

@x17jiri
Author

x17jiri commented Jan 7, 2025

I agree that more uses of #[cold] are unnecessary

Regarding cold_path() vs #[cold], we can also say that calling any #[cold] function in a control flow statement will mark the corresponding branch as cold. And cold_path() is just a special #[cold] function that gets dropped in the code gen so there is no actual call.

@hanna-kruppe

A call to a #[cold] function implies the caller's control flow leading up to the call is a cold_path. Leveraging this to define cold_path() as an ordinary non-intrinsic function is somewhat brittle because it only works well if the call survives just long enough to influence branch weights and is eliminated ASAP afterwards.

In the opposite direction, a call site being in a cold_path is a weaker statement than the callee being #[cold]. A #[cold] function is cold everywhere, but some functions have a mix of hot and cold call sites. Knowing that all calls to a function are cold could be used to justify changing the calling convention (this is not currently done, for good reasons I assume, but it shouldn't be ruled out entirely).
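
The distinction between a cold call site and a cold callee can be sketched as follows. All names here are hypothetical, and `cold_path` is again a local stand-in (an empty `#[cold]` function) rather than the real intrinsic: `clamp_len` has one hot and one cold call site, while `report_overflow` is cold everywhere.

```rust
/// Stand-in for the intrinsic (assumption: an empty `#[cold]` function).
#[cold]
fn cold_path() {}

/// Cold everywhere: every call to this function is assumed to be unlikely.
#[cold]
#[inline(never)]
fn report_overflow(len: usize, cap: usize) -> String {
    format!("length {} exceeds capacity {}", len, cap)
}

/// Neither hot nor cold by itself: it is used from both kinds of call sites.
fn clamp_len(len: usize, cap: usize) -> usize {
    len.min(cap)
}

pub fn reserve(len: usize, cap: usize) -> Result<usize, String> {
    if len <= cap {
        // Hot call site of `clamp_len`.
        Ok(clamp_len(len, cap))
    } else if len <= cap.saturating_mul(2) {
        // Cold call site of the same function: only this path is marked.
        cold_path();
        Ok(clamp_len(len, cap))
    } else {
        // No explicit hint here: the callee itself is `#[cold]`.
        Err(report_overflow(len, cap))
    }
}
```

Marking `clamp_len` itself as `#[cold]` would be wrong for the first branch, which is why a per-call-site hint is the weaker but more flexible statement.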

Conversely, it's probably reasonable to infer that a function should be #[cold] if its body starts with cold_path(). In the limit this implies a global (as in, interprocedural) fixpoint analysis on MIR, which is... not without downsides. I also don't think rustc does this right now (LLVM might, but if so, it needs LTO to be effective).

@Amanieu
Member

Amanieu commented Jan 7, 2025

We discussed this in the libs-api meeting and we're happy to add both cold_path and likely/unlikely to std::hint as unstable. This will allow people to experiment with them in real-world code and allow us to evaluate how suitable they are for stabilization.

@Amanieu closed this as completed Jan 7, 2025
@Amanieu added the ACP-accepted (API Change Proposal is accepted, seconded with no objections) label Jan 7, 2025

8 participants