-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use BOLT in CI to optimize LLVM #94381
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
2e6406d
to
a36ab89
Compare
This comment has been minimized.
This comment has been minimized.
FYI, you can test this locally by running To fix the builder failure, there's a couple of options. a) set COMPILER_RT_BUILD_XRAY=OFF, b) update the GCC version used to build clang or c) upstream and backport rust-lang/llvm-project@75fef2e. Option a) is probably easiest. |
Thanks for the hint! I was preparing for upgrading GCC, but this indeed seems easier. |
|
Looks like |
Thanks, I'll try it. GCC 6.5 also can't compile it, trying 7.5 now. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
They want to address the cmake issue: |
3999112
to
0c3c339
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
96d1f05
to
ba3f222
Compare
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
Finished benchmarking commit (8dfb407): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Use BOLT in CI to optimize LLVM This PR adds an optimization step in the Linux `dist` CI pipeline that uses [BOLT](https://github.com/llvm/llvm-project/tree/main/bolt) to optimize the `libLLVM.so` library built by boostrap. Steps: - [x] Use LLVM 15 as a bootstrap compiler and use it to build BOLT - [x] Compile LLVM with support for relocations (`-DCMAKE_SHARED_LINKER_FLAGS="-Wl,-q"`) - [x] Gather profile data using instrumented LLVM - [x] Apply profile to LLVM that has already been PGOfied - [x] Run with BOLT profiling on more benchmarks - [x] Decide on the order of optimization (PGO -> BOLT?) - [x] Decide how we should get `bolt` (currently we use the host `bolt`) - [x] Clean up The latest perf results can be found [here](rust-lang/rust#94381 (comment)). The current CI build time with BOLT applied is around 1h 55 minutes.
Great results here, including a 3.5% bootstrap time reduction. Also a 3.6% max-rss across the board, I guess that must be due to code size shrinking? |
I'm not sure what's the effect, I checked the LLVM lib size against recent |
Perform a bunch of pkglint cleanup while here, and bump bootstrap kits to 1.65.0. Version 1.66.0 (2022-12-15) ========================== Language -------- - [Permit specifying explicit discriminants on all `repr(Int)` enums](rust-lang/rust#95710) ```rust #[repr(u8)] enum Foo { A(u8) = 0, B(i8) = 1, C(bool) = 42, } ``` - [Allow transmutes between the same type differing only in lifetimes](rust-lang/rust#101520) - [Change constant evaluation errors from a deny-by-default lint to a hard error](rust-lang/rust#102091) - [Trigger `must_use` on `impl Trait` for supertraits](rust-lang/rust#102287) This makes `impl ExactSizeIterator` respect the existing `#[must_use]` annotation on `Iterator`. - [Allow `..X` and `..=X` in patterns](rust-lang/rust#102275) - [Uplift `clippy::for_loops_over_fallibles` lint into rustc](rust-lang/rust#99696) - [Stabilize `sym` operands in inline assembly](rust-lang/rust#103168) - [Update to Unicode 15](rust-lang/rust#101912) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. Compiler -------- - [Add armv5te-none-eabi and thumbv5te-none-eabi tier 3 targets](rust-lang/rust#101329) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. - [Add support for linking against macOS universal libraries](rust-lang/rust#98736) Libraries --------- - [Fix `#[derive(Default)]` on a generic `#[default]` enum adding unnecessary `Default` bounds](rust-lang/rust#101040) - [Update to Unicode 15](rust-lang/rust#101821) Stabilized APIs --------------- - [`proc_macro::Span::source_text`](https://doc.rust-lang.org/stable/proc_macro/struct.Span.html#method.source_text) - [`uX::{checked_add_signed, overflowing_add_signed, saturating_add_signed, wrapping_add_signed}`](https://doc.rust-lang.org/stable/std/primitive.u8.html#method.checked_add_signed) - [`iX::{checked_add_unsigned, overflowing_add_unsigned, saturating_add_unsigned, wrapping_add_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_add_unsigned) - [`iX::{checked_sub_unsigned, overflowing_sub_unsigned, saturating_sub_unsigned, wrapping_sub_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_sub_unsigned) - [`BTreeSet::{first, last, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeSet.html#method.first) - [`BTreeMap::{first_key_value, last_key_value, first_entry, last_entry, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeMap.html#method.first_key_value) - [Add `AsFd` implementations for stdio lock types on WASI.](rust-lang/rust#101768) - [`impl TryFrom<Vec<T>> for Box<[T; N]>`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#impl-TryFrom%3CVec%3CT%2C%20Global%3E%3E-for-Box%3C%5BT%3B%20N%5D%2C%20Global%3E) - [`core::hint::black_box`](https://doc.rust-lang.org/stable/std/hint/fn.black_box.html) - [`Duration::try_from_secs_{f32,f64}`](https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.try_from_secs_f32) - [`Option::unzip`](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.unzip) - [`std::os::fd`](https://doc.rust-lang.org/stable/std/os/fd/index.html) Rustdoc ------- - [Add Rustdoc warning for invalid HTML tags in the documentation](rust-lang/rust#101720) Cargo ----- - [Added `cargo remove` to remove dependencies from Cargo.toml](https://doc.rust-lang.org/nightly/cargo/commands/cargo-remove.html) - [`cargo publish` now waits for the new version to be downloadable before exiting](rust-lang/cargo#11062) See [detailed release notes](https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-166-2022-12-15) for more. Compatibility Notes ------------------- - [Only apply `ProceduralMasquerade` hack to older versions of `rental`](rust-lang/rust#94063) - [Don't export `__heap_base` and `__data_end` on wasm32-wasi.](rust-lang/rust#102385) - [Don't export `__wasm_init_memory` on WebAssembly.](rust-lang/rust#102426) - [Only export `__tls_*` on wasm32-unknown-unknown.](rust-lang/rust#102440) - [Don't link to `libresolv` in libstd on Darwin](rust-lang/rust#102766) - [Update libstd's libc to 0.2.135 (to make `libstd` no longer pull in `libiconv.dylib` on Darwin)](rust-lang/rust#103277) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. - [Make `order_dependent_trait_objects` show up in future-breakage reports](rust-lang/rust#102635) - [Change std::process::Command spawning to default to inheriting the parent's signal mask](rust-lang/rust#101077) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Enable BOLT for LLVM compilation](rust-lang/rust#94381) - [Enable LTO for rustc_driver.so](rust-lang/rust#101403)
Pkgsrc changes: * pkglint cleanups, bump bootstrap kits to 1.65.0. * New target: mipsel-unknown-netbsd, for cpu=mips32 with soft-float. * Managed to retain the build of aarch64_be, llvm needed a patch to avoid use of neon instructions in the BE case (llvm doesn't support use of neon in BE mode). Ref. patch to src/llvm-project/llvm/lib/Support/BLAKE3/blake3_impl.h. Also submitted upstream of LLVM to the BLAKE3 maintainers. * The minimum gcc version is now 7.x, and that includes the cross-compiler for the targets. For i386 this also needs to /usr/include/gcc-7 include files in the target root, because immintrin.h from gcc 5 is not compatible with gcc 7.x. This applies for the targets where we build against a root from netbsd-8 (sparc64, powerpc, i386), and files/gcc-wrap gets a hack for this. * Pick up tweak for -latomic inclusion from rust-lang/rust#104220 and rust-lang/rust#104572 * Retain ability to do 32-bit NetBSD, by changing from 64 to 32 bit types in library/std/src/sys/unix/thread_parker/netbsd.rs. * I've tried to get the "openssl-src" build with -latomic where it's needed. I've introduced the "NetBSD-generic32" system type and use it for the NetBSD mipsel target. There is another attempt to do the same in the patch to vendor/openssl-sys/build/main.rs. Upstream changes: Version 1.66.1 (2023-01-10) =========================== - Added validation of SSH host keys for git URLs in Cargo ([CVE-2022-46176](https://www.cve.org/CVERecord?id=CVE-2022-46176)) Version 1.66.0 (2022-12-15) =========================== Language -------- - [Permit specifying explicit discriminants on all `repr(Int)` enums](rust-lang/rust#95710) ```rust #[repr(u8)] enum Foo { A(u8) = 0, B(i8) = 1, C(bool) = 42, } ``` - [Allow transmutes between the same type differing only in lifetimes](rust-lang/rust#101520) - [Change constant evaluation errors from a deny-by-default lint to a hard error](rust-lang/rust#102091) - [Trigger `must_use` on `impl Trait` for supertraits](rust-lang/rust#102287) This makes `impl ExactSizeIterator` respect the existing `#[must_use]` annotation on `Iterator`. - [Allow `..X` and `..=X` in patterns](rust-lang/rust#102275) - [Uplift `clippy::for_loops_over_fallibles` lint into rustc](rust-lang/rust#99696) - [Stabilize `sym` operands in inline assembly](rust-lang/rust#103168) - [Update to Unicode 15](rust-lang/rust#101912) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. Compiler -------- - [Add armv5te-none-eabi and thumbv5te-none-eabi tier 3 targets](rust-lang/rust#101329) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. - [Add support for linking against macOS universal libraries](rust-lang/rust#98736) Libraries --------- - [Fix `#[derive(Default)]` on a generic `#[default]` enum adding unnecessary `Default` bounds](rust-lang/rust#101040) - [Update to Unicode 15](rust-lang/rust#101821) Stabilized APIs --------------- - [`proc_macro::Span::source_text`](https://doc.rust-lang.org/stable/proc_macro/struct.Span.html#method.source_text) - [`uX::{checked_add_signed, overflowing_add_signed, saturating_add_signed, wrapping_add_signed}`](https://doc.rust-lang.org/stable/std/primitive.u8.html#method.checked_add_signed) - [`iX::{checked_add_unsigned, overflowing_add_unsigned, saturating_add_unsigned, wrapping_add_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_add_unsigned) - [`iX::{checked_sub_unsigned, overflowing_sub_unsigned, saturating_sub_unsigned, wrapping_sub_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_sub_unsigned) - [`BTreeSet::{first, last, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeSet.html#method.first) - [`BTreeMap::{first_key_value, last_key_value, first_entry, last_entry, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeMap.html#method.first_key_value) - [Add `AsFd` implementations for stdio lock types on WASI.](rust-lang/rust#101768) - [`impl TryFrom<Vec<T>> for Box<[T; N]>`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#impl-TryFrom%3CVec%3CT%2C%20Global%3E%3E-for-Box%3C%5BT%3B%20N%5D%2C%20Global%3E) - [`core::hint::black_box`](https://doc.rust-lang.org/stable/std/hint/fn.black_box.html) - [`Duration::try_from_secs_{f32,f64}`](https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.try_from_secs_f32) - [`Option::unzip`](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.unzip) - [`std::os::fd`](https://doc.rust-lang.org/stable/std/os/fd/index.html) Rustdoc ------- - [Add Rustdoc warning for invalid HTML tags in the documentation](rust-lang/rust#101720) Cargo ----- - [Added `cargo remove` to remove dependencies from Cargo.toml](https://doc.rust-lang.org/nightly/cargo/commands/cargo-remove.html) - [`cargo publish` now waits for the new version to be downloadable before exiting](rust-lang/cargo#11062) See [detailed release notes](https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-166-2022-12-15) for more. Compatibility Notes ------------------- - [Only apply `ProceduralMasquerade` hack to older versions of `rental`](rust-lang/rust#94063) - [Don't export `__heap_base` and `__data_end` on wasm32-wasi.](rust-lang/rust#102385) - [Don't export `__wasm_init_memory` on WebAssembly.](rust-lang/rust#102426) - [Only export `__tls_*` on wasm32-unknown-unknown.](rust-lang/rust#102440) - [Don't link to `libresolv` in libstd on Darwin](rust-lang/rust#102766) - [Update libstd's libc to 0.2.135 (to make `libstd` no longer pull in `libiconv.dylib` on Darwin)](rust-lang/rust#103277) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. - [Make `order_dependent_trait_objects` show up in future-breakage reports](rust-lang/rust#102635) - [Change std::process::Command spawning to default to inheriting the parent's signal mask](rust-lang/rust#101077) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Enable BOLT for LLVM compilation](rust-lang/rust#94381) - [Enable LTO for rustc_driver.so](rust-lang/rust#101403) Version 1.65.0 (2022-11-03) ========================== Language -------- - [Error on `as` casts of enums with `#[non_exhaustive]` variants] (rust-lang/rust#92744) - [Stabilize `let else`](rust-lang/rust#93628) - [Stabilize generic associated types (GATs)] (rust-lang/rust#96709) - [Add lints `let_underscore_drop`, `let_underscore_lock`, and `let_underscore_must_use` from Clippy] (rust-lang/rust#97739) - [Stabilize `break`ing from arbitrary labeled blocks ("label-break-value")] (rust-lang/rust#99332) - [Uninitialized integers, floats, and raw pointers are now considered immediate UB](rust-lang/rust#98919). Usage of `MaybeUninit` is the correct way to work with uninitialized memory. - [Stabilize raw-dylib for Windows x86_64, aarch64, and thumbv7a] (rust-lang/rust#99916) - [Do not allow `Drop` impl on foreign ADTs] (rust-lang/rust#99576) Compiler -------- - [Stabilize -Csplit-debuginfo on Linux] (rust-lang/rust#98051) - [Use niche-filling optimization even when multiple variants have data] (rust-lang/rust#94075) - [Associated type projections are now verified to be well-formed prior to resolving the underlying type] (rust-lang/rust#99217) - [Stringify non-shorthand visibility correctly] (rust-lang/rust#100350) - [Normalize struct field types when unsizing] (rust-lang/rust#101831) - [Update to LLVM 15](rust-lang/rust#99464) - [Fix aarch64 call abi to correctly zeroext when needed] (rust-lang/rust#97800) - [debuginfo: Generalize C++-like encoding for enums] (rust-lang/rust#98393) - [Add `special_module_name` lint] (rust-lang/rust#94467) - [Add support for generating unique profraw files by default when using `-C instrument-coverage`] (rust-lang/rust#100384) - [Allow dynamic linking for iOS/tvOS targets] (rust-lang/rust#100636) New targets: - [Add armv4t-none-eabi as a tier 3 target] (rust-lang/rust#100244) - [Add powerpc64-unknown-openbsd and riscv64-unknown-openbsd as tier 3 targets] (rust-lang/rust#101025) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Don't generate `PartialEq::ne` in derive(PartialEq)] (rust-lang/rust#98655) - [Windows RNG: Use `BCRYPT_RNG_ALG_HANDLE` by default] (rust-lang/rust#101325) - [Forbid mixing `System` with direct system allocator calls] (rust-lang/rust#101394) - [Document no support for writing to non-blocking stdio/stderr] (rust-lang/rust#101416) - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`](rust-lang/rust#95295) This also changes the safety conditions on `Layout::from_size_align_unchecked`. Stabilized APIs --------------- - [`std::backtrace::Backtrace`] (https://doc.rust-lang.org/stable/std/backtrace/struct.Backtrace.html) - [`Bound::as_ref`] (https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.as_ref) - [`std::io::read_to_string`] (https://doc.rust-lang.org/stable/std/io/fn.read_to_string.html) - [`<*const T>::cast_mut`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_mut) - [`<*mut T>::cast_const`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_const) These APIs are now stable in const contexts: - [`<*const T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) - [`<*mut T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) Cargo ----- - [Apply GitHub fast path even for partial hashes] (rust-lang/cargo#10807) - [Do not add home bin path to PATH if it's already there] (rust-lang/cargo#11023) - [Take priority into account within the pending queue] (rust-lang/cargo#11032). This slightly optimizes job scheduling by Cargo, with typically small improvements on larger crate graph builds. Compatibility Notes ------------------- - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`] (rust-lang/rust#95295). This also changes the safety conditions on `Layout::from_size_align_unchecked`. - [`PollFn` now only implements `Unpin` if the closure is `Unpin`] (rust-lang/rust#102737). This is a possible breaking change if users were relying on the blanket unpin implementation. See discussion on the PR for details of why this change was made. - [Drop ExactSizeIterator impl from std::char::EscapeAscii] (rust-lang/rust#99880) This is a backwards-incompatible change to the standard library's surface area, but is unlikely to affect real world usage. - [Do not consider a single repeated lifetime eligible for elision in the return type] (rust-lang/rust#103450) This behavior was unintentionally changed in 1.64.0, and this release reverts that change by making this an error again. - [Reenable disabled early syntax gates as future-incompatibility lints] (rust-lang/rust#99935) - [Update the minimum external LLVM to 13] (rust-lang/rust#100460) - [Don't duplicate file descriptors into stdio fds] (rust-lang/rust#101426) - [Sunset RLS](rust-lang/rust#100863) - [Deny usage of `#![cfg_attr(..., crate_type = ...)]` to set the crate type] (rust-lang/rust#99784) This strengthens the forward compatibility lint deprecated_cfg_attr_crate_type_name to deny. - [`llvm-has-rust-patches` allows setting the build system to treat the LLVM as having Rust-specific patches] (rust-lang/rust#101072) This option may need to be set for distributions that are building Rust with a patched LLVM via `llvm-config`, not the built-in LLVM. Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Add `x.sh` and `x.ps1` shell scripts] (rust-lang/rust#99992) - [compiletest: use target cfg instead of hard-coded tables] (rust-lang/rust#100260) - [Use object instead of LLVM for reading bitcode from rlibs] (rust-lang/rust#98100) - [Enable MIR inlining for optimized compilations] (rust-lang/rust#91743) This provides a 3-10% improvement in compiletimes for real world crates. See [perf results] (https://perf.rust-lang.org/compare.html?start=aedf78e56b2279cc869962feac5153b6ba7001ed&end=0075bb4fad68e64b6d1be06bf2db366c30bc75e1&stat=instructions:u).
Use BOLT in CI to optimize `librustc_driver` Based on rust-lang#94381. r? `@ghost`
Use BOLT in CI to optimize `librustc_driver` Based on rust-lang#94381. r? `@ghost`
Use BOLT in CI to optimize LLVM This PR adds an optimization step in the Linux `dist` CI pipeline that uses [BOLT](https://github.com/llvm/llvm-project/tree/main/bolt) to optimize the `libLLVM.so` library built by boostrap. Steps: - [x] Use LLVM 15 as a bootstrap compiler and use it to build BOLT - [x] Compile LLVM with support for relocations (`-DCMAKE_SHARED_LINKER_FLAGS="-Wl,-q"`) - [x] Gather profile data using instrumented LLVM - [x] Apply profile to LLVM that has already been PGOfied - [x] Run with BOLT profiling on more benchmarks - [x] Decide on the order of optimization (PGO -> BOLT?) - [x] Decide how we should get `bolt` (currently we use the host `bolt`) - [x] Clean up The latest perf results can be found [here](rust-lang/rust#94381 (comment)). The current CI build time with BOLT applied is around 1h 55 minutes.
Use BOLT in CI to optimize LLVM This PR adds an optimization step in the Linux `dist` CI pipeline that uses [BOLT](https://github.com/llvm/llvm-project/tree/main/bolt) to optimize the `libLLVM.so` library built by boostrap. Steps: - [x] Use LLVM 15 as a bootstrap compiler and use it to build BOLT - [x] Compile LLVM with support for relocations (`-DCMAKE_SHARED_LINKER_FLAGS="-Wl,-q"`) - [x] Gather profile data using instrumented LLVM - [x] Apply profile to LLVM that has already been PGOfied - [x] Run with BOLT profiling on more benchmarks - [x] Decide on the order of optimization (PGO -> BOLT?) - [x] Decide how we should get `bolt` (currently we use the host `bolt`) - [x] Clean up The latest perf results can be found [here](rust-lang/rust#94381 (comment)). The current CI build time with BOLT applied is around 1h 55 minutes.
This PR adds an optimization step in the Linux
dist
CI pipeline that uses BOLT to optimize thelibLLVM.so
library built by bootstrap.Steps:
-DCMAKE_SHARED_LINKER_FLAGS="-Wl,-q"
)bolt
(currently we use the hostbolt
)The latest perf results can be found here. The current CI build time with BOLT applied is around 1h 55 minutes.