Skip to content

Commit

Permalink
aarch64: Use .arch_extension directive instead of #[target_feature]
Browse files Browse the repository at this point in the history
  • Loading branch information
taiki-e committed Oct 5, 2023
1 parent a7daa73 commit f2002d4
Show file tree
Hide file tree
Showing 4 changed files with 49 additions and 29 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,7 @@ RUSTFLAGS="--cfg portable_atomic_no_outline_atomics" cargo ...
If dynamic dispatching by run-time CPU feature detection is enabled, it allows maintaining support for older CPUs while using features that are not supported on older CPUs, such as CMPXCHG16B (x86_64) and FEAT_LSE (aarch64).

Note:
- Dynamic detection is currently only enabled in Rust 1.61+ for aarch64, in Rust 1.59+ (AVX) or 1.69+ (CMPXCHG16B) for x86_64, nightly only for powerpc64 (disabled by default), otherwise it works the same as when this cfg is set.
- Dynamic detection is currently only enabled in Rust 1.59+ for aarch64, in Rust 1.59+ (AVX) or 1.69+ (CMPXCHG16B) for x86_64, nightly only for powerpc64 (disabled by default), otherwise it works the same as when this cfg is set.
- If the required target features are enabled at compile-time, the atomic operations are inlined.
- This is compatible with no-std (as with all features except `std`).
- On some targets, run-time detection is disabled by default mainly for compatibility with older versions of operating systems or incomplete build environments, and can be enabled by `--cfg portable_atomic_outline_atomics`. (When both cfg are enabled, `*_no_*` cfg is preferred.)
Expand Down
13 changes: 0 additions & 13 deletions build.rs
Original file line number Diff line number Diff line change
Expand Up @@ -192,19 +192,6 @@ fn main() {
target_feature_if("cmpxchg16b", has_cmpxchg16b, &version, Some(69), true);
}
"aarch64" => {
// aarch64_target_feature stabilized in Rust 1.61 (nightly-2022-03-16): https://github.com/rust-lang/rust/pull/90621
if !version.probe(61, 2022, 3, 15) {
if version.nightly && is_allowed_feature("aarch64_target_feature") {
// The part of this feature we use has not been changed since 1.27
// (https://github.com/rust-lang/rust/commit/1217d70465edb2079880347fea4baaac56895f51)
// until it was stabilized in nightly-2022-03-16, so it can be safely enabled in
// nightly, which is older than nightly-2022-03-16.
println!("cargo:rustc-cfg=portable_atomic_unstable_aarch64_target_feature");
} else {
// On aarch64, when aarch64_target_feature is not available, outline-atomics is also not available.
println!("cargo:rustc-cfg=portable_atomic_no_outline_atomics");
}
}
// For Miri and ThreadSanitizer.
// https://github.com/rust-lang/rust/pull/97423 merged in Rust 1.64 (nightly-2022-06-30).
if version.nightly && version.probe(64, 2022, 6, 29) {
Expand Down
52 changes: 47 additions & 5 deletions src/imp/atomic128/aarch64.rs
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,45 @@ macro_rules! debug_assert_lse {
};
}

// Refs: https://developer.arm.com/documentation/100067/0612/armclang-Integrated-Assembler/AArch32-Target-selection-directives?lang=en
//
// This is similar to #[target_feature(enable = "lse")], except that there are
// no compiler guarantees regarding (un)inlining, and the scope is within an asm
// block rather than a function. We use this directive to support outline-atomics
// on pre-1.61 rustc (aarch64_target_feature stabilized in Rust 1.61).
//
// The .arch_extension directive is effective until the end of the assembly block and
// is not propagated to subsequent code, so the end_lse macro is unneeded.
// https://godbolt.org/z/4oMEW8vWc
// https://github.com/torvalds/linux/commit/e0d5896bd356cd577f9710a02d7a474cdf58426b
// https://github.com/torvalds/linux/commit/dd1f6308b28edf0452dd5dc7877992903ec61e69
// (It seems GCC effectively ignores this directive and always allow FEAT_LSE instructions: https://godbolt.org/z/W9W6rensG)
//
// The .arch directive has a similar effect, but we don't use it due to the following issue:
// https://github.com/torvalds/linux/commit/dd1f6308b28edf0452dd5dc7877992903ec61e69
//
// Note: If FEAT_LSE is not available at compile-time, we must guarantee that
// the function that uses it is not inlined into a function where it is not
// clear whether FEAT_LSE is available. Otherwise, (even if we checked whether
// FEAT_LSE is available at run-time) optimizations that reorder its
// instructions across the if condition might introduce undefined behavior.
// (see also https://rust-lang.github.io/rfcs/2045-target-feature.html#safely-inlining-target_feature-functions-on-more-contexts)
// However, our code uses the ifunc helper macro that works with function pointers,
// so we don't have to worry about this unless calling without helper macro.
#[cfg(not(any(target_feature = "lse", portable_atomic_target_feature = "lse")))]
#[cfg(not(portable_atomic_no_outline_atomics))]
macro_rules! start_lse {
() => {
".arch_extension lse"
};
}
#[cfg(any(target_feature = "lse", portable_atomic_target_feature = "lse"))]
macro_rules! start_lse {
() => {
""
};
}

#[cfg(target_endian = "little")]
macro_rules! select_le_or_be {
($le:expr, $be:expr) => {
Expand Down Expand Up @@ -289,6 +328,7 @@ unsafe fn _atomic_load_casp(src: *mut u128, order: Ordering) -> u128 {
macro_rules! atomic_load {
($acquire:tt, $release:tt) => {
asm!(
start_lse!(),
concat!("casp", $acquire, $release, " x2, x3, x2, x3, [{src}]"),
src = in(reg) ptr_reg!(src),
// must be allocated to even/odd register pair
Expand Down Expand Up @@ -551,7 +591,9 @@ unsafe fn atomic_compare_exchange(
#[cfg(not(any(target_feature = "lse", portable_atomic_target_feature = "lse")))]
let prev = {
fn_alias! {
#[target_feature(enable = "lse")]
// inline(never) is just a hint and also not strictly necessary
// because we use ifunc helper macro, but used for clarity.
#[inline(never)]
unsafe fn(dst: *mut u128, old: u128, new: u128) -> u128;
atomic_compare_exchange_casp_relaxed
= _atomic_compare_exchange_casp(Ordering::Relaxed, Ordering::Relaxed);
Expand Down Expand Up @@ -660,10 +702,6 @@ unsafe fn atomic_compare_exchange(
portable_atomic_target_feature = "lse",
not(portable_atomic_no_outline_atomics),
))]
#[cfg_attr(
not(any(target_feature = "lse", portable_atomic_target_feature = "lse")),
target_feature(enable = "lse")
)]
#[inline]
unsafe fn _atomic_compare_exchange_casp(
dst: *mut u128,
Expand All @@ -690,6 +728,7 @@ unsafe fn _atomic_compare_exchange_casp(
macro_rules! cmpxchg {
($acquire:tt, $release:tt, $fence:tt) => {
asm!(
start_lse!(),
concat!("casp", $acquire, $release, " x6, x7, x4, x5, [{dst}]"),
$fence,
dst = in(reg) ptr_reg!(dst),
Expand Down Expand Up @@ -848,6 +887,7 @@ unsafe fn _atomic_swap_casp(dst: *mut u128, val: u128, order: Ordering) -> u128
macro_rules! swap {
($acquire:tt, $release:tt, $fence:tt) => {
asm!(
start_lse!(),
// If FEAT_LSE2 is not supported, this works like byte-wise atomic.
// This is not single-copy atomic reads, but this is ok because subsequent
// CAS will check for consistency.
Expand Down Expand Up @@ -1014,6 +1054,7 @@ macro_rules! atomic_rmw_cas_3 {
macro_rules! op {
($acquire:tt, $release:tt, $fence:tt) => {
asm!(
start_lse!(),
// If FEAT_LSE2 is not supported, this works like byte-wise atomic.
// This is not single-copy atomic reads, but this is ok because subsequent
// CAS will check for consistency.
Expand Down Expand Up @@ -1140,6 +1181,7 @@ macro_rules! atomic_rmw_cas_2 {
macro_rules! op {
($acquire:tt, $release:tt, $fence:tt) => {
asm!(
start_lse!(),
// If FEAT_LSE2 is not supported, this works like byte-wise atomic.
// This is not single-copy atomic reads, but this is ok because subsequent
// CAS will check for consistency.
Expand Down
11 changes: 1 addition & 10 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,7 @@ RUSTFLAGS="--cfg portable_atomic_no_outline_atomics" cargo ...
If dynamic dispatching by run-time CPU feature detection is enabled, it allows maintaining support for older CPUs while using features that are not supported on older CPUs, such as CMPXCHG16B (x86_64) and FEAT_LSE (aarch64).
Note:
- Dynamic detection is currently only enabled in Rust 1.61+ for aarch64, in Rust 1.59+ (AVX) or 1.69+ (CMPXCHG16B) for x86_64, nightly only for powerpc64 (disabled by default), otherwise it works the same as when this cfg is set.
- Dynamic detection is currently only enabled in Rust 1.59+ for aarch64, in Rust 1.59+ (AVX) or 1.69+ (CMPXCHG16B) for x86_64, nightly only for powerpc64 (disabled by default), otherwise it works the same as when this cfg is set.
- If the required target features are enabled at compile-time, the atomic operations are inlined.
- This is compatible with no-std (as with all features except `std`).
- On some targets, run-time detection is disabled by default mainly for compatibility with older versions of operating systems or incomplete build environments, and can be enabled by `--cfg portable_atomic_outline_atomics`. (When both cfg are enabled, `*_no_*` cfg is preferred.)
Expand Down Expand Up @@ -258,20 +258,11 @@ RUSTFLAGS="--cfg portable_atomic_no_outline_atomics" cargo ...
// These features are already stabilized or have already been removed from compilers,
// and can safely be enabled for old nightly as long as version detection works.
// - cfg(target_has_atomic)
// - #[target_feature(enable = "lse")] on AArch64
// - #[target_feature(enable = "cmpxchg16b")] on x86_64
// - asm! on ARM, AArch64, RISC-V, x86_64
// - llvm_asm! on AVR (tier 3) and MSP430 (tier 3)
// - #[instruction_set] on non-Linux/Android pre-v6 ARM (tier 3)
#![cfg_attr(portable_atomic_unstable_cfg_target_has_atomic, feature(cfg_target_has_atomic))]
#![cfg_attr(
all(
target_arch = "aarch64",
portable_atomic_unstable_aarch64_target_feature,
not(portable_atomic_no_outline_atomics),
),
feature(aarch64_target_feature)
)]
#![cfg_attr(
all(
target_arch = "x86_64",
Expand Down

0 comments on commit f2002d4

Please sign in to comment.