excise thread_local guard in SingletonRelaxedCounter fast-path
Summary:
Somehow, the compiler hallucinates the need for a tls guard variable to guard the `thread_local` variable of type `CounterAndCache`. In the past, this type was expected to be trivial. But, with the change to `std::atomic` in C++20 to make it no longer trivial, the compiler is giving this `thread_local` variable a tls guard variable which it manifestly does not need. This affects the inline fast path.

The immediate solution is to change the counter types from `std::atomic<Int>` to `Int` and to access the counters through `std::atomic_ref`, backported as `folly::atomic_ref`. This is unsatisfactory because it removes the type safety that `std::atomic` provides; but we must have the optimal inline fast path.
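
A minimal sketch of the access pattern this adopts, shown here with the standard C++20 `std::atomic_ref` (the commit uses the `folly::atomic_ref` backport); the names are illustrative only:

    #include <atomic>

    int counter = 0; // plain integer keeps the enclosing type trivial

    void add(int delta) {
      // Wrap the plain int in an atomic view only at the point of access.
      std::atomic_ref<int>(counter).fetch_add(delta, std::memory_order_relaxed);
    }

    int read() {
      return std::atomic_ref<int>(counter).load(std::memory_order_relaxed);
    }

Nothing in the type system now prevents a plain, non-atomic access to `counter`; every concurrent access must remember to go through `atomic_ref`, which is the loss of type safety the summary refers to.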

Differential Revision: D59144047

fbshipit-source-id: f1bd0746d7bd8ae8963c2a67bb62a3608c9e09fc
yfeldblum authored and facebook-github-bot committed Jun 29, 2024
1 parent ea265a8 commit 8885811
Showing 2 changed files with 12 additions and 7 deletions.
1 change: 1 addition & 0 deletions folly/concurrency/BUCK
@@ -167,5 +167,6 @@ cpp_library(
         "//folly:utility",
         "//folly/detail:static_singleton_manager",
         "//folly/detail:thread_local_globals",
+        "//folly/synchronization:atomic_ref",
     ],
 )
18 changes: 11 additions & 7 deletions folly/concurrency/SingletonRelaxedCounter.h
@@ -27,6 +27,7 @@
 #include <folly/Utility.h>
 #include <folly/detail/StaticSingletonManager.h>
 #include <folly/detail/thread_local_globals.h>
+#include <folly/synchronization/AtomicRef.h>
 
 namespace folly {
 
@@ -45,12 +46,13 @@ template <typename Int>
 class SingletonRelaxedCounterBase {
  protected:
   using Signed = std::make_signed_t<Int>;
-  using Counter = std::atomic<Signed>;
+  using Counter = Signed; // should be atomic but clang generates worse code
 
   struct CounterAndCache {
     Counter counter; // valid during LocalLifetime object lifetime
     Counter* cache; // points to counter when counter is valid
   };
+  static_assert(std::is_trivial_v<CounterAndCache>);
 
   struct CounterRefAndLocal {
     Counter* counter; // refers either to local counter or to global counter
@@ -99,15 +101,17 @@
   }
   FOLLY_NOINLINE void destroy_(GetGlobal& get_global) {
     auto& global = get_global();
+    auto global_fallback = atomic_ref(global.fallback);
     auto const tracking = global.tracking.wlock();
     auto& lifetimes = tracking->lifetimes[this];
     for (auto ctr : lifetimes) {
       auto const it = tracking->locals.find(ctr);
       if (!--it->second) {
         tracking->locals.erase(it);
-        auto const current = ctr->counter.load(std::memory_order_relaxed);
-        global.fallback.fetch_add(current, std::memory_order_relaxed);
-        ctr->counter.store(Signed(0), std::memory_order_relaxed);
+        auto ctr_counter = atomic_ref(ctr->counter);
+        auto const current = ctr_counter.load(std::memory_order_relaxed);
+        global_fallback.fetch_add(current, std::memory_order_relaxed);
+        ctr_counter.store(Signed(0), std::memory_order_relaxed);
         ctr->cache = nullptr;
       }
     }
@@ -130,18 +134,18 @@
   }
   FOLLY_NOINLINE static Int aggregate_(GetGlobal& get_global) {
     auto& global = get_global();
-    auto count = global.fallback.load(std::memory_order_relaxed);
+    auto count = atomic_ref(global.fallback).load(std::memory_order_relaxed);
     auto const tracking = global.tracking.rlock();
     for (auto const& kvp : tracking->locals) {
-      count += kvp.first->counter.load(std::memory_order_relaxed);
+      count += atomic_ref(kvp.first->counter).load(std::memory_order_relaxed);
     }
     return std::is_unsigned<Int>::value
         ? to_unsigned(std::max(Signed(0), count))
         : count;
   }
 
   FOLLY_ERASE static void mutate(Signed v, CounterRefAndLocal cl) {
-    auto& c = *cl.counter;
+    auto c = atomic_ref(*cl.counter);
     if (cl.local) {
       // splitting load/store on the local counter is faster than fetch-and-add
       c.store(c.load(std::memory_order_relaxed) + v, std::memory_order_relaxed);
