From b95ed0ee7189b6300ea4fc17d3f151adf8d6e58b Mon Sep 17 00:00:00 2001
From: Yedidya Feldblum
Date: Tue, 3 Dec 2024 23:12:21 -0800
Subject: [PATCH] use F14VectorSet in ThreadLocalStatsT to optimize aggregate

Summary:
`ThreadLocalStatsT::aggregate()` iterates the set of stats. The fastest type of container to loop over is a contiguous container such as vector-set, which `F14VectorSet` is.

Modifications to the set are rare and their costs are marginal. Iteration is infrequent but occurs on a schedule and the cost of the iteration scales with the size of the set.

When `reset()`ing the stats in the set within `aggregate()`, the cost of each `reset()` can dwarf the cost of the `iterator::operator++()`. But if the `reset()` checks whether there have been any updates to extract before doing the extraction, as `TLStatsThreadSafe::TimeSeriesType` does, then the cost of a `reset()` which does nothing is likely comparable to the cost of the `iterator::operator++()` to advance to that stat.

There are cases of applications with many outstanding stat objects in the set, which are mostly dead. For example, consider a sharded application where each process owns a set of shards, where shard motion between processes is frequent, and where there are many per-shard counters. In such cases, optimizing the CPU cost of `iterator::operator++()` can be beneficial, and this is done by selecting a vector-set.

Reviewed By: a-square

Differential Revision: D66591772

fbshipit-source-id: 15cf7b4c710dae0fbdf3f95bc2aca670d66669c9
---
 fb303/ThreadLocalStats.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fb303/ThreadLocalStats.h b/fb303/ThreadLocalStats.h
index fb19bdda5..f6e4b373b 100644
--- a/fb303/ThreadLocalStats.h
+++ b/fb303/ThreadLocalStats.h
@@ -289,7 +289,7 @@ class ThreadLocalStatsT {
    * link_->mutex protects access to tlStats_ (when LockTraits actually
    * provides thread-safety guarantees).
    */
-  folly::F14FastSet tlStats_;
+  folly::F14VectorSet tlStats_;
 
   friend class TLStatT;
   friend class detail::TLStatLink;
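
Illustration (not part of the change): the summary's argument is that when most `reset()` calls are no-ops, the per-element cost of walking the container dominates `aggregate()`, so contiguous storage helps. The sketch below shows that shape using only the standard library. `Stat`, `VectorSet`, and the free function `aggregate` are hypothetical stand-ins invented for this example; they are not the fb303 or folly types, and the toy `VectorSet` only mimics the general vector-set idea (dense array of values plus a lookup index), not how `F14VectorSet` is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a thread-local stat (not the fb303 TLStat API).
struct Stat {
  std::int64_t pending = 0; // value accumulated since the last reset()
  bool dirty = false;       // set by add(), cleared by reset()

  void add(std::int64_t v) {
    pending += v;
    dirty = true;
  }

  // Extracts the pending value. When nothing was added since the last
  // call, this returns immediately, which is the "reset() which does
  // nothing" case described in the summary.
  std::int64_t reset() {
    if (!dirty) {
      return 0;
    }
    std::int64_t out = pending;
    pending = 0;
    dirty = false;
    return out;
  }
};

// Toy vector-set (an assumption for illustration): elements live in a
// contiguous vector, and a side index maps each element to its position.
class VectorSet {
 public:
  bool insert(Stat* s) {
    if (index_.count(s) != 0) {
      return false; // already present
    }
    index_.emplace(s, items_.size());
    items_.push_back(s);
    return true;
  }

  bool erase(Stat* s) {
    auto it = index_.find(s);
    if (it == index_.end()) {
      return false; // not present
    }
    // Move the last element into the vacated slot to keep storage dense.
    std::size_t pos = it->second;
    Stat* last = items_.back();
    items_[pos] = last;
    index_[last] = pos;
    items_.pop_back();
    index_.erase(s);
    return true;
  }

  const std::vector<Stat*>& items() const { return items_; }
  std::size_t size() const { return items_.size(); }

 private:
  std::vector<Stat*> items_;
  std::unordered_map<Stat*, std::size_t> index_;
};

// Walks the dense array and resets every stat, summing what was extracted.
// Advancing to the next element is a pointer increment over contiguous
// memory; for idle stats the reset() is a single flag check.
std::int64_t aggregate(const VectorSet& stats) {
  std::int64_t total = 0;
  for (Stat* s : stats.items()) {
    total += s->reset();
  }
  return total;
}
```

In this sketch, inserting three stats, updating two of them, and calling `aggregate` twice extracts the pending values on the first pass and only performs flag checks on the second. The swap-with-last `erase` keeps the storage dense at the price of not preserving order, which is acceptable here because modifications are rare and iteration order does not matter for aggregation.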