memmetrics: simplify locking and solve data race #209
base: master
Conversation
This package was updating counters concurrently because of incorrect locking. There's not really a reason to have two separate locks or try to optimize with RW locks. This change replaces the two RW locks with one exclusive lock. The RTMetrics struct methods always acquire this exclusive lock. This is simple and will be easier to keep correct as the code evolves.
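For illustration, here is a minimal, self-contained sketch of the locking pattern described above; the struct, field, and method names are invented for the example and are not the real RTMetrics definition:

```go
// Sketch of the "one exclusive lock" pattern (illustrative names only; this
// is not the real RTMetrics implementation).
package main

import (
	"fmt"
	"sync"
	"time"
)

type metrics struct {
	mu sync.Mutex // single exclusive lock guarding every field below

	total       int64
	statusCodes map[int]int64
}

// Record takes the exclusive lock for the whole update, so the counter
// increment and the map write can never race with readers or other writers.
func (m *metrics) Record(code int, _ time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()

	m.total++
	if m.statusCodes == nil {
		m.statusCodes = make(map[int]int64)
	}
	m.statusCodes[code]++
}

// TotalCount takes the same lock, so reads always see a consistent snapshot.
func (m *metrics) TotalCount() int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.total
}

func main() {
	var m metrics
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				m.Record(200, time.Millisecond)
			}
		}()
	}
	wg.Wait()
	fmt.Println(m.TotalCount()) // always 4000, no torn updates
}
```

The point is that every method takes the same exclusive mutex around its entire read-modify-write, so concurrent updates can never interleave or tear.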
LGTM
On a 3 GHz Intel CPU we do zero allocations and Record takes 255 ns.
Here is the performance before/after with the benchmark I added. We went from 280 ns to 255 ns per Record on my 3 GHz Intel CPU; we got a little faster by switching from RWMutex to Mutex. The benchmark I added isn't concurrent, but it's sufficient evidence that we're not introducing a performance regression with this change. At 255 ns per operation we can take the lock roughly 3.9 million times per second, far higher than any realistic HTTP request rate. Before this change.
After this change.
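For reference, the non-concurrent benchmark mentioned above presumably looks roughly like this; the function name and body are my reconstruction from the discussion, not the PR's actual code:

```go
func BenchmarkRecord(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		rr.Record(200, time.Second)
	}
}
```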
Maybe add a concurrent benchmark?

```go
func BenchmarkRecordConcurrently(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	// warm up metrics. Adding a new code can do allocations, but in the steady
	// state recording a code is cheap. We want to measure the steady state.
	const codes = 100
	for code := 0; code < codes; code++ {
		rr.Record(code, time.Second)
	}

	concurrency := runtime.NumCPU() * 2
	wg := sync.WaitGroup{}
	wg.Add(concurrency)
	b.ResetTimer()
	for i := 0; i < concurrency; i++ {
		go func() {
			for j := 0; j < b.N; j++ {
				rr.Record(j%codes, time.Second)
			}
			wg.Done()
		}()
	}
	wg.Wait()
}
```
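For reference, a benchmark like this can be run with something like `go test -run '^$' -bench BenchmarkRecordConcurrently -benchmem` from the memmetrics package directory (the exact path depends on the repository layout); `-benchmem` also reports whether Record stays allocation-free.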
Our IT guys upgraded my CPU and I didn't even know it; I now have 10 cores / 20 threads at 3.3 GHz (the previous benchmarks were also run on this CPU). Good idea to add a concurrent benchmark, since the context in which this package will be used should be highly concurrent. I changed the benchmark so that b.N is divided among the goroutines, which gives more of an apples-to-apples comparison: the same number of results are recorded in the concurrent and non-concurrent benchmarks. I also set the concurrency (number of goroutines in the benchmark) equal to the number of CPUs (hardware threads). The goroutines end up with affinity to a hardware thread, so extra concurrency would only add scheduler overhead; I tried doubling the concurrency to twice the number of CPUs and it made only about a 40 ns/op difference. The poor scaling of RWMutex with CPU count is explained here. Here is before the locking change. We observe a 6.8x slowdown with concurrency.
Here is after the locking change. We observe a 4x slowdown with concurrency. We're roughly twice as efficient as the old code with RWMutex.
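Based on that description, the revised concurrent benchmark presumably splits b.N across one goroutine per hardware thread, roughly like this (my reconstruction of the idea, not the exact code from the PR):

```go
func BenchmarkRecordConcurrently(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	// Warm up so steady-state Record calls don't allocate.
	const codes = 100
	for code := 0; code < codes; code++ {
		rr.Record(code, time.Second)
	}

	// One goroutine per hardware thread; split b.N between them so the total
	// number of recorded results matches the non-concurrent benchmark.
	concurrency := runtime.NumCPU()
	perGoroutine := b.N / concurrency

	var wg sync.WaitGroup
	wg.Add(concurrency)
	b.ResetTimer()
	for i := 0; i < concurrency; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				rr.Record(j%codes, time.Second)
			}
		}()
	}
	wg.Wait()
}
```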
So this issue has not been resolved yet? I have the same issue. I think it may cause inaccurate statistics. Why not resolve this issue?