Question: potential deadlock with goroutine stuck on internal lock? #15
Comments
Hi.

I found one bug in the library, hopefully fixed it in 1595213.

I just implemented a deadlock detector with easy to prove correctness: (a minimal sketch of what it flags follows below these comments)

I'm facing the same problem while debugging Mercure:

I'm using the same version of the library. Thanks for writing such a useful tool btw!
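For context, the kind of issue the detector is designed to flag is a lock-order inversion, independent of whether the program actually hangs. This is my own illustration, not code from this issue; it assumes only the drop-in deadlock.Mutex type with default options:

package main

import (
	"time"

	"github.com/sasha-s/go-deadlock"
)

func main() {
	var a, b deadlock.Mutex

	// Goroutine 1 always takes a, then b.
	go func() {
		for {
			a.Lock()
			b.Lock()
			b.Unlock()
			a.Unlock()
		}
	}()

	// Goroutine 2 takes b, then a: an ABBA inversion that the detector
	// reports as a potential deadlock even if the program happens not to hang.
	go func() {
		for {
			b.Lock()
			a.Lock()
			a.Unlock()
			b.Unlock()
		}
	}()

	time.Sleep(5 * time.Second)
}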
v0.2.1 (1595213). This blocking occurs in the caching layer, which is heavily loaded, at about 100,000 requests per second.

But in this case there is a very small timeout, while in my real code it is 5 minutes.

package main
import (
	"time"

	"github.com/sasha-s/go-deadlock"
)

func main() {
	deadlock.Opts.DeadlockTimeout = 1 * time.Second

	var mx deadlock.Mutex
	fn := func() {
		mx.Lock()
		mx.Unlock()
	}

	for i := 0; i < 100_000; i++ {
		go func() {
			for {
				fn()
			}
		}()
	}

	select {}
}

Perhaps the problem is not in this library but somewhere else. But in my real case, the mutex protected counting the number of elements in a slice and appending to the slice.

// copy of the code from https://github.com/jaegertracing/jaeger/blob/748bf213d1e5a03ebc4da63601473be3bbe193f4/examples/hotrod/pkg/tracing/mutex.go with my fixes
type Mutex struct {
	...
	waiters     []string
	waitersLock deadlock.Mutex // <--
}
func (sm *Mutex) Lock(ctx context.Context, name string) {
	activeSpan := opentracing.SpanFromContext(ctx) // "activeSpan == noopSpan{}" in production

	sm.waitersLock.Lock() // <-- "Previous place where the lock was grabbed"
	if waiting := len(sm.waiters); waiting > 0 && activeSpan != nil {
		activeSpan.LogFields(...) // <-- func (n noopSpan) LogFields(fields ...log.Field) {}
	}
	sm.waiters = append(sm.waiters, name)
	sm.waitersLock.Unlock() // <-- "Have been trying to lock it again for more than 5m0s"

	sm.realLock.Lock()
	sm.holder = name

	sm.waitersLock.Lock()
	behindLen := len(sm.waiters) - 1
	sm.waitersLock.Unlock()

	if activeSpan != nil {
		activeSpan.LogFields(...)
	}
}
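Given the 5-minute timeout mentioned above, one option while investigating is to tune the detector rather than the protected code. A minimal sketch, assuming the exported Opts fields described in the go-deadlock README (DeadlockTimeout, OnPotentialDeadlock, Disable); treat the exact defaults as an assumption:

package main

import (
	"log"
	"time"

	"github.com/sasha-s/go-deadlock"
)

func init() {
	// A wait longer than this is reported as a potential deadlock. Raising it
	// reduces false positives on short critical sections that are merely
	// contended under heavy load (e.g. ~100k requests per second).
	deadlock.Opts.DeadlockTimeout = 5 * time.Minute

	// Override the handler so a suspected false positive only logs instead of
	// stopping the program.
	deadlock.Opts.OnPotentialDeadlock = func() {
		log.Println("go-deadlock: potential deadlock reported")
	}

	// As a last resort, Disable turns the wrappers into plain sync mutexes.
	// deadlock.Opts.Disable = true
}

This only changes the reporting; if waitersLock really is held for minutes, the report will still fire.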
Hey, first of all thanks for the hard work on this great lib!

I'm having trouble interpreting the output below. It suggests goroutine 77264 holds lock 0xc4202a60e0 for a long time, preventing others (like goroutine 77325 and many more) from acquiring it. However, the output also suggests that goroutine 77264 actually got stuck during unlock: raft.go:688 is a deferred mu.Unlock(), and deadlock.go:330 is actually a lock acquire statement in this lib.

Does this mean that the (potential) deadlock is coming from this lib in this case? What would make goroutine 77264 get stuck on that internal lock? (I reproduced the same output with a 1m30s lock timeout.)