Have nats-server running in JetStream mode and load thousands of messages into 2 subjects.
In a separate client, create a PullSubscriberAsync on the first subject with a NextHandler defined that requests batches of 1000 messages. This subscriber immediately starts requesting and receiving messages because the subject already has data.
Create a second PullSubscriberAsync on the second subject with a NextHandler that also requests batches of 1000. This call hangs in _subscribeMulti, as described in "Observed behavior", because of a deadlock with the first subscription, which is processing messages and fetching more while the second subscriber is being created.
My current workaround is to wait about 100 milliseconds before creating the second subscriber, to let the first subscriber sort itself out.
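The 100 millisecond pause can be sketched with plain POSIX calls. This is only an illustration of the workaround, not a fix: sleep_ms is a hypothetical helper name, and the two subscribe calls are left as comments because their exact arguments depend on the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for at least `ms` milliseconds.
 * If a signal interrupts the sleep, resume with the remaining time.
 * Returns 0 on success, -1 on any other nanosleep error. */
static int sleep_ms(long ms)
{
    struct timespec req;

    assert(ms >= 0);
    req.tv_sec  = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* POSIX allows the request and remainder to be the same object. */
    while (nanosleep(&req, &req) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Usage sketch (call sites are placeholders, error handling omitted):
 *
 *   ... create the first pull subscriber ...
 *   sleep_ms(100);   // give the first subscriber time to settle
 *   ... create the second pull subscriber ...
 */
```

The delay only makes the race less likely. It does not remove the underlying lock-order problem, so the hang can presumably still occur under load.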
Observed behavior
The program hangs because of a deadlock. It seems to be caused by three code paths acquiring the same locks in conflicting orders.
js_maybeFetchMore lock order:
nats_lockSubAndDispatcher(sub); to lock the subscription
_sendPullRequest, which calls natsConnection_PublishRequest, which then tries to lock the connection's mutex with natsConn_Lock(nc); which is held by _subscribeMulti
natsConn_processMsg lock order:
natsMutex_Lock(nc->subsMu); to lock the connection's subscriptions mutex: subsMu
nats_lockRetainSubAndDispatcher(sub); which is held by js_maybeFetchMore
_subscribeMulti lock order:
natsConn_subscribeImpl with lock == true, so it locks the connection's mutex with natsConn_Lock(nc);
natsMutex_Lock(nc->subsMu); which is held by natsConn_processMsg
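To make the cycle explicit, the three paths can be modelled as a small wait-for graph: each path holds one lock and waits for another. The sketch below is a simplified model of what is described in this report, not code from nats.c. The enum names, the struct, and both functions are made up for illustration, and it assumes each path holds exactly one lock at the moment it blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative labels for the three locks named in this report. */
enum lock_id { LOCK_SUB, LOCK_CONN, LOCK_SUBS_MU };

/* One code path: the lock it already holds and the lock it is waiting for. */
struct path {
    const char  *name;
    enum lock_id holds;
    enum lock_id wants;
};

/* The situation described above, as a table. */
static const struct path reported_paths[] = {
    { "js_maybeFetchMore",   LOCK_SUB,     LOCK_CONN    },
    { "natsConn_processMsg", LOCK_SUBS_MU, LOCK_SUB     },
    { "_subscribeMulti",     LOCK_CONN,    LOCK_SUBS_MU },
};

/* Return the index of the path holding `lock`, or -1 if the lock is free. */
static int holder_of(const struct path *p, size_t n, enum lock_id lock)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i].holds == lock)
            return (int)i;
    }
    return -1;
}

/* Follow "waits for the holder of" edges from `start`.
 * Returns true if the chain leads back to `start`, i.e. a deadlock cycle. */
static bool in_wait_cycle(const struct path *p, size_t n, size_t start)
{
    size_t cur = start;

    assert(start < n);
    for (size_t steps = 0; steps < n; steps++) {
        int next = holder_of(p, n, p[cur].wants);
        if (next < 0)
            return false;            /* wanted lock is free, no waiting */
        if ((size_t)next == start)
            return true;             /* chain closed on itself */
        cur = (size_t)next;
    }
    return false;                    /* chain never returned to start */
}
```

Calling in_wait_cycle(reported_paths, 3, i) returns true for each of the three paths. If the _subscribeMulti entry is dropped from the table, the chain ends at a free lock and no cycle is found, which matches the observation that the hang only appears while the second subscription is being created.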
Expected behavior
The second subscription should be created without deadlocking in this scenario.
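One generic way to rule out this kind of cycle is to give every lock a fixed rank and always acquire in ascending rank. The pthread sketch below shows three threads that each need the same pair of locks as the paths in this report, but take them in rank order, so they cannot deadlock. It is not the nats.c implementation or a proposed patch. The rank order, names, and iteration count are arbitrary assumptions, and whether the library can actually be restructured this way is for the maintainers to judge.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Arbitrary global ranking (assumption): lower rank is always locked first. */
enum { RANK_CONN, RANK_SUBS_MU, RANK_SUB, RANK_COUNT };

static pthread_mutex_t locks[RANK_COUNT] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Any two workers below share at least one lock, so this counter is
 * never touched by two threads at the same time. */
static int shared_counter;

struct job { int first; int second; int iterations; };

/* Lock two distinct mutexes in ascending rank, whatever order the caller names them. */
static void lock_pair(int a, int b)
{
    assert(a != b);
    pthread_mutex_lock(&locks[a < b ? a : b]);
    pthread_mutex_lock(&locks[a < b ? b : a]);
}

/* Unlock in the reverse order. */
static void unlock_pair(int a, int b)
{
    pthread_mutex_unlock(&locks[a < b ? b : a]);
    pthread_mutex_unlock(&locks[a < b ? a : b]);
}

static void *worker(void *arg)
{
    const struct job *j = arg;
    for (int i = 0; i < j->iterations; i++) {
        lock_pair(j->first, j->second);
        shared_counter++;
        unlock_pair(j->first, j->second);
    }
    return NULL;
}

/* Run three workers whose lock pairs mirror the three reported paths.
 * Returns the final counter value, or -1 if a thread could not be started. */
static int run_ordered_demo(int iterations)
{
    struct job jobs[3] = {
        { RANK_SUB,     RANK_CONN,    iterations },  /* like js_maybeFetchMore   */
        { RANK_SUBS_MU, RANK_SUB,     iterations },  /* like natsConn_processMsg */
        { RANK_CONN,    RANK_SUBS_MU, iterations },  /* like _subscribeMulti     */
    };
    pthread_t tids[3];
    size_t started = 0;

    shared_counter = 0;
    while (started < 3 &&
           pthread_create(&tids[started], NULL, worker, &jobs[started]) == 0)
        started++;

    /* Join whatever was started before jobs[] goes out of scope. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == 3 ? shared_counter : -1;
}
```

Calling run_ordered_demo(10000) should return 30000 and finish every time. If lock_pair instead locked in the order the caller names them, the three workers would reproduce the same three-way cycle as in this report and could hang.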
Server and client version
nats-server: 2.10.20
nats.c: 3.9.1
Nats: 0.1.2 (Not relevant for this)
Host environment
OS: Rocky Linux 8.4
Arch: amd64
GCC: 13.2.0
Linked libraries:
linux-vdso.so.1 (0x00007ffd9effe000)
libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f42cf147000)
libz.so.1 => /lib64/libz.so.1 (0x00007f42cef2f000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f42ced2b000)
libprotobuf-c.so.1 => /lib64/libprotobuf-c.so.1 (0x00007f42ceb22000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f42ce637000)
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f42ce3a3000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f42ce183000)
libstdc++.so.6 => /work/shared/devtools/gccbin/gcc_13_2_0/lib64/libstdc++.so.6 (0x00007f42cdd21000)
libm.so.6 => /lib64/libm.so.6 (0x00007f42cd99f000)
libgcc_s.so.1 => /work/shared/devtools/gccbin/gcc_13_2_0/lib64/libgcc_s.so.1 (0x00007f42cd77b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f42cd3b6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f42cf3cc000)
libnghttp2.so.14 => /lib64/libnghttp2.so.14 (0x00007f42cd18f000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f42ccf3a000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f42ccc4f000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f42cca38000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f42cc834000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f42cc623000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f42cc41f000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f42cc207000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f42cbfdc000)
libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f42cbd58000)
Steps to reproduce
This is hard to reproduce. The general steps are the ones listed at the top of this report.