To me, the most interesting/relevant thing here is that you're running with |
So, the investigation continues to unfold... It's not related to the type of router but rather to how the router is connected (or not) to the internet; I assume it has something to do with the DNS service. So we have:
If it were just a slow start, I wouldn't call it a "bug" per se, more of a behavior. What makes 4) "buggy" in my opinion is that the connections are accepted by the RabbitMQ server, messages are exchanged for a short time, and then the server drops the connection. Sometimes reconnect attempts succeed; sometimes they fail for several minutes more. In cases 1), 2) and 3), once a connection is made it is rock solid: I never see a connection dropped by the server side. The test case is also ridiculously easy to duplicate: basically any router connected to nothing but the RabbitMQ server machine.

I have a workaround that is not too bad: disable the ethernet port in the ExecStartPre section of docker.service, something like this (the ExecStartPre lines are added to the standard /lib/systemd/system/docker.service file):
Then, after the initial connection is made to the RabbitMQ server, the port is re-enabled, if desired. I assume this prevents the RabbitMQ server from going out on the network and "getting lost" during startup. Obviously, it would be preferable if the server didn't get lost like that, without having to be protected from "empty" network connections.

I turned on logging in the 3.7.15 container by adding:
to the Dockerfile. This is an excerpt from a "well behaved" startup log:
The above uses the network port disabling scheme described for docker.service. Next is the same excerpt without network disabling and connected to an "empty" network (a bit longer because it extends through some of the failed connections):
The first difference I notice is after the line "[info] <0.218.0> Running boot step networking defined by app rabbit".

The first log adds vhost '/' at 13:38:13.601, while the second log does the same step at 14:00:01.578. The first log reaches "Running boot step networking" at 13:38:13.812 and the second log gets there at 14:00:01.723: 211ms vs 145ms after adding the vhost.

Then the first log proceeds to start a TCP listener on 5672 after 2ms, while the second log takes 16.007ms before starting its listener. The first log has its MQTT listener on 1883 started 2ms later; the second log "accepts" three AMQP connections (which ultimately do not result in good queue bindings) over the following 5 seconds and doesn't get its MQTT listener started until 8 seconds after the AMQP listener.

The first log calls server startup complete at 13:38:14.125, or add vhost + 524ms. The second log calls server startup complete at 14:00:35.941, or add vhost + 34.363 seconds.

Much more problematic than the ~35 second delay is the lack of successful connections. While the second log is showing successful connections and authentications, they are slow: on the order of 8 seconds for the first one, whereas the first log shows the first AMQP authentication within 5ms after connection. Later connections (like 0.1480.0) in the second log are turning around faster, 526ms in that case, but there is still a flood of problems. The clients are configured to retry failed connections after 5 seconds, which works well in scenarios 1), 2) and 3) above, but isn't working in 4).

I don't know what is causing this warning:
It doesn't seem related to the problem of slow and unreliable connections. I have tried to install an Erlang config file like this one:
by setting an environment variable in the Dockerfile:
but it doesn't seem to have any effect. Do you know whether environment variables set in the Dockerfile take effect early enough to affect the Erlang configuration? If not, do you know how to modify the Docker image to get at the Erlang configuration, maybe using a kernel variable as described here: https://erlang.org/doc/apps/erts/inet_cfg.html

Thanks!
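For anyone who wants to re-check the startup timings quoted in this comment, here is a minimal Python sketch that recomputes the intervals from the timestamps given in the text. The helper name elapsed_ms and the dictionary keys are illustrative only; the timestamps are the ones quoted above, not a fresh parse of the logs.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines (HH:MM:SS.mmm).
FMT = "%H:%M:%S.%f"


def elapsed_ms(start: str, end: str) -> int:
    """Milliseconds between two same-day log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() * 1000)


# Timestamps copied from the comment above (keys are hypothetical labels).
well_behaved = {
    "add_vhost": "13:38:13.601",
    "networking": "13:38:13.812",
    "complete": "13:38:14.125",
}
empty_network = {
    "add_vhost": "14:00:01.578",
    "networking": "14:00:01.723",
    "complete": "14:00:35.941",
}

for label, log in (("well behaved", well_behaved), ("empty network", empty_network)):
    to_networking = elapsed_ms(log["add_vhost"], log["networking"])
    to_complete = elapsed_ms(log["add_vhost"], log["complete"])
    print(f"{label}: networking step after {to_networking} ms, "
          f"startup complete after {to_complete} ms")
```

The same helper can be pointed at any other pair of timestamps from the two logs, for example a connection accept and its authentication line.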
This discussion was converted from issue #482 on May 19, 2021 00:26.
We have deployed rabbitmq:3.7.15-management, hosted on Ubuntu 18.04, on a few hundred devices over the past couple of years with good behavior, until now... Note: we also tested 3.8.16-management, and it displays the same behavior.
When booting while connected to a router that is not connected to anything else, the docker container starts much more slowly than normal, leaving our system without its Rabbit server for an unusually and unacceptably long time after startup. The extra delay is just under 2 minutes, and even after the initial connection is made, connections are frequently lost. By contrast, with no external connection at all, or with a router that is connected to the internet, the initial connections (queues bound to exchanges) and any later connections are all rock solid.
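Since the behavior depends on whether the router has an upstream connection, one thing that may be worth measuring on the host in each network setup is how long a plain name lookup takes. The following is only a diagnostic sketch in Python, not part of the deployed setup: the function names time_lookup and main are hypothetical, and the choice of hostnames to probe is an assumption.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Time one name lookup; returns (seconds, ok).

    The resolver argument exists only so the lookup can be swapped out
    when testing; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.monotonic() - start, ok


def main():
    # Assumption: the machine's own hostname and localhost are the
    # interesting names to probe; add others as needed.
    for host in (socket.gethostname(), "localhost"):
        seconds, ok = time_lookup(host)
        status = "resolved" if ok else "failed"
        print(f"{host}: {status} in {seconds:.3f}s")
```

Calling main() once with the router connected to the internet and once with the router connected to nothing else gives two sets of numbers to compare; a large difference would point toward name resolution, while similar numbers would suggest looking elsewhere.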
We are launching using systemd, with this docker.rabbitmq.service file:
The rabbit37mqtt container comes from this Dockerfile:
using this rabbitmq.conf file:
and this build script (which runs one time only):
Any ideas or similar experiences?
Thanks!