server does not shutdown cleanly #21911

Open
2 tasks
nh2 opened this issue Nov 2, 2024 · 0 comments

Overview of the Issue

When asked by systemd to stop, consul still exits with status 1, tripping over its own shutdown.


Reproduction Steps

  1. systemctl stop consul.service
systemd[1]: Stopping consul.service...
consul[623754]: 2024-10-28T21:09:51.702Z [INFO]  agent.router.manager: shutting down
consul[623754]: 2024-10-28T21:09:51.702Z [INFO]  agent.router.manager: shutting down
consul[623754]: 2024-10-28T21:09:51.706Z [INFO]  agent: consul server down
consul[623754]: 2024-10-28T21:09:51.706Z [INFO]  agent: shutdown complete
consul[623754]: 2024-10-28T21:09:51.706Z [INFO]  agent.dns: Stopping server: protocol=DNS address=127.0.0.1:8600 network=tcp
consul[623754]: 2024-10-28T21:09:51.706Z [INFO]  agent.dns: Stopping server: protocol=DNS address=127.0.0.1:8600 network=udp
consul[623754]: 2024-10-28T21:09:51.706Z [INFO]  agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http
consul[623754]: 2024-10-28T21:09:51.789Z [ERROR] agent.server.raft: failed to decode incoming command: error="transport shutdown"
consul[623754]: 2024-10-28T21:09:51.828Z [ERROR] agent.server.raft: failed to decode incoming command: error="transport shutdown"
consul[623754]: 2024-10-28T21:09:52.707Z [WARN]  agent: Failed to stop server: address=127.0.0.1:8500 network=tcp protocol=http
consul[623754]: 2024-10-28T21:09:52.707Z [INFO]  agent: Waiting for endpoints to shut down
consul[623754]: 2024-10-28T21:09:52.707Z [INFO]  agent: Endpoints down
consul[623754]: 2024-10-28T21:09:52.707Z [INFO]  agent: Exit code: code=1
systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE

I find this output very confusing. First it says agent: consul server down and agent: shutdown complete.
So I assume it's shut down.

But then it prints more things to shut down, including Stopping server. Wait, I thought the server was already down as per output above?

Then it notices transport shutdown, and apparently considers that an error and exits with exit code 1.

But that doesn't make sense, since it's down because we asked it to shut down.

Environment

Consul v1.18.2 on Linux; config:

{
  "autopilot": {
    "min_quorum": 3
  },
  "bind_addr": "10.0.0.3",
  "bootstrap_expect": 3,
  "check_update_interval": "1ns",
  "data_dir": "/var/lib/consul",
  "dns_config": {
    "allow_stale": false
  },
  "enable_debug": true,
  "enable_script_checks": true,
  "gossip_lan": {
    "probe_interval": "1000ms"
  },
  "performance": {
    "raft_multiplier": 1
  },
  "reconnect_timeout": "8760h",
  "reconnect_timeout_wan": "8760h",
  "retry_interval": "1s",
  "retry_join": [
    "10.0.0.1",
    "10.0.0.2"
  ],
  "server": true,
  "ui_config": {
    "enabled": true
  }
}

Proposed approach

  • Consul should exit with code 0 when asked to shut down, and should not treat transport shutdown errors as failures when they are caused by the shutdown the user requested.
  • Ideally, the logs would make clear to the human operator what is already shut down and what isn't.