Engine socket on container becomes unusable after Engine crash #282

Open
lmbarros opened this issue Dec 16, 2021 · 7 comments

Comments

@lmbarros
Contributor

lmbarros commented Dec 16, 2021

If we start a container with the label io.balena.features.balena-socket: '1' set, the container has access to the Engine socket. However, if the Engine crashes on the Host OS, that container can no longer connect to the Engine, even after the Engine restarts on the Host OS. Attempting to run Docker in the container fails with:

Cannot connect to the Docker daemon at unix:///host/run/balena-engine.sock. Is the docker daemon running?

This can be easily reproduced by SIGKILLing balenad on the Host OS and then trying to run Docker or balenaEngine in a container where it was previously working.

This is arguably on the border between the Supervisor (which sets up the mounts and shares) and the Engine (which implements the mechanisms).
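To check the failure described above from inside the container without going through the Docker CLI, a small probe along these lines might help. It is only a sketch: `probe_socket` is a hypothetical helper, and the socket path is taken from the error message quoted above, so adjust it if your mount differs. It tells a missing socket file apart from one that exists but has nothing listening behind it.

```python
import socket

# Path taken from the error message in this issue (assumption: same mount point).
ENGINE_SOCKET = "/host/run/balena-engine.sock"


def probe_socket(path: str, timeout: float = 2.0) -> str:
    """Classify the state of a Unix socket path (hypothetical helper).

    Returns "ok", "missing", "refused", or "error: <reason>".
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return "ok"
    except FileNotFoundError:
        # No file at this path.
        return "missing"
    except ConnectionRefusedError:
        # The socket file exists, but no process is accepting connections on it.
        return "refused"
    except OSError as exc:
        # Anything else (permissions, timeout, ...) is reported verbatim.
        return f"error: {exc}"
    finally:
        s.close()


if __name__ == "__main__":
    print(probe_socket(ENGINE_SOCKET))
```

Running it before and after killing the Engine on the host should show whether the container sees a refused connection or a missing file, which may help narrow down whether the problem sits with the mount or with the Engine.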

@jellyfish-bot

[lmbarros] This issue has attached support thread https://jel.ly.fish/41b56e32-5fae-4a2e-b5bb-05f9f5af1f0f

lmbarros transferred this issue from balena-os/balena-supervisor Dec 16, 2021

@deanMike

I have an example of this issue here: https://github.com/machinemetrics/docker-socket

@cywang117

@lmbarros
Contributor Author

Did a couple more quick tests:

  • SIGKILL leaves the socket unusable in the container, as we already knew.
  • SIGABRT gives the same result as SIGKILL. (This case might be of interest because that's what the watchdog sends on a timeout.)
  • SIGTERM is fine, however: after the Engine restarts on the host, the socket becomes usable again in the container.
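For anyone repeating these tests, a polling helper like the one below could make "usable again" less of a judgment call. It is a sketch under assumptions: `wait_until_usable` is a hypothetical name, and the default timeout and interval are arbitrary values, not anything the Engine or Supervisor defines. After sending a signal to the Engine on the host, run it in the container against the mounted socket path.

```python
import socket
import time


def wait_until_usable(path: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll a Unix socket until a connection succeeds or the timeout expires.

    Hypothetical helper: returns True as soon as connect() works,
    False if it never does within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.settimeout(interval)
        try:
            s.connect(path)
            return True
        except OSError:
            # Missing file, refused connection, timeout, etc.: retry below.
            pass
        finally:
            s.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A `True` result would correspond to the SIGTERM case above, where the socket recovers once the Engine restarts; `False` would match what was seen with SIGKILL and SIGABRT.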

@klutchell
Contributor

klutchell commented Feb 4, 2022

I suspect this would be resolved by balena-os/balena-supervisor#1780

@deanMike

> I suspect this would be resolved by balena-os/balena-supervisor#1780

@klutchell Do you know if there's still a plan to get that fix in? Is there any way my team and I could help test this out? This issue has been a real thorn in our side.

@klutchell
Contributor

Hey @deanMike, I have requested updates on the linked PR: balena-os/balena-supervisor#1780
