RKE2 tries to start containerd but it may already be started, causing rke2 to fail to start #6853

Closed
bendemott opened this issue Sep 21, 2024 · 3 comments

@bendemott

Environmental Info:
RKE2 Version: 1.27 -> 1.31 (tested)

Node(s) CPU architecture, OS, and Version:

  • Windows Server 2019 Standard Edition
  • Intel x86/64 Architecture

Cluster Configuration:

  • One Windows server, 3x Linux servers.

Describe the bug:
Some installations of containerd register a Windows service (visible in services.msc).
If that containerd service is started before the rke2 service, rke2 fails to start with the error:

time="2024-09-13T16:58:17-07:00" level=error msg="containerd exited: exit status 1"

The cause of the error can be found in the containerd log at: C:\var\lib\rancher\rke2\agent\containerd\containerd.log

containerd: failed to get listener for main ttrpc endpoint: open //./pipe/containerd-containerd.ttrpc: Access is denied.
  • //./pipe/containerd-containerd.ttrpc is the named pipe containerd opens for communication between
    the daemons.
  • The "Access is denied" error occurs because containerd is already running and that named pipe is already in use.

Disabling the containerd service will prevent this conflict from happening.
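
To illustrate the diagnosis above, here is a minimal Go sketch that recognizes the log line quoted in this report and turns it into a hint. The function name diagnoseContainerdLog is hypothetical and is not part of RKE2 or containerd; the matched substrings are taken only from the log line shown above, so other failure modes would need their own patterns.

```go
package main

import (
	"fmt"
	"strings"
)

// diagnoseContainerdLog inspects a single line from containerd.log and
// returns a human-readable hint, or "" if the line is not recognized.
// Hypothetical helper for illustration; not an RKE2 or containerd API.
func diagnoseContainerdLog(line string) string {
	// Both substrings come from the error quoted in this issue.
	if strings.Contains(line, "failed to get listener") &&
		strings.Contains(line, "Access is denied") {
		return "the containerd named pipe is already in use; " +
			"another containerd instance (for example a registered Windows service) may be running"
	}
	return ""
}

func main() {
	line := `containerd: failed to get listener for main ttrpc endpoint: open //./pipe/containerd-containerd.ttrpc: Access is denied.`
	fmt.Println(diagnoseContainerdLog(line))
}
```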

Steps To Reproduce:

  • Install containerd
  • Install rke2
  • Register containerd as a service: containerd.exe --register-service
  • Start the containerd service via services.msc
  • Attempt to start/restart the rke2 service.

Expected behavior:
It can be difficult on Windows to detect whether a named pipe is in use.

  • dir //./ will list named pipes...
  • Perhaps rke2 could emit a warning or error if containerd is already running?

Using ctr.exe will time out if containerd is NOT running:

PS C:\Users\Administrator> ctr --connect-timeout=5s images ls
ctr: failed to dial "\\\\.\\pipe\\containerd-containerd": context deadline exceeded: connection error: desc = "transport: error while dialing: dial \\\\.\\pipe\\containerd-containerd: timeout"
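
The check suggested above could be sketched in Go as a pre-start probe: dial the endpoint with a short timeout and treat a successful dial as "a containerd is already running". This is only a sketch under assumptions. The names dialFunc, runtimeReachable, dialOK, dialHangs, and probeTimeout are hypothetical, and the dial itself is abstracted behind a function so the example runs anywhere; on Windows the real dialer would presumably be a named-pipe dial (for instance via the go-winio package), which is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// probeTimeout is an illustrative value, much shorter than the 5s used
// with ctr above.
const probeTimeout = 50 * time.Millisecond

// dialFunc abstracts how the runtime endpoint is dialed (hypothetical type).
type dialFunc func(ctx context.Context) error

// runtimeReachable reports whether dial succeeds before the timeout expires.
// A true result would suggest some containerd already owns the endpoint.
func runtimeReachable(dial dialFunc, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return dial(ctx) == nil
}

// dialOK is a fake dialer standing in for an endpoint that answers.
func dialOK(ctx context.Context) error { return nil }

// dialHangs is a fake dialer that blocks until the deadline, mimicking
// the ctr timeout shown above when containerd is not running.
func dialHangs(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	fmt.Println(runtimeReachable(dialOK, probeTimeout))    // true
	fmt.Println(runtimeReachable(dialHangs, probeTimeout)) // false
}
```

Injecting the dialer keeps the timeout logic separate from the Windows-specific pipe handling, which is the part this report describes as difficult.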

Actual behavior:
rke2 fails to start with an unhelpful message.

Additional context / logs:
Happy to add full logs if this description is not enough.

Workaround

Unregister the containerd service, or disable it:

    containerd.exe --unregister-service
@bendemott changed the title from "Windows: Rke2 tries to start containerd but it may already be started, causing rke2 to fail to start" to "RKE2 tries to start containerd but it may already be started, causing rke2 to fail to start" on Sep 21, 2024
@brandond
Member

Don't do that. Don't run rke2 alongside an existing installation of containerd, or if you must, point rke2 at that existing service's socket via the --container-runtime-endpoint option so that it does not try to start the bundled containerd.

@bendemott
Author

bendemott commented Sep 23, 2024

@brandond

There are many reasons RKE2 may fail to start containerd.
In any case, a more helpful message than the following (for example, one that hints at the location of the containerd log file) would be useful:

time="2024-09-13T16:58:17-07:00" level=error msg="containerd exited: exit status 1"
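
As a rough illustration of the kind of message being requested, the Go sketch below wraps the exit error with a pointer to the containerd log. The function wrapContainerdExit and the wording of the message are hypothetical and do not reflect how RKE2 actually reports this error; the log path is the one given earlier in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// containerdLogPath is the log location reported in this issue.
const containerdLogPath = `C:\var\lib\rancher\rke2\agent\containerd\containerd.log`

// errExit stands in for the error returned when the containerd process exits.
var errExit = errors.New("exit status 1")

// wrapContainerdExit adds a hint about where to look for the root cause.
// Hypothetical helper; the message format is an assumption.
func wrapContainerdExit(err error, logPath string) error {
	return fmt.Errorf("containerd exited: %w; see %s for details", err, logPath)
}

func main() {
	fmt.Println(wrapContainerdExit(errExit, containerdLogPath))
}
```

Using %w keeps the original error available to errors.Is and errors.Unwrap, so callers that inspect the exit status would not be affected by the longer message.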

Also, my error was not caused by having a "separate" installation of containerd from RKE2.

It's that I unknowingly called containerd.exe --register-service, which registered RKE2's bundled containerd as a service.

@brandond
Member

> It's that I unknowingly called containerd.exe --register-service

Why did you do that? You are not intended to manually run RKE2's bundled containerd at all.
