Hey everyone, I'm in quite a problematic situation right now, having lost connection to around 20 IoT devices.
We have had NetBird running for quite a while on v0.26.0. A couple of days ago, we decided to upgrade to the latest version. We noted that the release notes said the jsonfile storage mechanism would be automatically replaced with sqlite storage from v0.28.0 onwards. So here's how we did the upgrade:
Clone the NetBird v0.27.0 repo, copy the artifacts directory and run the configuration script. Run the new docker compose file.
While v0.27.10 is running, use the migration CLI to migrate the JSON file to the new SQLite file.
Clone the v0.35.2 repo, copy the artifacts directory and run the configuration script again, making sure the storage driver in the management.json file is set to sqlite.
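For anyone repeating the last step, the storage driver setting can be read back from the file before starting the containers. This is only a minimal Python sketch: it assumes the engine is stored under a `StoreConfig.Engine` key in management.json (an assumption about the file layout, so check your own generated file), and `store_engine` is a hypothetical helper name, not part of NetBird.

```python
import json


def store_engine(config_text: str) -> str:
    """Return the configured store engine in lowercase, or "" if it is not set.

    Assumes the layout {"StoreConfig": {"Engine": "..."}}; adjust the key
    names if your management.json differs.
    """
    config = json.loads(config_text)
    store = config.get("StoreConfig") or {}
    return str(store.get("Engine", "")).lower()


if __name__ == "__main__":
    # Inline sample instead of a real file, so the sketch runs anywhere.
    sample = '{"StoreConfig": {"Engine": "sqlite"}}'
    print(store_engine(sample))
```

An empty result would mean the key is missing, in which case the server falls back to whatever default that version uses.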
After these steps, all existing clients were offline, but slowly, over the course of several hours, 33/71 clients came back online.
We somehow managed to get 50/71 clients back online and connecting to our self-hosted NetBird instance after trying many different things, such as:
Downgrading back to v0.27.10
Using the old JSON file instead of the SQLite database
Disabling authentication on the coturn server so we didn't have auth failures there
Trying various Turns[].Password values in the management.json file
And a lot more we did in a panic that I can't remember.
However, the remaining devices don't seem to want to come back. The management server logs show occasional lines like this:
WARN [accountID: UNKNOWN, peerID: <<REDACTED>>, context: GRPC, requestID: a1b1b45a-01af-4a66-a196-58dcf7c1cde7] management/server/grpcserver.go:471: failed logging in peer <<REDACTED>: no peer auth method provided, please use a setup key or interactive SSO login
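To get a list of which peers are being rejected, the management log can be grouped by peer ID. A rough Python sketch, assuming every failed login line has the same layout as the single sample above (`count_failed_logins` is a hypothetical helper, and the regular expression is inferred from that one line only):

```python
import re
from collections import Counter

# Layout inferred from the one sample line above; treat it as an assumption.
FAILED_LOGIN = re.compile(r"peerID: (?P<peer>[^,\]]+).*?failed logging in peer ")


def count_failed_logins(lines):
    """Count 'failed logging in peer' log lines per peer ID."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("peer").strip()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARN [accountID: UNKNOWN, peerID: peer-a, context: GRPC, requestID: 1] "
        "management/server/grpcserver.go:471: failed logging in peer peer-a: "
        "no peer auth method provided, please use a setup key or interactive SSO login",
    ]
    for peer, count in count_failed_logins(sample).items():
        print(peer, count)
```

The peer IDs in the sample are placeholders; in our logs they are the redacted values.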
I happen to have one test IoT device on hand which is also unable to connect (the other devices are all over the country). Looking at the NetBird daemon logs on that device, I see this:
systemd[1]: Started netbird.service - A WireGuard-based mesh network that connects your devices into a single private network..
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/cmd/service_controller.go:24: starting Netbird service
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/cmd/service_controller.go:64: started daemon server: /var/run/netbird.sock
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/internal/connect.go:119: starting NetBird client version 0.28.9 on linux/arm64
netbird[1021]: 2025-01-12T16:40:14+13:00 ERRO management/client/grpc.go:350: failed to login to Management Service: rpc error: code = PermissionDenied desc = no peer auth method provided, please use a setup key or interactive SSO login
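The last line carries the gRPC status code and message in the usual `rpc error: code = ... desc = ...` form, which can be pulled out when checking several devices. A small Python sketch (`parse_rpc_error` is a hypothetical helper, and it only handles lines shaped like the one above):

```python
import re

# Matches the "rpc error: code = X desc = Y" text seen in the client log.
RPC_ERROR = re.compile(r"rpc error: code = (?P<code>\w+) desc = (?P<desc>.+)$")


def parse_rpc_error(line):
    """Return (code, description) from a log line, or None if there is no rpc error."""
    match = RPC_ERROR.search(line)
    if match is None:
        return None
    return match.group("code"), match.group("desc").strip()


if __name__ == "__main__":
    sample = (
        "netbird[1021]: 2025-01-12T16:40:14+13:00 ERRO management/client/grpc.go:350: "
        "failed to login to Management Service: rpc error: code = PermissionDenied "
        "desc = no peer auth method provided, please use a setup key or interactive SSO login"
    )
    print(parse_rpc_error(sample))
```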
Restarting the daemon only produces the same result.
Is there any workaround or solution for us to get the remaining devices connected again? It seems that if there were some way to temporarily bypass auth, so those devices could authenticate successfully and reconnect, things would be solved.
Any suggestions and ideas are much appreciated!
PS: I'm certain this whole problem is "user error" and not an issue with NetBird itself. Hopefully it's possible to add safeguards so issues like this don't happen to others.