
[BUG] Runtime error when trying to save notification URL #216

Open
Stitch10925 opened this issue Oct 22, 2024 · 12 comments
Labels
troubleshooting Maybe bug, maybe not

Comments

@Stitch10925

Hi,

Every time I try to add a notification URL the Beszel container crashes with the following error:

panic: runtime error: invalid memory address or nil pointer dereference
 [signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0xf53789]
 
 goroutine 103 [running]:
 github.com/pocketbase/pocketbase/models.(*Record).Set(0xc00016d880, {0x162bdbc, 0x6}, {0x14020c0, 0xc0007a8e20})
 	/go/pkg/mod/github.com/pocketbase/[email protected]/models/record.go:305 +0x209
 beszel/internal/hub.(*Hub).updateSystem(0xc00010ae10, 0xc000603960)
 	/app/internal/hub/hub.go:310 +0x83b
 created by beszel/internal/hub.(*Hub).updateSystems in goroutine 13
 	/app/internal/hub/hub.go:252 +0x170

The URL I'm trying to save has the following format: ntfy://user:passw@hostname/topic

Also, sometimes when I refresh the browser it doesn't show me my systems anymore, but I guess that is another problem since no errors are logged for that.


PS: THANK YOU (!) for this project. I have been looking for a long time for a simple monitoring tool with centralized management for alerts. This tool is just perfect.

@henrygd
Owner

henrygd commented Oct 22, 2024

No worries.

Can you please go to the Export Collections page (/_/#/settings/export-collections), copy your collections, and paste them here?

Or download the JSON file and attach it.

@henrygd henrygd added the troubleshooting Maybe bug, maybe not label Oct 22, 2024
@Stitch10925
Author

Here is the export:

collections_export.json

@henrygd
Owner

henrygd commented Oct 23, 2024

Are you sure this is triggered when trying to save notification settings, or could the timing be coincidence?

The trace is not directly related to that. I was able to replicate the error by changing the name of the container_stats collection, which is why I wanted to double check your collection schema. But that looks fine.

This is also the first I've heard of systems sometimes not showing up.

Regardless, I'll refactor that section of code for the next release to prevent the panic from happening.

Can you let me know if you see any errors in the browser devtools console besides "ClientResponseError 0: The request was autocancelled"? That one is expected and harmless.

henrygd added a commit that referenced this issue Oct 23, 2024
* adds error handling for collection lookup (#216)
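The panic above comes from calling Set on a record obtained from a collection lookup that silently returned nil. A minimal sketch of the kind of guard the referenced commit adds, using stand-in types (Record and findRecord here are hypothetical simplifications, not PocketBase's actual API):

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a minimal stand-in for a PocketBase record.
type Record struct{ fields map[string]any }

// Set writes a field value; calling this on a nil *Record panics.
func (r *Record) Set(key string, value any) { r.fields[key] = value }

// findRecord simulates a collection lookup that can fail, e.g. when a
// collection has been renamed. Returning an error instead of a nil
// record lets the caller bail out before dereferencing it.
func findRecord(collections map[string]*Record, name string) (*Record, error) {
	rec, ok := collections[name]
	if !ok || rec == nil {
		return nil, errors.New("collection not found: " + name)
	}
	return rec, nil
}

func main() {
	collections := map[string]*Record{
		"systems": {fields: map[string]any{}},
	}
	// Looking up a renamed collection now yields an error rather than
	// a nil pointer that would panic on rec.Set(...).
	if _, err := findRecord(collections, "container_stats"); err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
}
```

The point is simply that the lookup error is surfaced and handled instead of letting a nil record reach Set, which is what produced the SIGSEGV in the stack trace.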
@Stitch10925
Author

While I was trying to get the collection export for you I had a lot of crashes in Beszel. I also noticed that other services I had running on Docker were unstable. Tonight I had another look at it. I shut down some of the services and everything became stable again.

I assume that my NAS (which hosts the NFS volumes for my Docker services) isn't able to provide fast enough I/O for the Docker services. This is probably also why Beszel is behaving a bit odd. I am going to replace some of the disks in my NAS with SSDs to see if that fixes the issue.

@henrygd
Owner

henrygd commented Oct 23, 2024

Sounds good.

The specific error you ran into should be impossible now in 0.6.2, but let me know if any other weirdness continues.

@Stitch10925
Author

I added the SSDs, but the problem remains. Sometimes when I load Beszel no systems appear and I see the following in the browser's error logs:

[screenshot]

Then after a while the systems finally show up. Not sure if it is a problem with collecting info from the agents or not.

I also see the following errors:

[screenshot]

Not sure if they're relevant or not.

@Stitch10925
Author

Just saw this in the logs of the agents:

[screenshot]

Not sure if that is causing any delays or issues towards the app.

@henrygd
Owner

henrygd commented Oct 25, 2024

Merging your other issue #221 here. I think these are all connected to an underlying issue on your system, probably with Docker.

In regard to "Something went wrong while processing your request," please go to the logs page /_/#/logs and search for "error" -- you may find more information about what's going wrong.

For the concurrent map write - the most likely cause is the agent getting stuck somehow in the Docker related code for so long that the hub re-requests the stats while the previous call is still running. I'll try to add a check for this, but it seems like a symptom of a larger issue.
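The "concurrent map write" crash described above is what Go's runtime raises when two goroutines write to the same map without synchronization. A minimal, hedged sketch of the kind of check that prevents it (the statsCollector type is hypothetical, not Beszel's actual agent code): guard the shared stats map with a mutex so an overlapping request from the hub cannot race a still-running collection pass.

```go
package main

import (
	"fmt"
	"sync"
)

// statsCollector is a sketch of per-container metrics keyed by name.
// The mutex guards against the hub issuing a second stats request
// while a previous (slow) collection pass is still writing the map;
// without it, Go aborts with "fatal error: concurrent map writes".
type statsCollector struct {
	mu    sync.Mutex
	stats map[string]float64
}

// update records a CPU reading for one container under the lock.
func (c *statsCollector) update(name string, cpu float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.stats[name] = cpu
}

func main() {
	c := &statsCollector{stats: make(map[string]float64)}
	var wg sync.WaitGroup
	// Two simulated "hub requests" writing concurrently.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.update(fmt.Sprintf("container-%d", j%10), float64(id))
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("entries:", len(c.stats))
}
```

An alternative is to serialize the requests themselves (e.g. skip or queue a new stats request while one is in flight), which also addresses the "symptom of a larger issue" point: the lock stops the crash but not the slowness that causes calls to overlap.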

Here are some things that would be helpful to know:

  1. Were the issues introduced after upgrading to a certain version, or have you always experienced this?
  2. Are you running the agent and hub on the same system?
  3. If you have agents on different systems, do they all display similar issues?
  4. What is the OS and architecture of your system(s)?
  5. If you run docker stats on the agent system, does the information look correct with no zero values?

Please put the agent in debug log level and let it run for a few minutes so it fields some requests from the hub. Attach or paste the output here.

Thanks

@Stitch10925
Author

For context:

I have been running Beszel on Docker Swarm. I have replicated the agent to all nodes in the Swarm and was running the app on one of the nodes. The volumes are served over NFS, which SQLite is not very happy about, but I haven't had too many issues with it in the past.

However, PocketBase and/or Beszel seem to be very sensitive to I/O speed. I have changed the Beszel volume to use the node's filesystem instead of my NFS share, and it is much more stable now. But, of course, I lose the advantages of the Swarm doing it this way.
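For anyone hitting the same thing, the node-local workaround can be sketched roughly like this in a Swarm stack file. This is an assumption-laden illustration, not an official configuration: the node hostname is hypothetical, and the data path should match your own setup (SQLite's file locking is known to be unreliable over NFS, which is why a local bind mount helps).

```yaml
services:
  beszel:
    image: henrygd/beszel
    volumes:
      # Local path on the pinned node instead of the NFS share.
      - /var/lib/beszel:/beszel_data
    deploy:
      placement:
        constraints:
          # Hypothetical node name: pin the hub to one node so its
          # data directory always lives on that node's local disk.
          - node.hostname == node1
```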

@henrygd
Owner

henrygd commented Oct 26, 2024

Gotcha. This definitely sounds like a compatibility or configuration issue with swarm. I don't use it myself so I haven't done any testing on it.

There's a related issue you can check out here: #17

From your logs screenshot it looks like your agents may be handling two simultaneous calls from the hub. This would explain your concurrent map write error. Not sure if this is because there are two instances of the hub, or swarm is just grabbing the first node to respond, like the issue above.

I do want to add an option for agent -> hub data flow at some point which should fix this. But for now you probably need to use the same workaround that's explained in the linked issue.

@saket1999

Hi, I am also getting a similar issue. Not sure what is triggering it; I was just going through the data after updating Beszel to 0.8.0. I am attaching both the export and the logs.
beszel.log
pb_schema.json

@henrygd
Owner

henrygd commented Nov 15, 2024

@saket1999 Thanks, I'll fix that in the next release.

3 participants