Randomly getting stuck at 100% CPU #754
Comments
Could be a similar issue to this: TeamHG-Memex/aquarium#1
Hey! In production we run health checks (checking whether the /_ping endpoint is responsive) and restart containers if they don't respond. This is done via Mesos, not docker-compose; unfortunately this setup is not open source. Aquarium does something similar, but it looks like it is less reliable in some cases. A lot of problems were solved in the Splash 3.2 release, where HTML5 video and audio are disabled by default; this change decreased the crash/restart count by up to 10x for us. What Splash version are you using?
3.1
The issue is still happening, if not worse. I decided to restart the Docker container only once a day instead of once an hour to give it a try.
I've had Splash containers do that a few times (maybe once a month) where the only solution was to restart the host. That particular issue is likely a Docker issue. Make sure you're running the latest Docker. My current plan regarding the getting-stuck issue is to test the /_ping endpoint as soon as I have a container go rogue. Usually I have one or two go rogue daily (I'm running around 20 currently), but it's been a couple of days and they have all been fine. Go figure. Anyway, if the /_ping endpoint in fact shows the container is unresponsive, my plan is to write a simple script that checks the endpoint for each container on the host, and then restarts containers as necessary. I'll run it every 15 minutes via cron.
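The check-and-restart script described in the comment above could look roughly like the following. This is only a minimal sketch, not the commenter's actual script: the container names and ports in `CONTAINERS` are hypothetical, it assumes a healthy instance answers `GET /_ping` with HTTP 200, and it assumes the `docker` CLI is available on the host.

```python
import http.client
import subprocess
import urllib.error
import urllib.request

# Hypothetical mapping of container name -> Splash base URL (adjust to your host).
CONTAINERS = {
    "splash_1": "http://localhost:8050",
    "splash_2": "http://localhost:8051",
}


def is_responsive(base_url, timeout=10, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/_ping answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(base_url + "/_ping", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, reset, or malformed reply: treat as stuck.
        return False


def docker_restart(name):
    """Restart one container through the docker CLI (assumed to be on PATH)."""
    subprocess.run(["docker", "restart", name], check=False)


def restart_unresponsive(containers, check=is_responsive, restart=docker_restart):
    """Restart every container whose ping check fails; return the restarted names."""
    restarted = []
    for name, base_url in containers.items():
        if not check(base_url):
            restart(name)
            restarted.append(name)
    return restarted


# Example call, e.g. from a small wrapper run by cron:
# restart_unresponsive(CONTAINERS)
```

Running it every 15 minutes would correspond to a crontab schedule of `*/15 * * * *`. The `check` and `restart` parameters exist only so the logic can be exercised without touching real containers.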
FYI, I encounter this on a regular basis when running Splash inside Docker. Every time I've debugged it, the sockets that Splash was using to communicate with the hosts in question are in
I've hit the same issue!
We seem to be running into the same issue, running Splash 3.3.1 in a Docker Stack with 5 instances on Docker Swarm.
Same issue on 3.5.
Hello,
I'm facing an issue with a Docker-deployed Splash: the python3 process randomly gets stuck at 100% CPU, and any request that comes in afterwards eventually times out (I'm using a 70s timeout right now).
This only happens randomly, in a processing pipeline that handles about a hundred pages per hour.
I can't really tell whether it is caused by very specific pages creating a processing bottleneck or by some bad settings.
The call looks like: GET /render.jpg with params:
I don't really care whether a bad page errors out or not, but it's critical that it doesn't cause everything else to get stuck. Is there anything that can be done to either improve this or debug the issue?
Thanks.