Randomly getting stuck at 100% CPU #754

Open
cristianocca opened this issue Mar 20, 2018 · 9 comments

@cristianocca

Hello,

I'm facing an issue with a Docker-deployed Splash: the python3 process randomly gets stuck at 100% CPU, and any request coming in afterwards eventually times out (I'm using a 70s timeout right now).

This only happens occasionally, in a processing pipeline that handles about a hundred pages per hour.

I can't really tell whether this is related to specific pages causing a processing bottleneck or to some bad settings.

The call looks like: GET /render.jpg with params:

  • render_all: 0
  • width: 1024
  • quality: 50
  • timeout: 60 (the actual HTTP request is set to time out at 70)
  • wait: 3
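
For context, the request is roughly equivalent to this sketch (the Splash address and target URL below are just placeholders, not the real setup):

```python
# Minimal sketch of the render.jpg call; "localhost:8050" and the
# target URL are placeholders.
import requests

SPLASH = "http://localhost:8050"

resp = requests.get(
    f"{SPLASH}/render.jpg",
    params={
        "url": "http://example.com",  # placeholder page
        "render_all": 0,
        "width": 1024,
        "quality": 50,
        "wait": 3,
        "timeout": 60,   # Splash-side timeout
    },
    timeout=70,          # HTTP client timeout
)
resp.raise_for_status()
with open("page.jpg", "wb") as f:
    f.write(resp.content)
```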

I don't really care whether a bad page errors out, but it's critical that it doesn't make everything else get stuck. Is there anything that can be done to either improve this or debug the issue?

Thanks.

@landoncope

Could be a similar issue to this: TeamHG-Memex/aquarium#1

@kmike
Member

kmike commented Mar 21, 2018

Hey! In production we do health checks (checking whether the /_ping endpoint is responsive) and restart containers that don't respond; this is done via Mesos, not docker-compose, and unfortunately that setup is not open source. Aquarium does something similar, but it looks like it's less reliable in some cases. A lot of problems were solved in the Splash 3.2 release, where HTML5 video and audio are disabled by default; that change decreased our crash/restart count by up to 10x. What Splash version are you using?
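
The check itself is simple; here's a minimal sketch, assuming Splash listens on localhost:8050 (the restart side depends on your orchestrator):

```python
# Minimal /_ping health check; the base URL is an assumption for a
# locally published Splash port. /_ping answers with HTTP 200 while
# the instance is responsive.
import requests

def splash_is_healthy(base_url="http://localhost:8050", timeout=5):
    try:
        return requests.get(f"{base_url}/_ping", timeout=timeout).status_code == 200
    except requests.RequestException:
        return False
```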

@cristianocca
Author

3.1
Will give 3.2 a try

@cristianocca
Author

The issue is still happening, if not worse. I decided to restart the Docker container only once a day instead of once an hour, to give that a try.
It worked fine for about a day, until it got stuck as usual. However, this time a docker restart command wasn't even enough to unstick it; I had to completely restart the machine.
Docker not being able to restart the container might have just been a one-off, but I'll keep an eye on it now.

@landoncope

landoncope commented Mar 24, 2018

I've had Splash containers do that a few times (maybe once a month), where the only solution was to restart the host. That particular problem is likely a Docker issue; make sure you're running the latest Docker.

My current plan regarding the getting-stuck issue is to test the /_ping endpoint as soon as a container goes rogue. Usually one or two go rogue daily (I'm running around 20 containers at the moment), but it's been a couple of days and they have all been fine. Go figure.

Anyway, if the /_ping endpoint does in fact show that a container is unresponsive, my plan is to write a simple script that checks the endpoint for each container on the host and restarts containers as necessary, run every 15 minutes via cron.
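
Roughly something like this (the container names and host ports below are placeholders, not my actual setup):

```python
#!/usr/bin/env python3
# Watchdog sketch, intended to run from cron every 15 minutes.
# Assumes each Splash container publishes its HTTP port on the host;
# the (name, port) pairs below are placeholders.
import subprocess
import requests

CONTAINERS = {
    "splash1": 8050,
    "splash2": 8051,
}

def responsive(port, timeout=10):
    try:
        return requests.get(f"http://127.0.0.1:{port}/_ping", timeout=timeout).ok
    except requests.RequestException:
        return False

for name, port in CONTAINERS.items():
    if not responsive(port):
        # docker restart is the standard CLI command; swap in whatever
        # your setup uses (compose, swarm service update, etc.).
        subprocess.run(["docker", "restart", name], check=False)
```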

@jhart-r7

jhart-r7 commented Apr 4, 2018

FYI, I encounter this on a regular basis when running Splash inside Docker. Every time I've debugged it, the sockets Splash was using to communicate with the hosts in question were in CLOSE_WAIT and stayed that way. This is on Splash 3.2.
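
If anyone wants to check for the same symptom inside their container, here's a quick sketch using psutil (assuming it's installed; you may need root to see sockets owned by other processes):

```python
# Count TCP sockets stuck in CLOSE_WAIT; requires `pip install psutil`.
import psutil

close_wait = [
    c for c in psutil.net_connections(kind="tcp")
    if c.status == psutil.CONN_CLOSE_WAIT
]
print(f"{len(close_wait)} sockets in CLOSE_WAIT")
for c in close_wait:
    print(c.laddr, "->", c.raddr)
```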

@sloth2012

I'm hitting the same issue!

@exic
Contributor

exic commented Sep 3, 2019

We seem to be running into the same issue, running Splash 3.3.1 in a Docker stack with 5 instances on Docker Swarm.

@davidkong0987

Same issue on 3.5.
