Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling #39

Open
mrjoes opened this issue Oct 11, 2012 · 1 comment
Open

Error handling #39

mrjoes opened this issue Oct 11, 2012 · 1 comment

Comments

@mrjoes
Copy link
Member

mrjoes commented Oct 11, 2012

I was doing some internal tests and here's what I found.

  1. There's sockjs_session_sup supervisor which is configured to have up to 10 failures in 10 seconds.
  2. sockjs_session calls callback functions in context of its own process
  3. If there's error in the one of the callback methods, sockjs_session process will die and supervisor will restart it.
  4. If socks_session dies too often (basically 10 times in 10 seconds), the supervisor will be killed with all its children. Effectively, if you can find a way to send a message which triggers the exception, you can kill SockJS server.

I think client processes should be isolated. Even if one client misbehaves, it should not kill whole process tree.

Not sure how to fix it properly, but few ideas:

  1. Increasing limits won't really help - it is temporary measure
  2. Maybe switch to temporary restart strategy? The session will be closed anyway - terminate() is getting called and the session is removed from ETS. Plus, the state is lost, so it can't recover. Or am I missing something?
@SimonWoolf
Copy link

In case it's relevant to anyone else: I've made @mrjoes's suggested change (transient -> temporary) in our fork of sockjs-erlang at https://github.com/ably-forks/sockjs-erlang , after running into the same issue. (Sockjs sessions crashing for legitimate reasons (a service dependency was down) was causing the sockjs application to exit, which doesn't get restarted (even when start_permanent is set), which ensured it continued to fail even once the reason for the crashes fixed itself).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants