
Uncatched exception "Connection refused" #28

Open
laurentguilbert opened this issue Jan 13, 2014 · 11 comments

@laurentguilbert

Hello,

First of all thanks for the great work, JobTastic is really neat and handy.

I tried running delay_or_eager and delay_or_fail with a stopped broker (RabbitMQ), but it seems that the resulting error isn't caught.

What I get is a 500 "[Errno 111] Connection refused".

I'm working on a very simple test case with the bare minimum. My setup is Celery (3.0) along with JobTastic (0.2.2) and amqp (1.0.13).

I figure this shouldn't be a hard fix, so let me know if you're aware of this issue or if I'm doing something wrong here. I'll be glad to contribute.
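
Roughly what my test case looks like (a sketch with placeholder names, not my exact code):

```python
# Minimal sketch with a stopped broker. AddTask and its kwargs are
# placeholders; the significant_kwargs / herd_avoidance_timeout attributes
# follow the JobTastic README.
from jobtastic import JobtasticTask


class AddTask(JobtasticTask):
    significant_kwargs = [('a', str), ('b', str)]
    herd_avoidance_timeout = 60

    def calculate_result(self, a, b, **kwargs):
        return a + b


# Called from a view with RabbitMQ stopped:
result = AddTask.delay_or_eager(a=1, b=2)
# -> IOError: [Errno 111] Connection refused, instead of an eager run
```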

@winhamwr
Contributor

Hello Laurent,

Thanks for creating an issue. For handling connection errors, we're using _get_possible_broker_errors_tuple to try and piggyback on Celery's own error detection code. I'm surprised that Celery's connection_errors doesn't contain that error, hrm.
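
For context, the pattern is roughly this (a sketch of the idea, not the exact source; the fall-back-to-eager branch is from delay_or_eager):

```python
# Sketch of the piggybacking pattern. _get_possible_broker_errors_tuple is
# jobtastic's helper for collecting Celery/kombu's connection error classes;
# everything else here is illustrative.
def delay_or_eager(cls, *args, **kwargs):
    # In jobtastic this is a classmethod on the task class.
    possible_broker_errors = cls._get_possible_broker_errors_tuple()
    try:
        return cls().apply_async(args=args, kwargs=kwargs)
    except possible_broker_errors:
        # Broker unreachable: run the task in-process instead of failing.
        return cls().apply(args=args, kwargs=kwargs)
```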

This is definitely something I'd like to have fixed, so contributions (or any insight into the problem) would be very much appreciated. The goal of delay_or_fail is definitely to protect you from that kind of exception.

Thanks
-Wes

@laurentguilbert
Author

Thanks for your quick answer.

I'll try to fix it and send you a pull request as soon as I can.

@laurentguilbert
Author

After a while I figured out what's happening.

Inside task.run and task.on_success the state of the task is updated whether it's eager or not.

So if your result backend is not running, delay_or_eager will fail and never run locally.

I figure a simple fix would be to check against task_is_eager inside task.run and task.on_success (this one seems to be deprecated along with delay_or_run).
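
Something along these lines, roughly (a sketch; the real run() does more than this, and the PROGRESS state/meta here are just examples):

```python
# Sketch of the task_is_eager guard inside run(). Only touch the result
# backend when the task actually went through the broker; a pure-eager run
# shouldn't require a running backend.
def run(self, *args, **kwargs):
    task_is_eager = self.request.is_eager
    if not task_is_eager:
        self.update_state(state='PROGRESS', meta={'progress_percent': 0})
    return self.calculate_result(*args, **kwargs)
```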

Let me know what you think, and if you agree I'll submit a pull request!

@winhamwr
Contributor

Inside task.run and task.on_success the state of the task is updated whether it's eager or not.

Ooooh. That makes sense, and I hadn't fully considered the case of a broken result backend. delay_or_fail and delay_or_eager are designed to work around broken task brokers. Hrm.

So one way of handling this would be to document that even if you use delay_or_FOO, if your result backend is down and your task tries to record progress or record a result, you're still going to get an Exception. I think that's very reasonable. Any task that actually needs to return a result can't really be counted as having run if we don't have a way of storing/retrieving that result, so there's no way to gracefully fail.

One necessary change for delay_or_eager runs where the result backend is down: we shouldn't throw an exception in the on_success handler. We should catch the exception ourselves, log a warning message and just keep going.

And also in delay_or_fail, the simulate_async_error call should catch that backend exception, log a warning message and keep going.

For both of those cases, we need to find something equivalent to _get_possible_broker_errors_tuple, but that finds result backend errors. Hopefully Celery internals have something like this?
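
Concretely, I'm picturing something like this (a sketch; _get_possible_backend_errors_tuple is hypothetical, and if Celery doesn't expose such a tuple, something broad like IOError might have to do):

```python
import logging

logger = logging.getLogger(__name__)


# Sketch of the "catch, warn, keep going" recovery in on_success. The
# _get_possible_backend_errors_tuple helper is hypothetical; the real
# on_success also does more than store the result.
def on_success(self, retval, task_id, args, kwargs):
    try:
        self.update_state(task_id=task_id, state='SUCCESS', meta=retval)
    except self._get_possible_backend_errors_tuple():
        logger.warning(
            'Result backend unreachable while recording success for task %s; '
            'continuing anyway.', task_id)
```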

Does that make sense to you?

Thanks
-Wes

@winhamwr
Contributor

Oops, and you're also totally right about this code in task.run also needing to recover from a broken result backend.

@laurentguilbert
Author

Well, it seems that _get_possible_broker_errors_tuple catches all the exceptions I've encountered for a broken result backend. In fact, I think IOError should cover them.

Is it really necessary to record an eager task's state? By design, Celery's task.apply requires no connection at all.

delay_or_fail will throw an exception at mark_as_failure though. This seems alright, since you can't expect a coherent result here without a working backend. An alternative would be to return a failed EagerResult carrying the backend exception; I'm not sure about this.
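
What I mean by that alternative, sketched out (EagerResult's signature here matches Celery 3.x and may differ in other versions):

```python
import uuid

from celery import states
from celery.result import EagerResult


# Sketch: skip the backend write entirely and hand back an already-failed
# eager result carrying the exception, instead of calling mark_as_failure.
def simulate_async_error(cls, exception):
    return EagerResult(str(uuid.uuid4()), exception, states.FAILURE)
```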

@winhamwr
Contributor

Hi Laurent,

By design, Celery's task.apply requires no connection at all.

I think that was a good design choice in Celery itself, but for the kind of user-facing tasks that jobtastic deals with (where we care about progress and almost always also about the result), it adds complexity. A JobtasticTask goes to great lengths to make a task behave the same, regardless of the way it's run, which means storing the results in the backend. That allows your other code (probably something in a rendered template that might do AJAX requests) to not care about whether we were in the delay case, the eager case or the fail case.

I think that the biggest problem you've identified is that someone using Jobtastic just for caching, thundering herd avoidance and the fault-tolerance helpers might have a task whose results they don't care about at all. In their case, breaking the task on our internal update_progress call hurts them when they could otherwise have gone along their merry way with a broken result backend. I think we should recover from a result backend error around that call, automatically.
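
Something like this around the internal progress write is what I have in mind (a sketch; the wrapper name and the bare IOError catch are illustrative, not shipped code):

```python
import logging

logger = logging.getLogger(__name__)


# Sketch of automatic recovery around progress recording: warn and move on
# instead of letting a dead result backend kill the task.
def _update_progress_or_warn(self, completed_count, total_count):
    try:
        self.update_progress(completed_count, total_count)
    except IOError:
        logger.warning(
            'Skipping progress update (%s/%s): result backend unreachable.',
            completed_count, total_count)
```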

delay_or_fail will throw an exception at mark_as_failure though.

I think that's another place where we should catch the exception, log a warning and keep going. delay_or_fail is great for something like a report I'm running for a user. If there's something wrong with my broker or backend, I don't want to run that report in my web process. It's better to just give the user an error message and have them try again later. Right now, if it's the result backend that's broken, we don't meet that goal. There will be an exception thrown in the web process.
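
Concretely, something like this (a sketch; the nested IOError catch and the final EagerResult fallback are illustrative, just one way to meet the goal):

```python
import logging
import uuid

from celery import states
from celery.result import EagerResult

logger = logging.getLogger(__name__)


# Sketch of delay_or_fail recovering when the result backend is also down.
# In jobtastic this is a classmethod on the task class.
def delay_or_fail(cls, *args, **kwargs):
    possible_broker_errors = cls._get_possible_broker_errors_tuple()
    try:
        return cls().apply_async(args=args, kwargs=kwargs)
    except possible_broker_errors as exc:
        try:
            # Normally records the failure so the caller sees a failed result.
            return cls.simulate_async_error(exc)
        except IOError:
            logger.warning('Result backend unreachable; returning a failed '
                           'result without storing it.')
            return EagerResult(str(uuid.uuid4()), exc, states.FAILURE)
```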

The other necessary fix is:

One necessary change for delay_or_eager runs where the result backend is down: we shouldn't throw an exception in the on_success handler. We should catch the exception ourselves, log a warning message and just keep going.

Does that make sense to you? Basically, I think we should try to recover wherever it makes sense.

Thanks
-Wes

@laurentguilbert
Author

Yes, makes perfect sense.

@edwardotis

Hi all,

Any updates on this issue? I ran into the delay_or_eager bug recently with Celery 3.1.

Thanks,

Ed

@winhamwr
Contributor

Hello Ed,

No updates so far. Waiting for either a pull request or some magical time fairy to let me write it :)

I still haven't upgraded to Celery 3.1 on my major project, so until that happens, I'm unlikely to be the one to write the pull request.

Thanks
-Wes

@edwardotis

Gotcha, thanks for the update, Wes.
