
Adding request timeout #157

Open · lugino-emeritus wants to merge 37 commits into master
Conversation

lugino-emeritus

  • Adding a request timeout in method _send in api.py
    • The timeout is 10 + query_params.get("timeout", 30000) / 1000 -> OK? IMHO it shouldn't be a constant, since it has to be larger than a possible server-side timeout (see the sketch below)
  • Handle possible exceptions in listen_forever

This should solve #129 and #131
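
Roughly, the change inside _send looks like this (an abridged sketch; the full URL and the rest of the method are omitted, and the argument list mirrors the diff discussed below):

request_timeout = 10 + query_params.get("timeout", 30000) / 1000  # client-side buffer on top of the server-side sync timeout

response = requests.request(
    method, endpoint,              # endpoint = base_url + path, built earlier in _send
    params=query_params,
    data=content,
    headers=headers,
    verify=self.validate_cert,
    timeout=request_timeout,       # raises requests.exceptions.Timeout when exceeded
)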

@simonklb
Contributor

There should probably be some form of retry limit so that it doesn't hang forever if a server is/goes down.

@lugino-emeritus
Author

lugino-emeritus commented Oct 16, 2017

My basic idea was: when using this SDK to design a bot, and the server is offline even for a few hours, the client should reconnect again.

So maybe the following is useful: an option to define a maximum time after which listen_forever is stopped if it has not been successful.

But then we have to discuss the following:

  • How should we stop listen_forever, i.e. what exception is useful? Reraise the last exception? Throw a new exception of type MatrixError? Define a new exception?
  • Which default value for this time is useful? 6 hours? Or endless (to be compatible with old versions)?

And also (design):

  • Should it be possible to define this time in listen_forever?
  • Or should it be a new variable (like bad_sync_timeout_limit) without a separate setter method?
  • Or a new variable with a separate setter method?

I would prefer the following:

  • add a variable max_retry_time, initialize it in __init__ with 6 * 60 * 60 (= 6 hours)
  • raise MatrixError("Stopped listen_forever, since reconnection was not possible for %i seconds") to stop listen_forever (rough sketch below)
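
A minimal sketch of that idea (max_retry_time and the loop below are hypothetical, not existing SDK code):

import time
import requests
from matrix_client.errors import MatrixError

def listen_forever(self, timeout_ms=30000):
    last_success = time.time()
    while True:
        try:
            self._sync(timeout_ms)
            last_success = time.time()
        except requests.exceptions.RequestException:
            if time.time() - last_success > self.max_retry_time:  # e.g. 6 * 60 * 60
                raise MatrixError(
                    "Stopped listen_forever, since reconnection was not "
                    "possible for %i seconds" % self.max_retry_time)
            time.sleep(5)  # wait a bit before the next retry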

@simonklb
Contributor

Although reconnect/retry mechanisms are very useful and user-friendly, I don't think this kind of logic belongs in the SDK. If you want a bot that automatically reconnects when the server comes back online, you should add some kind of heartbeat functionality in the bot and not force it on everyone else using the SDK.

If it's decided to include this in the SDK, please make it optional and make sure exceptions are still able to bubble up to the client implementations.

@lugino-emeritus
Author

Well, currently listen_forever runs endlessly, even if the server is not reachable. So I just did things to stay compatible with old versions of this SDK.

I made this pull request to add a timeout to the request, to make sure that requests.request does not run endlessly. (See #129)

And I added the handling of the new timeout errors because of the second comment in #149.

Assume you are running listen_forever and your router changes the IP (a lot of routers do that every 3 days). Now, do you want the SDK to reconnect or not? I only changed things for that situation.
I didn't change the case where the server is offline, see here.

So adding a max_retry_time (or something like that) would be a new feature, and IMHO that should be handled in a different pull request / issue.

@simonklb
Contributor

I don't mind the timeout. I just don't want to have auto-reconnect logic in the SDK. At the very least, the retry should be optional and have a set limit.

If you only added the request timeout everything would be fine by me. But you also decided to catch ConnectionError and ReadTimeout which is going to prevent implementations from catching those using the exception_handler now.

@lugino-emeritus
Author

I agree with you, the exception_handler should handle those exceptions. But then it should also handle the MatrixRequestError (if it is not None). This exception would be thrown if Synapse is down but the server is online, and this is not nice (it could be an endless loop). Moreover, the exception handler should know the time of the last successful connection (or maybe more information in the future).

My suggestion:

  • remove the changes in listen_forever
  • do a new pull request with a new / additional exception_handler (to stay compatible with old versions)

@simonklb
Contributor

If Synapse is down you shouldn't get any response; a ConnectionError exception should be thrown instead, so I'm not sure what you mean by that. The MatrixRequestError exceptions are thrown if you get a response that does not have a status code between 200 and 300, i.e. some error in the Matrix server.

Are you running some form of proxy in-front of Synapse?

@lugino-emeritus
Author

Yes, I'm running nginx as a proxy, and I get a 502 'Bad Gateway' error. Makes sense, of course. However, I implemented a new exception handler. Of course it would be prettier to remove the old exception handler, but then it would not be compatible with old versions. What do you think, should I open a pull request? But first this one should be merged.

@simonklb
Contributor

That explains the confusion. The problem here is not that the listener isn't catching the exceptions correctly, but rather that the API assumes the HTTP errors are generated by the Matrix server and not something else (the nginx proxy in your case).

@non-Jedi @Half-Shot any ideas?

@non-Jedi
Collaborator

Okay. I haven't read the actual changes yet, so I'm not commenting on those yet.
Per the above conversation, we should have two kinds of exceptions:

  1. MatrixRequestError for any status codes outside of the acceptable range. We
    shouldn't differentiate between status codes returned by synapse and any
    other random server.
  2. A new MatrixTimeoutError (or some other name) subclassing MatrixError for
    this new situation where a request doesn't return within specified timeout.

listen_forever should probably just have a default of propagating errors
upward with an option of handling the errors and retrying. That would be my
preference. If anyone currently depends on the method just silently swallowing
errors, they can opt into that behavior since it should definitely be opt-in
behavior.

Any other questions I missed?
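
For illustration, propagate-by-default with opt-in handling could look roughly like this (a sketch only; the parameter names are placeholders, not necessarily the SDK's current signature):

def listen_forever(self, timeout_ms=30000, exception_handler=None):
    while True:
        try:
            self._sync(timeout_ms)
        except Exception as e:
            if exception_handler is None:
                raise  # default: errors bubble up to the caller
            exception_handler(e)  # opt-in: the caller decides whether to retry or stop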

@non-Jedi
Collaborator

@lugino-emeritus could you squash these commits and sign off per
Contributing.rst, please? Thanks.

@non-Jedi left a comment (Collaborator)

Default timeout should be user-tunable for every method of MatrixHttpApi.
Either a new kwarg on every method or a default_timeout kwarg on instantiation
of the object.

I'm also not sure 10 seconds is the correct default value. 30 seconds might make
more sense.

@@ -582,6 +582,8 @@ def _send(self, method, path, content=None, query_params={}, headers={},

if headers["Content-Type"] == "application/json" and content is not None:
content = json.dumps(content)

request_timeout = 10.0 + query_params.get("timeout", 30000) / 1000
Collaborator

Any particular reason we're forcing this to be a float instead of an int?

Author

No particular reason. It is just because the requests timeout could be a float. I will change it.

@@ -590,7 +592,8 @@ def _send(self, method, path, content=None, query_params={}, headers={},
params=query_params,
data=content,
headers=headers,
verify=self.validate_cert
verify=self.validate_cert,
timeout=request_timeout
Collaborator

Whatever error is raised by requests for a timeout needs to be handled and
reraised as a subclass of MatrixError. (maybe a new error class
MatrixTimeoutError?)
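
For example, the new class could start out as simple as this (a hypothetical sketch; the name is not settled):

from matrix_client.errors import MatrixError

class MatrixTimeoutError(MatrixError):
    """Raised when a request to the homeserver does not return within the client-side timeout."""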

Author

Should we also handle the ConnectTimeout alongside the ReadTimeout here, with the same exception class? Maybe the ConnectTimeout should be handled in this pull request as well.

@simonklb
Contributor

@non-Jedi I agree with you that making errors propagate is the best course of action. Also, trying to do something fancy to distinguish between actual Matrix server errors and errors spawned by an ingress service or such might be tricky.

But since HTTP errors are directly translated into Matrix errors rather than kept as request exceptions, it might be worth doing something to avoid this kind of confusion. A short mention in the MatrixRequestError doc? Right now it simply says "The home server returned an error response.", which obviously wasn't the case here... :)

@non-Jedi
Collaborator

@simonklb That would absolutely be a good thing, and I would accept a PR for that immediately. :)

lugino-emeritus and others added 4 commits October 23, 2017 18:24
Signed-off-by: Niklas Tittjung <[email protected]>
Signed-off-by: Niklas Tittjung <[email protected]>
Just have to learn some GitHub things... Sorry for changing this file temporarily
@non-Jedi left a comment (Collaborator)

Other than that, it LGTM. Fix those things, squash your commits, and rebase on current master, and we'll see if we can get this one merged before the next release. :)

verify=self.validate_cert,
timeout=request_timeout
)
except requests.exceptions.ReadTimeout:
Collaborator

Catch requests.exceptions.Timeout here instead.

@@ -582,16 +582,24 @@ def _send(self, method, path, content=None, query_params={}, headers={},

if headers["Content-Type"] == "application/json" and content is not None:
content = json.dumps(content)

request_timeout = 30 + query_params.get("timeout", 30000) / 1000
Collaborator

Add a new kwarg default_timeout=25 to __init__, and then do:

request_timeout = 5 + query_params.get("timeout", self.default_timeout * 1000) / 1000
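
Illustrated roughly (the other __init__ parameters are omitted here and the exact signature is an assumption):

class MatrixHttpApi(object):
    def __init__(self, base_url, token=None, default_timeout=25):
        self.base_url = base_url
        self.token = token
        self.default_timeout = default_timeout  # seconds, used when query_params carries no "timeout"

With that, a /sync call that passes a timeout in query_params gets that value plus a 5 second buffer, and every other call falls back to default_timeout + 5 seconds.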

timeout=request_timeout
)
except requests.exceptions.ReadTimeout:
raise MatrixTimeoutError(
Collaborator

Pass the requests exception into the constructor for MatrixTimeoutError
so that people can debug easier. Also consider passing in the full URL that
was called and the method to be inserted into the error's msg (See
MatrixRequestError for an example of this).
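
A hypothetical shape for that, loosely following MatrixRequestError (the constructor details are an assumption, not the final API):

from matrix_client.errors import MatrixError

class MatrixTimeoutError(MatrixError):
    """Raised when a request does not return within the client-side timeout."""

    def __init__(self, original_exception, method, endpoint):
        super(MatrixTimeoutError, self).__init__(
            "%s %s timed out: %s" % (method, endpoint, original_exception)
        )
        self.original_exception = original_exception

# and in _send:
#     except requests.exceptions.Timeout as e:
#         raise MatrixTimeoutError(e, method, endpoint)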
