Investigate and document timeouts across components #525
Comments
adding notes as i go...
as of now, my understanding is that we have two main issues, the rest is just cleanup/refactoring:
@rmol is speeding up the WSGI process by using a key cache (getting source keys takes a long time), and word is he will be adding nginx; see freedomofpress/securedrop#5184. We will still need to confirm whether or not this fixes the frequent timeout issue.
To be annoyingly pedantic, once the connection is established, if the server were ever to go 120 seconds without sending anything, the proxy would time out the request. For our purposes, it's almost always the same thing, as we're generally shipping entire responses and not trickling data. And right now we'll hit the 60-second Apache
I'm not seeing this with my staging environment. With just the fingerprint caching, responses take less than 30 seconds. With the key caching added in freedomofpress/securedrop#5184 they're under 20 seconds. Could be differences in our hardware or VM performance. (And yes, this is still pretty ridiculous for the amount of work happening and the size of the responses. A bunch of this is Tor, which can be helped by compressing the responses, but we're still taking several seconds to produce a relatively small amount of JSON, and I think it could still use more scrutiny.)

Just for the benefit of those following along at home, I don't think we're planning on nginx any time soon? Certainly not as part of freedomofpress/securedrop#5184. There is an issue for that (freedomofpress/securedrop#2414) but I haven't heard anything about it recently.
Totally, that's my understanding too, which is why I mentioned that this should probably be updated to 60 seconds to match the Apache config. However, my statement about the proxy waiting 120 seconds is true of the way it's configured right now. If the Apache config timeout were updated to, say, 180 seconds, the proxy would still time out after 120 seconds, since that is what is specified.
I will test again. It could be a hardware issue, but it sounds like the 1000-source timeout issue has been fixed for you ever since you updated the proxy timeout to 120 seconds, which allowed us to wait up to 60 seconds for a connection? Although it seems like the 40-second SDK timeout at the subprocess layer would fire before we hit that 60-second timeout, correct? Something is not lining up here. I'm going to run some new tests and will report back.
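To make the layering question above concrete, here is a minimal sketch of how the shortest timeout in a chain ends up governing a request. The layer names and the `effective_timeout` helper are hypothetical and exist only for illustration; the values (40, 60, and 120 seconds) are the ones quoted in this thread. It also assumes every layer is timing the same wait, which is a simplification, since connect and read timeouts measure different things.

```python
# Hypothetical layer names; the values are the ones quoted in this thread.
LAYER_TIMEOUTS_SECONDS = {
    "sdk_subprocess": 40,
    "apache": 60,
    "proxy": 120,
}


def effective_timeout(layers):
    """Return (layer_name, seconds) for the layer that gives up first.

    Simplifying assumption: all layers start their clocks at the same
    moment and are waiting on the same response.
    """
    name = min(layers, key=layers.get)
    return name, layers[name]


first_layer, seconds = effective_timeout(LAYER_TIMEOUTS_SECONDS)
print(f"{first_layer} gives up first, after {seconds} seconds")
```

Under these assumptions, raising the Apache or proxy value changes nothing for the caller until the shortest layer is raised as well.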
I just tested the latest client on Qubes against my staging server with 1000 sources and still see no sources populated in the source list after 10 minutes, due to freedomofpress/securedrop-client#1025 (comment).

I'm concerned that we are still seeing timeouts as much as one third of the time when there are 200 sources (see freedomofpress/securedrop-client#1007 (comment)), so I will test a server with 200 sources next.
Nope, this is not in the near or medium term - to expand a bit, we don't have a good way of performing the migration that doesn't add a lot of burden for administrators. The bionic upgrade might be the next window to do it.
👍 Backing up for a second, before we did the final release before the first pilot provisioning, I tested I just added another 500 sources (1000 total) on the server, deleted my client database, and ran with master (
Sync does continue to show intermittent failures - they are intermittent. Otherwise it sounds like everyone is using latest I think next steps are: we compare directly the response time of our |
For the 4/22-5/6 sprint, @creviera has agreed to clean up these notes a bit and copy them to the wiki, with review/input from @rmol. Then we can close out this issue.
Wiki updates can be found here: https://github.com/freedomofpress/securedrop-workstation/wiki/Timeouts
Thank you for the write-up Allie! :) @rmol will take a spin sometime during this sprint (5/6-5/20), then we can close out this issue.
Wiki summary looks good. Closing.
The client performs many network operations with associated timeouts. If timeouts are too short, operations may fail; if they are too long, user feedback may be delayed. Timeouts are negotiated at different levels of the stack, e.g.:
When shorter timeouts at one level override longer ones at another, this can lead to unexpected results, as we saw during investigation of freedomofpress/securedrop-client#1007. There are also different types of timeouts (e.g., ConnectTimeout vs. ReadTimeout) which may need to be set to different values.

We should better document the different timeouts, and the overall connection architecture, to ensure all developers can consistently reason about the expected behavior of the whole application. This can be done in the workstation wiki for now.