Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite loop caused by edge case during terminus_job_poll() #188

Merged
merged 2 commits into from
Nov 19, 2014

Conversation

iamEAP
Copy link
Contributor

@iamEAP iamEAP commented Oct 30, 2014

Problem / motivation

In cases where terminus_request() errors (e.g. a timeout), it will return FALSE and close the curl handle / resource. Normally, this is fine and doesn't matter... But when terminus_request() is called by terminus_job_poll(), it's possible to get stuck in an infinite loop.

Underlying cause is that the curl handle is statically cached in terminus_request... Although there's a check to re-initialize it, the check isn't explicit enough (specifically, $ch will be an integer when closed, which is truthy).

Proposed resolution

Explicitly check that the curl handle is a resource. Also, increase the timeout to match the server-side timeout (which appears to be 45 seconds).

@iamEAP
Copy link
Contributor Author

iamEAP commented Oct 30, 2014

Adding a bit more detail, here... This is the infinite loop of errors that show up:

Cloning database from test.................CONNECTION ERROR: Operation timed out after 30001 milliseconds with 0[error]
bytes received
No response for site jobs!                                           [error]
Invalid argument supplied for foreach() terminus.drush.inc:1754      [warning]
.curl_setopt_array(): 85 is not a valid cURL handle resource          [warning]
terminus.drush.inc:3802
curl_exec(): 85 is not a valid cURL handle resource                  [warning]
terminus.drush.inc:3808
curl_errno(): 85 is not a valid cURL handle resource                 [warning]
terminus.drush.inc:3815
curl_getinfo(): 85 is not a valid cURL handle resource               [warning]
terminus.drush.inc:3821

@mikevanwinkle
Copy link
Contributor

I think this sounds like a reasonable. Let me test and get back to you.

@mikevanwinkle
Copy link
Contributor

Not sure I understand the reasoning behind the timeout though.

@iamEAP
Copy link
Contributor Author

iamEAP commented Oct 30, 2014

I was having issues in situations where no data was being received at all (e.g. the terminus API endpoint had not yet responded within 30s). If I bumped it up, I'd hit the server-side timeout at 45 seconds, but with data coming in mid-stream.

For whatever reason, Terminus was better able to recover in situations where some data was being returned, rather than 0 bytes. Widening that timeout window to 45 seconds helped.

Though I'm unable to verify anymore because I think you guys cleared our job queue history.

@mikevanwinkle
Copy link
Contributor

So I'm a little hesitant here on the timeout. The reason is that from a ops POV there's some risk that if we suddenly start holding connections open longer I worry there could be residual unintended impacts on the API server. So I'd like to get @nstielau @joemiller or @joshkoenig thoughts here.

@iamEAP
Copy link
Contributor Author

iamEAP commented Oct 31, 2014

Sure. If it's a no-go, I'm happy to back out that timeout change.

@joshkoenig
Copy link
Contributor

Thanks for submitting, Eric. I'll take a closer look at this shortly.

@joshkoenig
Copy link
Contributor

I don't think there's anything wrong with this change right now, but I do believe we are going to be shortening the timeout in the future. Internally it's set for 10 seconds.

Generally speaking we want the parameters of the system to be such that we can always deliver a response in that amount of time. In cases like yours where there's a prodigious amount of data, we will be implementing paging.

However, for right now having the client timeout match the current backend timeout is fine. @mikevanwinkle - feel free to merge this.

👍

@iamEAP
Copy link
Contributor Author

iamEAP commented Nov 18, 2014

Any word on this @mikevanwinkle and/or @joshkoenig ? We're pretty consistently hitting these again; would be great to get this merged in if possible.

@joshkoenig
Copy link
Contributor

Yeah lets merge this right now. We'll sort out a holistic solution in the forthcoming TERMINUS 2.0 ⚡ ⚡ ⚡

mikevanwinkle added a commit that referenced this pull request Nov 19, 2014
Fix infinite loop caused by edge case during terminus_job_poll()
@mikevanwinkle mikevanwinkle merged commit dd3e8f2 into pantheon-deprecated:master Nov 19, 2014
@mikevanwinkle
Copy link
Contributor

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants