Publisher thread terminates, forever breaking publication when GCE metadata service blips #1173
Hi pgcamus, thank you for bringing this to our attention. I wanted to list some of my observations:
This is correct. However, one thing I observe is that the publishing of subsequent messages is not affected, because of this sequence of events:
(This causes a separate issue; more on this later.)
That said, I do see a few related issues that need to be addressed:
This does solve the problem of:
The side effect of the above change, however, would be that the state of the batch would now be set as in python-pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py (Lines 445 to 447 in cdaf6e9). That state would then be bubbled up until a BaseException is thrown and the client library crashes in the publisher client: python-pubsub/google/cloud/pubsub_v1/publisher/client.py (Lines 481 to 495 in cdaf6e9).
I'm exploring a few options to solve the above issues without causing unintended side effects, while also factoring in the behavior of the Pub/Sub client libraries for other languages so that behavior stays consistent across them. I'll keep this thread posted. Thanks!
Environment details
pip --version
google-cloud-pubsub version: 2.21.1

Steps to reproduce
Run google-cloud-pubsub and suffer a metadata outage like https://status.cloud.google.com/incidents/u6rQ2nNVbhAFqGCcTm58. Note that this can trigger even during a brief, non-sustained GCE metadata outage: once this exception is raised even once, the commit thread is dead forever. In our case, there was a short outage on the metadata server, but the retries all happened so quickly that the exception was raised before the service recovered.
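The "dead forever" behavior follows from how Python threads treat uncaught exceptions: the exception ends the thread's target function, and nothing restarts it. The sketch below is a hypothetical, simplified commit loop, not the library's actual code; TransportError is a local stand-in for google.auth.exceptions.TransportError, and commit_worker is an illustrative name.

```python
import queue
import threading


class TransportError(Exception):
    """Local stand-in for google.auth.exceptions.TransportError."""


def commit_worker(work: queue.Queue, committed: list) -> None:
    """Hypothetical commit loop with no exception handling.

    Any exception raised while committing escapes the loop and ends the thread.
    """
    while True:
        batch_id = work.get()
        if batch_id == 2:
            # Simulate a brief metadata-server failure while committing batch 2.
            raise TransportError("metadata server unavailable")
        committed.append(batch_id)


work: queue.Queue = queue.Queue()
committed: list = []
worker = threading.Thread(target=commit_worker, args=(work, committed), daemon=True)
worker.start()

for batch_id in range(1, 5):
    work.put(batch_id)

# The worker dies on batch 2, so join() returns well before the timeout.
worker.join(timeout=5.0)
print("worker alive:", worker.is_alive())  # worker alive: False
print("committed:", committed)             # committed: [1]
print("still queued:", work.qsize())       # still queued: 2
```

Once batch 2 fails, batches 3 and 4 stay queued indefinitely, which matches the reported symptom of publication never resuming after a single short outage.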
Code example
# example
Stack trace
Speculative analysis
It looks like the issue is that the google-auth library is raising a TransportError, which is not caught by the batch commit thread in this library. Potential fixes include catching it in Batch._commit (e.g. here), or catching it further down in google-cloud-pubsub and wrapping it in a GoogleAPIError.
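A minimal sketch of the first proposed fix, assuming a simplified commit step: catch the transport error where the batch is sent, wrap it in a GoogleAPIError-style exception, and deliver it to that batch's futures instead of letting it escape the thread. TransportError and GoogleAPIError are local stand-ins for google.auth.exceptions.TransportError and google.api_core.exceptions.GoogleAPIError; commit_batch, send, flaky_send and ok_send are hypothetical names, not the library's API.

```python
from concurrent.futures import Future
from typing import Callable, List


class TransportError(Exception):
    """Local stand-in for google.auth.exceptions.TransportError."""


class GoogleAPIError(Exception):
    """Local stand-in for google.api_core.exceptions.GoogleAPIError."""


def commit_batch(send: Callable[[], List[str]], futures: List[Future]) -> None:
    """Hypothetical commit step: send one batch and resolve its futures.

    A transport error is wrapped and set on every future in the batch
    instead of propagating, so the calling thread keeps running.
    """
    try:
        message_ids = send()
    except TransportError as exc:
        wrapped = GoogleAPIError(f"publish failed: {exc}")
        wrapped.__cause__ = exc  # preserve the original error for debugging
        for future in futures:
            future.set_exception(wrapped)
        return
    for future, message_id in zip(futures, message_ids):
        future.set_result(message_id)


def flaky_send() -> List[str]:
    raise TransportError("metadata server unavailable")


def ok_send() -> List[str]:
    return ["id-1", "id-2"]


failed = [Future(), Future()]
commit_batch(flaky_send, failed)
print(type(failed[0].exception()).__name__)  # GoogleAPIError

succeeded = [Future(), Future()]
commit_batch(ok_send, succeeded)
print([f.result() for f in succeeded])  # ['id-1', 'id-2']
```

Delivering the error through the futures means callers see the failure on the affected messages, while the background thread survives to commit later batches.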