Before my gRPC client code was set up to retry automatically, it would occasionally raise exceptions due to flaky network connections. I would expect these exceptions, which are logged to stderr, to be submitted to the server, and they were - but due to a comedy of errors, the server aborts the request before saving the output, so the easy way to debug this is lost.
Here's the code that's causing the problem (in bot_handlers.py):
if (isolated_stats or cipd_stats) and bot_overhead is None:
  ...
  self.abort_with_error(400, ...)
bot_overhead appears to be None because in task_runner.py, the duration returned by run_isolated.py was None (though I'm not 100% certain on this):
62731 2016-12-16 18:43:34.682 D: run_isolated:
{u'stats': {u'isolated': {u'upload': {u'duration': 0.07223010063171387, u'items_hot': u'eJzTBgAALAAs', u'items_cold': u''}}}, u'internal_failure': u'<_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, )>', u'outputs_ref': {u'isolatedserver': u'130.211.180.183:90', u'namespace': u'grpc-proxy', u'isolated': u'51b8142c823ecc5b2a2178d35e620cd2c1962358'}, u'exit_code': None, u'version': 5, u'duration': None, u'had_hard_timeout': False}
62731 2016-12-16 18:43:34.682 E: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, )>
This is despite the fact that the task actually ran and its results (or most of them) were uploaded. So the final value of "duration" is somehow not being populated in run_isolated, which leaves bot_overhead unpopulated in task_runner, which results in a bad request being sent to the server, which results in the HTTP 400 error. I think.
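To make the suspected chain concrete, here is a minimal sketch in Python. Only the condition in server_rejects comes from the bot_handlers.py snippet quoted above. The helper build_params, the wall_time argument, and the way bot_overhead is derived from duration are assumptions for illustration, not the actual task_runner.py logic.

```python
def server_rejects(isolated_stats, cipd_stats, bot_overhead):
    """Mirrors the condition quoted from bot_handlers.py."""
    return bool(isolated_stats or cipd_stats) and bot_overhead is None


def build_params(run_result, wall_time):
    """Hypothetical stand-in for what task_runner sends to the server.

    ASSUMPTION: bot_overhead is only computed when run_isolated reported
    a duration; otherwise it is left as None.
    """
    stats = run_result.get('stats') or {}
    params = {
        'isolated_stats': stats.get('isolated'),
        'cipd_stats': stats.get('cipd'),
        'bot_overhead': None,
    }
    duration = run_result.get('duration')
    if duration is not None:
        params['bot_overhead'] = max(0.0, wall_time - duration)
    return params


# Trimmed copy of the run_isolated result from the log above.
run_result = {
    'stats': {'isolated': {'upload': {'duration': 0.07223010063171387}}},
    'exit_code': None,
    'duration': None,
}

# wall_time is an arbitrary illustrative value.
params = build_params(run_result, wall_time=10.0)
rejected = server_rejects(
    params['isolated_stats'], params['cipd_stats'], params['bot_overhead'])
```

Under these assumptions, the upload stats are present while bot_overhead is None, so the request trips the check and would be aborted with the 400.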
I had a look at trying to ensure that duration was never None, and that if duration was not None, that bot_overhead was never None as well. However, the control flow is pretty complicated and it would take a while for me to get through all the permutations. So I'm just filing this bug instead since I don't think I'll be running into this very often anymore now that my gRPC client is more stable.
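For reference, the invariant I was aiming for could look roughly like the sketch below. The function fill_timing and its wall_time parameter are hypothetical and do not exist in the codebase. Treating a missing duration as 0.0 and counting the whole wall time as overhead is just one possible policy, not necessarily the right one.

```python
def fill_timing(duration, bot_overhead, wall_time):
    """Returns (duration, bot_overhead) with neither value left as None.

    ASSUMPTION: wall_time is the total number of seconds the bot spent on
    the task, measured by the caller.
    """
    if duration is None:
        # The task never reported a duration (e.g. an internal failure),
        # so count the whole wall time as overhead.
        duration = 0.0
    if bot_overhead is None:
        bot_overhead = max(0.0, wall_time - duration)
    return duration, bot_overhead
```

Values that are already populated are passed through unchanged, so this would only kick in on the failure path.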