-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multiple issues with a driver pool #250
Conversation
CI seems to be 🔴 for some reason. |
// we want to know sooner rather than later | ||
// TODO(dennwc): sometimes the initialization of the container takes >5 sec | ||
// meaning that the time between Container.Start and the actual | ||
// execution of a Go server (not the native driver) takes this long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, could you please help me understand where would this time be visible in our current instrumentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be included in the bblfshd.pool.newDriverPool
span in Jaeger.
But there is no direct measurement for this specific delay - it will account for the whole "start driver, wait for the gRPC server" step. I plan to add more instrumentation later because it requires changes to SDK and drivers.
CI is green |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
Signed-off-by: Denys Smirnov <[email protected]>
Signed-off-by: Denys Smirnov <[email protected]>
Signed-off-by: Denys Smirnov <[email protected]>
Signed-off-by: Denys Smirnov <[email protected]>
This PR fixes multiple issues found while addressing the timeout issue in C# driver: bblfsh/csharp-driver#25
The issue was caused by a combination of factors:
bblfshd
was limiting the timeout for driver startup to 5 sec. So the first request will always timeout if the driver init takes longer, and there is no way for the client to change it. It was fixed by passing the context directly.If the first request failed,
bblfshd
was setting the driver pool anyway, and it become broken since no one returns the failed driver back to the queue. This caused a deadlock for any consequent requests. Fixed by not setting the pool/queue on error.Maximal gRPC backoff delay was set too high, leading to unnecessary delay when driver takes more time to init. Fixed by capping the delay to 1 sec (it will retry).
This should fix the bblfsh/csharp-driver#25, although the container startup may still take up to 5-6 sec. This is a startup time of the container itself, even before the driver gets a chance to execute. Need to check what may be the reason for this.
Few more issue were discovered along the way:
Failed driver was not shutdown properly. May have caused other issues in
bblfshd
. Fixed by stopping the container on the error path.bblfshd
was not acquiring the pool lock then shutting down. This may have caused data races with clients sending parse requests. Fixed by acquiring the lock.Also, this PR includes few more changes:
Mutex for the driver pools was changed to RW to allow multiple callers to use an driver existing pool if it was already initialized.
bblfshd
now supports a--pprof
flag to expose a pprof HTTP endpoint. Useful to find deadlock, profile, etc.