
Fix multiple issues with a driver pool #250

Merged: 4 commits from fix_timeout into bblfsh:master on Feb 13, 2019

Conversation

@dennwc (Member) commented Feb 12, 2019

This PR fixes multiple issues found while addressing the timeout issue in the C# driver: bblfsh/csharp-driver#25

The issue was caused by a combination of factors:

  1. bblfshd was limiting the timeout for driver startup to 5 sec, so the first request would always time out if the driver init took longer, and there was no way for the client to change it. Fixed by passing the request context directly (see the sketch after this list).

  2. If the first request failed, bblfshd was setting the driver pool anyway, and the pool became broken because nothing ever returned the failed driver to the queue. This deadlocked all subsequent requests. Fixed by not setting the pool/queue on error.

  3. The maximal gRPC backoff delay was set too high, leading to an unnecessary wait when a driver takes longer to init. Fixed by capping the delay at 1 sec (the connection is simply retried).
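
A minimal sketch of how the three fixes fit together, for illustration only: the names here (Daemon, driverPool, newDriverPool, dialDriver) are hypothetical stand-ins rather than the actual bblfshd identifiers, and the dial options shown are just one way to cap the backoff.

// Illustrative sketch only: Daemon, driverPool, newDriverPool and dialDriver
// are hypothetical stand-ins for the real bblfshd code.
package daemon

import (
	"context"
	"fmt"
	"sync"
	"time"

	"google.golang.org/grpc"
)

type driverPool struct{} // stands in for the real pool type

// newDriverPool starts the driver container and dials it (elided here).
func newDriverPool(ctx context.Context, lang string) (*driverPool, error) {
	return &driverPool{}, nil
}

// dialDriver connects to a driver's gRPC endpoint. The caller's context now
// controls the startup timeout (fix 1) instead of a hard-coded 5 sec, and the
// reconnect backoff is capped at 1 sec (fix 3), so a slow-starting driver is
// retried promptly.
func dialDriver(ctx context.Context, addr string) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, addr,
		grpc.WithInsecure(),
		grpc.WithBlock(),                      // wait until the server answers or ctx expires
		grpc.WithBackoffMaxDelay(time.Second), // cap retry delay at 1 sec
	)
}

type Daemon struct {
	mu    sync.Mutex
	pools map[string]*driverPool
}

// getPool returns the pool for a language, creating it on first use. On error
// the pool is NOT stored (fix 2): previously a broken pool was installed with
// an empty driver queue, so every later request blocked forever waiting for a
// driver that would never be returned.
func (d *Daemon) getPool(ctx context.Context, lang string) (*driverPool, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if p, ok := d.pools[lang]; ok {
		return p, nil
	}
	p, err := newDriverPool(ctx, lang)
	if err != nil {
		return nil, fmt.Errorf("starting %s driver: %v", lang, err)
	}
	d.pools[lang] = p // set the pool only after successful init
	return p, nil
}

The 1 sec cap means a driver that needs, say, 8 sec to come up is polled roughly once per second instead of hitting a multi-second exponential backoff step near the end of its startup.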

This should fix bblfsh/csharp-driver#25, although the container startup may still take up to 5-6 sec. This is the startup time of the container itself, before the driver even gets a chance to execute; the reason for it still needs to be investigated.

A few more issues were discovered along the way:

  1. A failed driver was not shut down properly, which may have caused other issues in bblfshd. Fixed by stopping the container on the error path (see the sketch after this list).

  2. bblfshd was not acquiring the pool lock when shutting down. This may have caused data races with clients sending parse requests. Fixed by acquiring the lock.
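
A minimal sketch of these two fixes, again with hypothetical names (container, startContainer, startDriver, Daemon) rather than the actual bblfshd code:

// Illustrative sketch only: container, startContainer and the Daemon fields
// are hypothetical stand-ins.
package daemon

import (
	"context"
	"sync"
)

type container struct{}

func startContainer(ctx context.Context, lang string) (*container, error) { return &container{}, nil }
func (c *container) AwaitReady(ctx context.Context) error                 { return nil }
func (c *container) Stop() error                                          { return nil }

type driverPool struct{}

func (p *driverPool) Stop() error { return nil }

type Daemon struct {
	mu    sync.Mutex
	pools map[string]*driverPool
}

// startDriver brings up a driver container. If initialization fails after the
// container was already started, the container is stopped on the error path
// instead of being leaked (issue 1 above).
func startDriver(ctx context.Context, lang string) (*container, error) {
	c, err := startContainer(ctx, lang)
	if err != nil {
		return nil, err
	}
	if err := c.AwaitReady(ctx); err != nil {
		c.Stop() // don't leak the container when init fails
		return nil, err
	}
	return c, nil
}

// Stop tears down all pools while holding the pool lock, so shutdown cannot
// race with clients that are concurrently sending parse requests (issue 2).
func (d *Daemon) Stop() error {
	d.mu.Lock() // this lock was previously not taken during shutdown
	defer d.mu.Unlock()
	var last error
	for _, p := range d.pools {
		if err := p.Stop(); err != nil {
			last = err
		}
	}
	d.pools = nil
	return last
}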

This PR also includes a few more changes:

  1. The mutex for the driver pools was changed to an RW mutex, allowing multiple callers to use an existing driver pool once it has been initialized (see the sketch after this list).

  2. bblfshd now supports a --pprof flag that exposes a pprof HTTP endpoint, useful for finding deadlocks, profiling, etc.
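
A sketch of both changes. The type names are made up, and whether --pprof takes an address (rather than being a plain on/off switch) is an assumption here:

// Illustrative sketch only: the daemon type and the flag shape are assumptions.
package main

import (
	"flag"
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers the /debug/pprof/* handlers
	"sync"
)

type driverPool struct{}

type daemon struct {
	mu    sync.RWMutex // was sync.Mutex: readers no longer serialize
	pools map[string]*driverPool
}

// lookupPool takes only the read lock, so any number of callers can fetch an
// already-initialized pool concurrently; the write lock is needed only when a
// pool is created for the first time.
func (d *daemon) lookupPool(lang string) (*driverPool, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	p, ok := d.pools[lang]
	return p, ok
}

func main() {
	pprofAddr := flag.String("pprof", "", "address to serve pprof on, e.g. :6060")
	flag.Parse()
	if *pprofAddr != "" {
		go func() {
			// /debug/pprof/goroutine?debug=2 dumps all goroutine stacks,
			// which makes deadlocks like the one fixed here easy to spot.
			log.Println(http.ListenAndServe(*pprofAddr, nil))
		}()
	}
	select {} // the rest of bblfshd startup is elided; block forever in this sketch
}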

@bzz (Contributor) commented Feb 12, 2019

CI seems to be 🔴 for some reason.

// we want to know sooner rather than later
// TODO(dennwc): sometimes the initialization of the container takes >5 sec
// meaning that the time between Container.Start and the actual
// execution of a Go server (not the native driver) takes this long
Contributor commented on this diff:

Just curious, could you please help me understand where this time would be visible in our current instrumentation?

Member (Author) replied:

It will be included in the bblfshd.pool.newDriverPool span in Jaeger.

But there is no direct measurement for this specific delay; it is accounted for in the whole "start driver, wait for the gRPC server" step. I plan to add more instrumentation later, since it requires changes to the SDK and drivers.

@dennwc (Member, Author) commented Feb 12, 2019

CI is green

@bzz (Contributor) left a review comment:

👏

@dennwc merged commit ecc3502 into bblfsh:master on Feb 13, 2019
@dennwc deleted the fix_timeout branch on February 13, 2019 at 16:04