go/pkg/balancer uses experimental gRPC APIs #499
Thanks Doug. This has been working just fine so far and I'm worried about regressions if I change the implementation. I'm considering copying the relevant code over under an "internal" directory here. Would that be sufficient to remove the direct dependency on the experimental API? Do you see any licensing issues?
Since this has been working just fine so far, I opted to copy the relevant code over to avoid regressions from a change of implementation. Fixes bazelbuild#499.
Maybe I'm misunderstanding, but I don't know how this could actually work unless you fork all of grpc. You're depending on grpc's behavior as much as on the APIs themselves, so just copying the APIs won't let you keep using the functionality in grpc's client implementation that sits behind them. Worst case, I can help update the deprecated things to the newer versions. You'd still be using a currently-experimental API, which isn't great, but it's no worse than the current state at least. Are you actually running into GFE max streams limits in practice? If so, you might want to adopt something like what I linked in the original comment, which seems to be working for the Google Cloud client APIs. That avoids using any of gRPC's experimental APIs. You may even be able to use the package directly, since it's not in an internal directory. Feel free to put something on my calendar any time you see an opening to discuss, if desired.
Since the grpc-go version is pinned here, I was considering the simple case of not wanting this module to depend on the upstream one. My reading of the issue is that this module will break (either at compile time or at runtime) when grpc-go is updated to a version where the experimental APIs have changed (in behaviour or interface). Otherwise, it should continue to work. I'll look into addressing this properly as you suggested.
If you are a binary and not a library, then this will work for you in theory until you need to update grpc-go, or until you need to update a dependency that updates grpc-go. But in that case, no change is needed here at all. However, since we both import into google3, this will make my life more challenging, as we'll need to get you updated before gRPC-Go can delete the old APIs.
Right… this forces SDK users to use the same version until it's updated. I also forgot about the google3 copy. The urgency of this makes more sense now.
@codyoss I'm wondering how bad connections are handled in the pool implementation in the cloud libraries: https://github.com/googleapis/google-api-go-client/blob/5e4c19e2fded180e57163208fcf264d26166a99e/transport/grpc/pool.go#L30 IIUC, the experimental balancer package defines an API for managing the lifecycle of a connection: https://github.com/grpc/grpc-go/blob/e88e8498c6df39703d1992bc4b677e7ef477145d/balancer/balancer.go#L384 which allows regenerating a connection without affecting others.
If a ClientConn has a lost connection, it would re-establish the connection on its own. But I'm not sure if the connection pool would skip it for dispatching RPCs - without something like that, some RPCs might error or be queued, even if there are other connections that could be used successfully instead.
The main goal is to avoid using that outside of gRPC until the API is stable, though.
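For context, here is a minimal sketch of what the experimental balancer API enables: a picker built only from SubConns that are currently READY, so connections in a bad state are never handed out. It assumes grpc-go's `balancer` and `balancer/base` packages; the package and policy names are hypothetical, not code from this repo.

```go
package poolbalancer

import (
	"sync/atomic"

	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/balancer/base"
)

// readyPickerBuilder is invoked whenever the set of SubConns changes;
// info.ReadySCs already contains only connections in the READY state.
type readyPickerBuilder struct{}

func (readyPickerBuilder) Build(info base.PickerBuildInfo) balancer.Picker {
	if len(info.ReadySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	scs := make([]balancer.SubConn, 0, len(info.ReadySCs))
	for sc := range info.ReadySCs {
		scs = append(scs, sc)
	}
	return &readyPicker{subConns: scs}
}

// readyPicker round-robins across the READY SubConns it was built with.
type readyPicker struct {
	subConns []balancer.SubConn
	next     uint32
}

func (p *readyPicker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
	i := atomic.AddUint32(&p.next, 1)
	return balancer.PickResult{SubConn: p.subConns[i%uint32(len(p.subConns))]}, nil
}

func init() {
	// "ready_round_robin" is a made-up policy name for this sketch.
	balancer.Register(base.NewBalancerBuilder("ready_round_robin", readyPickerBuilder{}, base.Config{}))
}
```

Whether a custom policy like this is worth keeping is exactly the question in this issue, since the whole `balancer` package is experimental and can change underneath you.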
Thanks Doug. I think that answers my question about the cloud libraries, but I'm still not sure how best to exclude connections in a bad state from the pool. That balancer API helps keep track of good SubConns, which the picker can choose from. I'm considering a similar approach to the one in the cloud libraries, which is by overriding…
We do have an API to expose the connectivity state of the channel as a whole. But I wonder if you can just ignore that potential problem and re-use what the client libraries have directly? You'll always have to deal with retrying failed RPCs anyway, right? For example, if you have only one channel and an RPC on it errors.
The method I found… Without reacting to connection status changes, there is potential to increase retry rates. I don't know how bad that might be in the wild, but I'm considering my options to mitigate it. My best bet so far is to spread retries across the pool, to minimize the chance of a request landing on the same bad connection. Round-robin seems like the simplest policy that achieves that, but I don't know who is relying on the current affinity-based-and-least-busy policy here, and I'd rather not find out by pulling the plug on them.
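As a rough illustration of spreading retries across the pool (a hypothetical helper, not code from this repo), each attempt can go out on a different connection so a retry is unlikely to land on the same bad channel:

```go
package pool

import (
	"context"

	"google.golang.org/grpc"
)

// invokeSpread rotates through the pool on each attempt so consecutive tries
// do not reuse the same (possibly bad) connection. It assumes a non-empty pool.
func invokeSpread(ctx context.Context, pool []*grpc.ClientConn, start int, method string, req, resp interface{}, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		cc := pool[(start+i)%len(pool)]
		if err = cc.Invoke(ctx, method, req, resp); err == nil {
			return nil
		}
	}
	return err // last error after exhausting all attempts
}
```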
Ah, right. I don't think we particularly like that API since it's lossy (i.e. you can miss a temporary state), which isn't important for most use cases but also isn't really ideal. I suspect that one would be difficult to ever remove or change, unlike the balancer APIs, and there are no current plans to do so.
I honestly don't think it would be any worse than using a single channel with the current LB policy you are using. RPC failures will occur for RPCs currently in flight if a backend goes down. At that point, new RPCs to that channel will not fail. The channel will go into CONNECTING, so they would be queued while the new connection is established. If no backend can be reached, then the RPCs queued on it will fail eventually, but most likely all the other channels will be down as well.
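For reference, a minimal sketch of the connectivity-state check being discussed, using `(*grpc.ClientConn).GetState` (which grpc-go also marks experimental); `pickReady` and the fallback behaviour are assumptions for illustration, not code from this repo:

```go
package pool

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// pickReady returns a channel that is READY right now, if any. As noted above,
// this is a lossy, point-in-time view: transitions between checks are missed.
// It assumes a non-empty pool.
func pickReady(conns []*grpc.ClientConn) *grpc.ClientConn {
	for _, cc := range conns {
		if cc.GetState() == connectivity.Ready {
			return cc
		}
	}
	// No READY channel: fall back to the first one and let gRPC queue the RPC
	// while that channel sits in CONNECTING and reconnects.
	return conns[0]
}
```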
Interesting. Who uses the gRPC clients created with this LB policy? I had assumed it was your code. I thought you just copied this out of https://github.com/GoogleCloudPlatform/grpc-gcp-go for the connection pooling aspect, but for your own use, and because of the max concurrent streams issue you were running into on GFE.
It turns out the code was actually copied from the gcp repo. I can test changes with bazelbuild/reclient, but I'm not sure about potential other clients. I'll simplify the policy to get rid of the experimental API usage. I'll also try to make it pluggable, so that if someone else really wants a more sophisticated policy they can plug in their own (unless that turns into a deep rabbit hole).
All custom LB policies use experimental code, as the balancer package itself is experimental. :) I still recommend starting with what I mentioned in the initial comment; let me know if that solution isn't working for any reason, and we can go from there. I'm happy to meet and discuss details in person if it helps.
Posted #554 as an initial step. Will post another one to remove the dependent code once the new balancer is fully rolled out. I'm not 100% sure about using the…
I don't think I would use that interface instead of a… Instead, I'd define my own interface, like:

```go
type grpcClientConn interface {
	grpc.ClientConnInterface
	Close() error
}
```

Now you know (and can verify in code) that both implementations satisfy it:

```go
var _ grpcClientConn = (*grpc.ClientConn)(nil)
var _ grpcClientConn = (*balancer.RoundRobinConnPool)(nil)
```

Also, prefer returning concrete types over interfaces whenever possible: i.e. export
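That truncated advice about exporting concrete types can be sketched as follows; the names mirror the discussion above, but the constructor and field layout are hypothetical, not this repo's code:

```go
package pool

import "google.golang.org/grpc"

// RoundRobinConnPool stands in for the pool type referenced above.
type RoundRobinConnPool struct {
	conns []*grpc.ClientConn // pooled channels
	next  uint32             // round-robin cursor
}

// NewRoundRobinConnPool returns the concrete *RoundRobinConnPool rather than
// the grpcClientConn interface, so callers keep access to pool-specific
// methods while still being able to assign the result to the interface.
func NewRoundRobinConnPool(conns []*grpc.ClientConn) *RoundRobinConnPool {
	return &RoundRobinConnPool{conns: conns}
}
```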
The custom balancer depends on experimental APIs from grpc-go. It has since been replaced with a simple round robin balancer that has been working well in production. Fixes bazelbuild#499
The custom balancer depends on experimental APIs from grpc-go. It has since been replaced with a simple round robin balancer that has been working well in production. Fixes #499
These APIs are labeled as experimental, and they will be changed in the near-ish future. Details are in grpc/grpc-go#6472. Note that any package that uses gRPC's experimental APIs should itself be considered experimental, as a change to them would break compilation for the entire package that uses them. So we do NOT recommend the use of these features in any other libraries or applications that aren't tolerant to breakages like this.
https://github.com/googleapis/google-cloud-go may have another solution to the same type of problem, which is that it creates multiple channels and load balances between them. See https://github.com/googleapis/google-api-go-client/blob/caea95689f82049822552cd649765335123831e0/transport/grpc/pool.go#L30 for details.
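In case the link goes stale, the gist of that pool approach, sketched here with hypothetical names rather than the actual pool.go code, is to dial several *grpc.ClientConns and implement grpc.ClientConnInterface by round-robining Invoke and NewStream across them:

```go
package pool

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
)

// multiConnPool holds several independent channels to the same target.
type multiConnPool struct {
	conns []*grpc.ClientConn
	next  uint32
}

// Compile-time check: the pool can be used wherever a ClientConnInterface is
// accepted, e.g. when constructing generated stubs.
var _ grpc.ClientConnInterface = (*multiConnPool)(nil)

// dialPool dials size channels to target; names and structure are illustrative.
func dialPool(ctx context.Context, size int, target string, opts ...grpc.DialOption) (*multiConnPool, error) {
	p := &multiConnPool{}
	for i := 0; i < size; i++ {
		cc, err := grpc.DialContext(ctx, target, opts...)
		if err != nil {
			p.Close()
			return nil, err
		}
		p.conns = append(p.conns, cc)
	}
	return p, nil
}

// pick round-robins across the channels in the pool.
func (p *multiConnPool) pick() *grpc.ClientConn {
	return p.conns[atomic.AddUint32(&p.next, 1)%uint32(len(p.conns))]
}

func (p *multiConnPool) Invoke(ctx context.Context, method string, args, reply interface{}, opts ...grpc.CallOption) error {
	return p.pick().Invoke(ctx, method, args, reply, opts...)
}

func (p *multiConnPool) NewStream(ctx context.Context, desc *grpc.StreamDesc, method string, opts ...grpc.CallOption) (grpc.ClientStream, error) {
	return p.pick().NewStream(ctx, desc, method, opts...)
}

// Close closes every channel, returning the first error encountered.
func (p *multiConnPool) Close() error {
	var firstErr error
	for _, cc := range p.conns {
		if err := cc.Close(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```

The appeal of this approach, as noted in the issue, is that it relies only on stable, public grpc-go surface rather than the experimental balancer APIs.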