
[docs-only] ADR0029 - grpc in kubernetes #9488

Open · wants to merge 1 commit into master

Conversation

@butonic (Member) commented Jun 27, 2024

I investigated #8589 and tried to sum up my findings in an ADR because it may have architectural consequences.

@butonic butonic self-assigned this Jun 27, 2024

update-docs bot commented Jun 27, 2024

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@butonic butonic changed the title ADR0029 - grpc in kubernetes [docs-only] ADR0029 - grpc in kubernetes Jun 27, 2024
@butonic butonic force-pushed the adr00029 branch 2 times, most recently from 0e9df5b to 4e66dec on June 27, 2024 13:02
@butonic (Member, Author) commented Jun 28, 2024

When trying to use the dns:/// resolver of grpc-go in cs3org/reva#4744 and thinking about the consequences, I found a blog post that implemented a k8s-resolver to achieve Perfect Round Robin, Inclusion of Newly Created Pods and Smooth Redeployment. But I wondered if there were other resolver implementations for kubernetes, which led me to https://github.com/sercand/kuberesolver. It watches the kubernetes api, has been around since 2018, and its last release is from 2023 ... which seems solid.
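For reference, a minimal sketch of how both resolver options would be wired into a grpc-go client. The kuberesolver import path and its RegisterInCluster helper follow its README and may differ between versions, and the target names are placeholders:

package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// import path follows the kuberesolver README; the major version may differ
	"github.com/sercand/kuberesolver/v5"
)

func dialBoth() {
	// Option 1: the built-in dns resolver against a (headless) service,
	// spreading RPCs over all returned addresses via round_robin.
	dnsConn, err := grpc.Dial(
		"dns:///gateway.ocis.svc.cluster.local:9142", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	_, _ = dnsConn, err

	// Option 2: kuberesolver watches the kubernetes endpoints API instead of
	// polling DNS, so added and removed pods are picked up via notifications.
	kuberesolver.RegisterInCluster() // registers the "kubernetes" scheme
	k8sConn, err := grpc.Dial(
		"kubernetes:///gateway.ocis:9142", // placeholder service.namespace:port
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	_, _ = k8sConn, err
}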

I'll add it to cs3org/reva#4744 and make the service names configurable in #9490 ... then we can test the behavior under load.

@butonic (Member, Author) commented Jul 16, 2024

After digesting this, ponder the thought that some services expose http ports as well as grpc ... we need to clarify how http requests are retried and load balanced as well. If grpc uses headless services and dns ... that might not mix with go micro http clients ...

@kobergj (Collaborator) left a comment:

@butonic after you investigated it, which option would you personally prefer?

4 resolved review threads on docs/ocis/adr/0029-grpc-in-kubernetes.md (outdated)
@butonic butonic force-pushed the adr00029 branch 3 times, most recently from b755c50 to 3667594 on July 17, 2024 09:56
@butonic (Member, Author) commented Jul 17, 2024

I no longer see a strict requirement to have a service registry for the two main deployment scenarios. For a bare metal deployment I'd prefer unix sockets for grpc and for kubernetes I'd prefer DNS because the go grpc libs support balancing based on DNS. Even for docker (compose) unix sockets can be replaced with tcp connections to hostnames for setups that need to run some services in a dedicated container.

Now http requests also need to be load balanced and retried ... in kubernetes long-running http connections would face the same problems as grpc: the client might try to send requests to a no longer or not yet ready / healthy service. But I haven't found a good resource on how to retry and load balance http connections in kubernetes based on the same dns magic that go-grpc does. Something like esiqveland/balancer, benschw/dns-clb-go or benschw/srv-lb ... but maintained? https://github.com/markdingo/cslb had a release in 2023.
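In the meantime, a rough sketch of what such dns-based balancing could look like for plain net/http: a custom DialContext that re-resolves the (headless) service name for every new connection and picks one of the returned pod IPs. This is only an illustration, not a drop-in replacement for the go micro http client:

package main

import (
	"context"
	"math/rand"
	"net"
	"net/http"
	"time"
)

// newBalancingClient returns an http.Client whose transport resolves the
// (headless) service name for every new TCP connection and dials a randomly
// chosen pod IP. Keep-alives are disabled so every request gets a fresh
// lookup - crude, but it avoids sticking to a dead pod.
func newBalancingClient() *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	transport := &http.Transport{
		DisableKeepAlives: true, // trade connection reuse for up-to-date endpoints
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			host, port, err := net.SplitHostPort(addr)
			if err != nil {
				return nil, err
			}
			ips, err := net.DefaultResolver.LookupHost(ctx, host)
			if err != nil {
				return nil, err
			}
			ip := ips[rand.Intn(len(ips))]
			return dialer.DialContext(ctx, network, net.JoinHostPort(ip, port))
		},
	}
	return &http.Client{Transport: transport, Timeout: 10 * time.Second}
}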

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

sonarcloud bot commented Jul 17, 2024

@jvillafanez (Member) commented:

  • bad, because we would lose the service registry - which might also be good, see above
  • bad, because every client will hammer the dns, maybe causing a similar load problem as with the go micro kubernetes registry implementation - needs performance testing

As far as I see, the DNS would act as our service registry. We'd still have the service registry, just not maintained by us (which could be good).

While I assume this would work for kubernetes (I see some plans on how it could work), we'll also need to take into account other environments.
For docker, we might need a custom dns server added in the compose file so our services can register there and use it (using an external one might not be a good idea, especially considering entry pollution). I'm not sure how this would work on bare metal installations.

Moreover, this should have a fully automated setup and teardown (or provide a simple command to do it). Manually configuring DNS entries the way we want is not something average admins will do.

For the "client hammering the DNS", I think that would be a client behavior we could fix. I mean, once the client have resolved the DNS and we're connected to the target service, it's up to the client to decide to reuse the same connection or request a new connection to a different replica.
I guess the key point is when we want to request a new connection to the DNS server. We could hit the DNS once per request, or maybe once per minute. In any case, different services are expected to land in different replicas, so even in the worst case scenario, the workload should be shared among the replicas, although maybe not evenly.
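As a sketch of the "once per minute" idea (assuming we control the client code), a small TTL cache around the lookup would keep the DNS traffic bounded:

package main

import (
	"context"
	"net"
	"sync"
	"time"
)

// cachedResolver looks up a host at most once per ttl; in between, callers
// reuse the cached addresses, so the DNS server is not hit on every request.
type cachedResolver struct {
	mu      sync.Mutex
	ttl     time.Duration
	expires time.Time
	addrs   []string
}

func (c *cachedResolver) lookup(ctx context.Context, host string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Now().Before(c.expires) && len(c.addrs) > 0 {
		return c.addrs, nil // serve from cache
	}
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	if err != nil {
		return nil, err
	}
	c.addrs = addrs
	c.expires = time.Now().Add(c.ttl)
	return addrs, nil
}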

One big problem I see with this solution is that we'll need to do a migration. This seems like a big breaking change, and maybe a drawback big enough to discard the solution.

@butonic (Member, Author) commented Jul 17, 2024

I agree that in a kubernetes environment headless services and DNS would act as the service registry for go-grpc clients. (I still need to better understand http clients.)

I see four ways to run ocis:

  1. local dev deployment
  2. bare metal deployment with maybe systemd
  3. docker (compose) deployment
  4. kubernetes deployment

For the first three deployment types unix sockets would suffice.
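grpc-go already understands unix: targets, so for those scenarios the wiring would roughly look like this (the socket path is a placeholder):

package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func dialLocal() (*grpc.ClientConn, error) {
	// "unix://" targets skip DNS and the network stack entirely; the path
	// below is a placeholder for wherever the service exposes its socket.
	return grpc.Dial(
		"unix:///run/ocis/gateway.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
}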

In docker we can use hostnames with a tcp transport if we really need to spread the services over multiple containers. I don't see the necessity for a dedicated dns server. Docker swarm also has a service concept with a virtual ip. Using dns together with keepalive timeouts in the grpc clients would work less ideally than in kubernetes, but it would still work.

For kubernetes we can use dns and preconfigure all addresses using the helm charts.

IMNSHO we should aim for unix sockets and fewer processes / pods. We should move some tasks to dedicated containers for security, e.g. thumbnailers and content indexing. The current helm chart deploying every service in a dedicated container is just a waste of resources - AND fragile.

For grpc the go client package has evolved to a point where it can handle everything that is necessary: https://pkg.go.dev/google.golang.org/grpc#ClientConn

A ClientConn encapsulates a range of functionality including name resolution, TCP connection establishment (with retries and backoff) and TLS handshakes. It also handles errors on established connections by re-resolving the name and reconnecting.

Retries are a matter of configuration.
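For illustration, a sketch of such a retry configuration via the default service config (the status codes and attempt counts are just example values):

package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// retryServiceConfig enables transparent retries for all methods on the
// connection: up to 3 attempts with exponential backoff when the server
// responds with UNAVAILABLE.
const retryServiceConfig = `{
  "methodConfig": [{
    "name": [{}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

func dialWithRetries(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(retryServiceConfig),
	)
}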

But picking up new dns entries is ... a long-standing issue, grpc/grpc#12295, with the two scenarios (existing pod goes down, new pod comes up) starting to be discussed in grpc/grpc#12295 (comment). Reading the thread, it seems the default dns:// resolver will, by design, not pick up new pods unless we configure a MaxConnectionAge on the server side. The 'optimal' solution is to use a name resolution system that has notifications - aka the kubernetes API.
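The server-side MaxConnectionAge knob mentioned above lives in grpc-go's keepalive package; a minimal sketch, with arbitrary example values that would need tuning:

package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newServer forces clients to reconnect - and therefore re-resolve DNS -
// periodically by capping the connection age.
func newServer() *grpc.Server {
	return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      30 * time.Second, // arbitrary example value
		MaxConnectionAgeGrace: 10 * time.Second, // let in-flight RPCs finish
	}))
}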

cs3org/reva#4744 allows us to test and benchmark both: grpc go with dns:// (using headless services) and kubernetes:// (using the kubernetes api) addresses without breaking backwards compatibility (using the go micro service registry).

@jvillafanez (Member) commented Jul 17, 2024

As far as I understand, the DNS would also act as a load balancer by choosing a random (or not so random) replica. I mean, "serviceA" might get the ip "10.10.10.1" for "serviceZ", but "serviceB" might get ip "10.10.10.2" also for "serviceZ". This will be done client-side: the DNS will return one or more ips so the client will need to choose which one it wants to use.

It seems docker has an internal DNS server we could use. Assuming it has all the capabilities we need, we wouldn't need a custom DNS server. (I don't know how we can configure the DNS to provide the SRV records we need - or how we would register our services in the DNS otherwise; so we might still need a custom DNS we can configure at will.)

IMNSHO we should aim for unix sockets and fewer processes / pods. We should move some tasks to dedicated containers for security, eg. thumbnailers and content indexing. The current helm chart deploying every service in a dedicated container is just a waste of resources - AND fragile.

If we're going the dns route, I think it should work everywhere regardless of the deployment. This includes kubernetes with every service in an independent server, even if that deployment itself could be a bad idea. Then we could have "official" deployments with different sets of services in different servers.
In this regard, using unix sockets should be an optional optimization, which could be easier to set up in one of our "official" setups, but I don't think it should be the main focus because it would limit the solution too much.


For docker, it seems that we aim for something like (only relevant content):

services:
  wopiserver_oo:
    deploy:
      mode: replicated
      replicas: 3
      endpoint_mode: dnsrr

dig response for a different container in the same docker network:

root@942926fa8300:/# dig wopiserver_oo

; <<>> DiG 9.16.48-Ubuntu <<>> wopiserver_oo
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48674
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wopiserver_oo.			IN	A

;; ANSWER SECTION:
wopiserver_oo.		600	IN	A	172.19.0.10
wopiserver_oo.		600	IN	A	172.19.0.11
wopiserver_oo.		600	IN	A	172.19.0.9

;; Query time: 11 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Jul 18 08:49:21 CEST 2024
;; MSG SIZE  rcvd: 118

I guess that should match the kubernetes setup, and whatever library we use for the connection should be able to work with it.
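To double check that from Go (assuming the same docker network as the dig above), a plain lookup should return the same three A records, which is exactly what the grpc dns resolver consumes:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// Should print the same A records as the dig output above,
	// e.g. 172.19.0.9, 172.19.0.10 and 172.19.0.11.
	addrs, err := net.DefaultResolver.LookupHost(ctx, "wopiserver_oo")
	if err != nil {
		panic(err)
	}
	for _, addr := range addrs {
		fmt.Println(addr)
	}
}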

Comment on lines +21 to +26
To leverage the kubernetes pod state, we first used the go micro kubernetes registry implementation. When a pod fails the health or readiness probes, kubernetes will no longer
- send traffic to the pod via the kube-proxy, which handles the ClusterIP for a service,
- list the pod in DNS responses when the ClusterIP is disabled by setting it to `none`.
When using the ClusterIP, HTTP/1.1 requests will be routed to a working pod.

This nice setup starts to fail with long-lived connections. The kube-proxy is connection based, causing requests with Keep-Alive to stick to the same pod for more than one request. Worse, HTTP/2 and in turn gRPC multiplex the connection. They will not pick up any changes to pods, explaining the symptoms:
Contributor commented:

Are we using this at all? Isn't the Go micro registry returning pod IPs?
And from what I know, the nats-js-kv service registry doesn't have any insights into healthiness or readiness of services it tries to contact.

1. new pods will not be used because clients will reuse the existing gRPC connection
2. gRPC clients will still try to send traffic to killed pods because they have not picked up that the pod was killed. Or the pod was killed a millisecond after the lookup was made.

Adding to this problem, the health and readiness implementations of oCIS services do not always reflect the correct state of the service. One example is the storage-users service, which returns ready `true` while running a migration on startup.
Contributor commented:

From what I know, all /healthz and /readyz endpoints are hardcoded to true. Which is funny, because the debug server might be up before the actual service server.
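For comparison, a hypothetical sketch of a readiness endpoint that actually tracks service state - the flag, port and wiring are assumptions, not the current oCIS implementation:

package main

import (
	"net/http"
	"sync/atomic"
)

// ready is flipped to true only after the actual service (e.g. the gRPC
// server or a startup migration) has finished coming up.
var ready atomic.Bool

func readyzHandler(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		http.Error(w, "not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/readyz", readyzHandler)

	go func() {
		// ... start the real service, run migrations, etc. ...
		ready.Store(true) // only now report ready
	}()

	_ = http.ListenAndServe(":9104", nil) // debug/metrics port is a placeholder
}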

@butonic butonic mentioned this pull request Oct 16, 2024
@butonic butonic removed their assignment Oct 21, 2024
Projects
Status: blocked
4 participants