
Use k8s node-affinity to try to get hg pods and volumes on the same node #732

Closed
myieye opened this issue Apr 17, 2024 · 9 comments

@myieye (Contributor) commented Apr 17, 2024

We can see that performance took a hit after our release yesterday. Apparently @hahn-kev talked to TechOps, and it turns out the hg pods and volumes were on:

  • the same node before the release
  • different nodes after the release

That's the working theory for the performance regression: hgweb presumably performs a ton of file-system reads when it updates its "repo index", so having the pods and volumes on different nodes almost certainly makes a noticeable difference.

[performance graph]

It's not NEARLY as bad as it was before, so we're not panicking, but there's room for improvement. And after seeing how good it can be, I find it hard to be satisfied with the current situation.

There are several options here:

nodeAffinity sounds like the simplest decent idea. It's a bit dissatisfying, because we don't really care which node things land on; we just want them to be on the same node. In that case, Local Persistent Volumes may be more suitable.
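A minimal sketch of what the node-affinity approach could look like on the hgweb deployment, assuming we add a custom label to the target node (the label name lexbox.org/hg-storage and this manifest fragment are hypothetical, not our current config):

```yaml
# Hypothetical fragment: pin hgweb to nodes carrying a custom label.
# The same affinity block would go on the lexbox deployment so both land together.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hgweb
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: lexbox.org/hg-storage
                    operator: In
                    values: ["true"]
```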

@myieye myieye added the K8S or Docker owner: Robin, Kevin label Apr 17, 2024
@hahn-kev (Collaborator) commented:

I don't know if we have access to local volumes; we would need to talk to LT ops about that, as it may not be available.

That said, an RWO volume would be a similar solution that should perform better (I believe it's actually implemented similarly to local volumes). It would also require node affinity so that multiple pods can access the volume; right now both lexbox and hgweb need access to the file system.
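A minimal sketch of such a claim (the name and size are assumptions; the storage class is left to the cluster default):

```yaml
# Hypothetical ReadWriteOnce claim: mountable by a single node at a time,
# so all pods that need it must be co-scheduled onto that node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hg-repos
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```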

@tim-eves commented Apr 18, 2024 via email

@hahn-kev (Collaborator) commented:

I think the problem has less to do with bandwidth than with IOPS, so Gbps doesn't really matter if the latency is high.

@hahn-kev (Collaborator) commented:

Before we attempt to do this, I think we need to measure the difference, mostly in IOPS, less in bandwidth.
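One way to get those numbers would be a throwaway Job running fio against the volume. This is only a sketch; the claim name (hg-repos), the mount path, and the image are assumptions:

```yaml
# Hypothetical benchmark Job: measures 4k random-read IOPS on the shared volume.
apiVersion: batch/v1
kind: Job
metadata:
  name: iops-benchmark
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: fio
          image: fio-image  # placeholder: any image with fio installed
          command: ["fio", "--name=randread", "--directory=/data",
                    "--rw=randread", "--bs=4k", "--size=256M",
                    "--ioengine=libaio", "--direct=1",
                    "--runtime=30", "--time_based", "--group_reporting"]
          volumeMounts:
            - name: repos
              mountPath: /data
      volumes:
        - name: repos
          persistentVolumeClaim:
            claimName: hg-repos
```

Running it once with the volume local to the node and once with it remote would quantify the gap.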

@rmunn (Contributor) commented May 15, 2024

I've pretty much proved that some of our issues (such as #765 and #728) are caused by NFS: the LexBox API pod changes the filesystem (creating a new project, or resetting an existing project's repo to have a different root commit), but the HgWeb pod doesn't see the change for a while (typically 30-60 seconds in my experience).

All my attempts to solve the problem so far have failed. For example, in #789 I ran sync on the LexBox API pod, hoping that this would force NFS to flush its client cache to the server and therefore let the HgWeb pod see the change sooner. But even after running sync, it takes roughly 30 seconds before the HgWeb pod has the same view of the Mercurial repo that the LexBox pod does. This has caused us much frustration as our integration tests are producing false failures when the HgWeb pod has an outdated view of the filesystem, or else the tests time out while we wait for HgWeb to see the "correct" filesystem state.

Are there any ReadWriteMany volume types we could use that aren't backed by NFS? Something that would allow us to make a change in one pod, and have the other pod reliably see the same change (even if we have to manually force a sync) would solve a lot of our issues.

@rmunn (Contributor) commented May 15, 2024

A drawback of ReadWriteOnce is that deployments can't use the "spin up the second pod before spinning down the first" strategy, so you end up with service interruptions. The first pod has to spin down before the second pod can spin up, and if the spin-up time is long you can end up with a service outage of several minutes. Plus, if the spin-up of the new pod fails for some reason, your service is down until you can bring the original pod back up (which is sometimes tricky if the volume has now been "assigned" to the failing pod).

ReadWriteMany allows a much safer deployment process... but if it's at the cost of consistent integration test failures, I'm not sure it's worth it anymore.
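For context, an RWO volume effectively forces the deployment onto the Recreate strategy instead of the default RollingUpdate. A minimal sketch (the image name is a placeholder):

```yaml
# With RWO, a rolling update can deadlock: the new pod can't mount the volume
# while the old pod still holds it. Recreate stops the old pod first,
# at the cost of a brief outage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hgweb
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: hgweb
  template:
    metadata:
      labels:
        app: hgweb
    spec:
      containers:
        - name: hgweb
          image: hgweb-image
```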

@tim-eves commented May 15, 2024 via email

@rmunn (Contributor) commented May 16, 2024

Note that in addition to node affinity, there's also pod affinity: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity. That says "I don't care about the node labels, but I want the hgweb pod on the same node as the lexbox pod".
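A minimal sketch of that, assuming the lexbox pods carry an app: lexbox label (the label is an assumption):

```yaml
# Schedule hgweb onto whichever node is already running a pod labeled app=lexbox.
# topologyKey kubernetes.io/hostname makes "same" mean "same node".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hgweb
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: lexbox
              topologyKey: kubernetes.io/hostname
```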

@hahn-kev (Collaborator) commented:

This was to solve a performance issue; we've handled that in other ways, so we don't need to do this anymore.

@hahn-kev closed this as not planned on Oct 29, 2024