Use k8s node-affinity to try to get hg pods and volumes on the same node #732
I don't know if we have access to local volumes; we would need to talk to LT ops about that, as it may not be available. That said, an RWO volume would be a similar solution that will perform better (I believe it's actually similar to local volumes). It will also require node affinity so that multiple pods can access the volume; right now both lexbox and hgweb need access to the file system. |
Hi Tim, Kevin,
I'd be surprised if non-locality of the storage made that much of a difference. RWX is just NFS, and RWO is iSCSI, so not always local after a migration either, just exclusive. It's plausible hg doesn't perform well over non-local NFS, I guess, but I'd expect that to be a known issue by now, so it should show up in bug trackers etc. Kevin is right about the need for node affinity to keep the container on the same host as the PV. But the networking between nodes is 10 Gbps, so I don't see how non-locality could slow everything down. For reference, SATA is 6 Gbps, and the connection to the internet client is much slower, capped at 1 Gbps for AWS or 200 Mbps for Dallas.
If it's currently happening, or next time it does, could you run fio from
an hgweb container against the PV mount point and also against a dir not on
any PV mount point (non-PV backed filesystems always use node local
storage). That would rule in or out locality of storage.
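As a concrete sketch of that comparison (the directory paths here are hypothetical; substitute the real PV mount point inside the hgweb container), a small random-read fio job reports IOPS and latency, which matter more here than raw bandwidth:

```shell
# Hedged sketch of the suggested fio comparison; paths are placeholders.
# Run the same random-read workload against the PV-backed directory...
fio --name=pv-test --directory=/var/hg/repos --rw=randread \
    --bs=4k --size=256m --runtime=30 --time_based \
    --ioengine=libaio --direct=1 --group_reporting

# ...and against a directory on node-local storage for comparison.
fio --name=local-test --directory=/tmp/fio-test --rw=randread \
    --bs=4k --size=256m --runtime=30 --time_based \
    --ioengine=libaio --direct=1 --group_reporting
```

If the PV numbers are drastically worse than the local ones (especially the latency percentiles), storage locality is implicated; if they're similar, the bottleneck is elsewhere.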
God bless,
Tim
|
I think the problem has less to do with bandwidth than with IOPS, so Gbps doesn't really matter if the latency is high |
Before we attempt to do this, I think we need to measure the difference, mostly in IOPS, less in bandwidth. |
I've pretty much proved that some of our issues (such as #765 and #728) are caused by NFS: the LexBox API pod changes the filesystem (creating a new project, or resetting an existing project's repo to have a different root commit), but the HgWeb pod doesn't see the change for a while (typically 30-60 seconds in my experience). All my attempts to solve the problem so far have failed (for example, in #789).
Are there any ReadWriteMany volume types we could use that aren't backed by NFS? Something that would allow us to make a change in one pod and have the other pod reliably see the same change (even if we have to manually force a sync) would solve a lot of our issues. |
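If we do stay on NFS-backed RWX, one thing that might be worth testing is turning off the NFS client's attribute and lookup caching, which is usually what delays one pod seeing another pod's writes. This is only a hedged sketch: the StorageClass name and provisioner are placeholders, and whether our provisioner honors mountOptions would need to be checked.

```yaml
# Hypothetical sketch: disable NFS client-side caching so a second pod
# sees filesystem changes promptly. Name and provisioner are made up.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-no-cache          # placeholder name
provisioner: example.com/nfs  # placeholder provisioner
mountOptions:
  - lookupcache=none  # don't cache name lookups, so new/deleted files show up immediately
  - actimeo=0         # don't cache file attributes
```

The trade-off is extra metadata round-trips to the NFS server on every access, so this could make the IOPS problem worse even as it fixes coherence.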
A drawback of ReadWriteOnce is that deployments can't use the "spin up the second pod before spinning down the first" strategy, so you end up with service interruptions. The first pod has to spin down first, then the second pod can spin up, and if the spin-up time is long you can end up with a service outage of several minutes. Plus, if the spin-up of the new pod fails for some reason, your service is down until you can bring the original pod back up (which is sometimes tricky if the volume has now been "assigned" to the pod that's failing). ReadWriteMany allows a much safer deployment process... but if it's at the cost of consistent integration test failures, I'm not sure it's worth it anymore. |
You can use rolling updates with RWO, but you need to set the node affinity to ensure the new pod starts on the same node as the old one. RWO volumes can be mounted on only a single node at a time, but by any number of pods on that node.
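As a sketch of that setup (all names, labels, and the image here are placeholders, not our actual manifests), a Deployment can combine a rolling-update strategy with required node affinity so the surge pod lands on the volume's node:

```yaml
# Sketch only: with maxSurge the new pod starts before the old one
# stops, and the required node affinity forces it onto the node that
# already has the RWO volume mounted, so both pods share it briefly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hgweb
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: hgweb
  template:
    metadata:
      labels:
        app: hgweb
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: lexbox/hg-storage   # hypothetical node label
                    operator: In
                    values: ["true"]
      containers:
        - name: hgweb
          image: example/hgweb:latest        # placeholder image
          volumeMounts:
            - name: repos
              mountPath: /var/hg/repos       # placeholder path
      volumes:
        - name: repos
          persistentVolumeClaim:
            claimName: hg-repos              # the RWO PVC
```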
|
Note that in addition to node affinity, there's also pod affinity: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity. That says "I don't care about the node labels, but I want the hgweb pod on the same node as the lexbox pod". |
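For illustration, a minimal pod-spec fragment expressing exactly that (the `app: lexbox` label is an assumption about how the lexbox pod is labeled):

```yaml
# Sketch: schedule this pod onto whatever node already runs a lexbox pod.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: lexbox                      # assumed lexbox pod label
        topologyKey: kubernetes.io/hostname  # "same node"
```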
This was to solve a performance issue; we've handled that in other ways, so we don't need to do this anymore. |
We can see that performance took a hit after our release yesterday:
Apparently @hahn-kev talked to TechOps, and the hg pods and volumes were on different nodes.
So, that's the working theory for the performance regression. hgweb presumably uses a ton of file-system reads when it updates its "repo index". So, being on a different node almost certainly makes a noticeable difference.
It's not NEARLY as bad as it was before, so we're not panicking, but there's room for improvement. And after seeing how good it can be, I find it hard to be satisfied with the current situation.
There are several options here:
- nodeAffinity: prefer certain node(s) based on labels
- nodeSelector: force certain node(s) based on labels
- nodeName: force node selection based on name

nodeAffinity sounds like the simplest decent idea. It's a bit dissatisfying, because we don't really care what node things land on; we just want them to be on the same node. In that case, Local Persistent Volumes may be more suitable.
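For comparison, a Local Persistent Volume bakes the node constraint into the PV itself, so any pod bound to its claim is automatically scheduled onto the node that physically holds the disk. A hedged sketch (capacity, path, and node name are all placeholders):

```yaml
# Sketch of a Local Persistent Volume. The nodeAffinity here is part of
# the PV spec itself, so consumers follow the disk without needing their
# own affinity rules. All concrete values are made up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hg-local-pv
spec:
  capacity:
    storage: 100Gi             # placeholder size
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/hg        # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1     # placeholder node name
```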