Use k8s node-affinity to try to get hg pods and volumes on the same node #732
I don't know if we have access to local volumes; we would need to talk to LT ops about that, as it may not be available. That said, an RWO volume would be a similar solution that will perform better (I believe it's actually similar to local volumes). It will also require node affinity so that multiple pods can access the volume; right now both lexbox and hgweb need access to the file system. |
Hi Tim, Kevin,
I'd be surprised if non-locality of the storage made that much of a difference. RWX is just NFS, and RWO is iSCSI, so not always local after a migration either, just exclusive. It's plausible hg doesn't perform well over non-local NFS, I guess, but I'd expect that to be a known issue by now, so it should show up in bug trackers etc. Kevin is right about the need for node affinity to keep the container on the same host as the PV. But the networking between nodes is 10 Gbps, so I don't see how non-locality could slow everything down. For reference, SATA is 6 Gbps, and the connection to the internet client is much slower, capped at 1 Gbps for AWS or 200 Mbps for Dallas.
If it's currently happening, or next time it does, could you run fio from
an hgweb container against the PV mount point and also against a dir not on
any PV mount point (non-PV backed filesystems always use node local
storage). That would rule in or out locality of storage.
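As a concrete sketch of that comparison (the directory paths here are hypothetical; substitute the real PV mount point inside the hgweb container), a small random-read fio job reports IOPS and latency, which matter more here than raw bandwidth:

```shell
# Hedged sketch of the suggested fio comparison; paths are placeholders.
# Run the same random-read workload against the PV-backed directory...
fio --name=pv-test --directory=/var/hg/repos --rw=randread \
    --bs=4k --size=256m --runtime=30 --time_based \
    --ioengine=libaio --direct=1 --group_reporting

# ...and against a directory on node-local storage for comparison.
fio --name=local-test --directory=/tmp/fio-test --rw=randread \
    --bs=4k --size=256m --runtime=30 --time_based \
    --ioengine=libaio --direct=1 --group_reporting
```

If the PV numbers are drastically worse than the local ones (especially the latency percentiles), storage locality is implicated; if they're similar, the bottleneck is elsewhere.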
God bless,
Tim
|
I think the problem has less to do with bandwidth than with IOPS, so Gbps doesn't really matter if the latency is high |
Before we attempt to do this, I think we need to measure the difference, mostly in IOPS, less in bandwidth. |
I've pretty much proved that some of our issues (such as #765 and #728) are caused by NFS: the LexBox API pod changes the filesystem (creating a new project, or resetting an existing project's repo to have a different root commit), but the HgWeb pod doesn't see the change for a while (typically 30-60 seconds in my experience). All my attempts to solve the problem so far have failed (for example, in #789).
Are there any ReadWriteMany volume types we could use that aren't backed by NFS? Something that would allow us to make a change in one pod and have the other pod reliably see the same change (even if we have to manually force a sync) would solve a lot of our issues. |
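If we do stay on NFS-backed RWX, one thing that might be worth testing is turning off the NFS client's attribute and lookup caching, which is usually what delays one pod seeing another pod's writes. This is only a hedged sketch: the StorageClass name and provisioner are placeholders, and whether our provisioner honors mountOptions would need to be checked.

```yaml
# Hypothetical sketch: disable NFS client-side caching so a second pod
# sees filesystem changes promptly. Name and provisioner are made up.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-no-cache          # placeholder name
provisioner: example.com/nfs  # placeholder provisioner
mountOptions:
  - lookupcache=none  # don't cache name lookups, so new/deleted files show up immediately
  - actimeo=0         # don't cache file attributes
```

The trade-off is extra metadata round-trips to the NFS server on every access, so this could make the IOPS problem worse even as it fixes coherence.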
A drawback of ReadWriteOnce is that deployments can't use the "spin up the second pod before spinning down the first" strategy, so you end up with service interruptions. The first pod has to spin down first, then the second pod can spin up, and if the spin-up time is long you can end up with a service outage of several minutes. Plus, if the spin-up of the new pod fails for some reason, your service is down until you can bring the original pod back up (which is sometimes tricky if the volume has now been "assigned" to the pod that's failing). ReadWriteMany allows a much safer deployment process... but if it's at the cost of consistent integration test failures, I'm not sure it's worth it anymore. |
You can use rolling updates with RWO, but you need to set the node affinity to ensure the new pod starts on the same node as the old one. RWO volumes can be mounted on only a single node at a time, but by any number of pods on that node.
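As a sketch of that setup (all names, labels, and the image here are placeholders, not our actual manifests), a Deployment can combine a rolling-update strategy with required node affinity so the surge pod lands on the volume's node:

```yaml
# Sketch only: with maxSurge the new pod starts before the old one
# stops, and the required node affinity forces it onto the node that
# already has the RWO volume mounted, so both pods share it briefly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hgweb
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: hgweb
  template:
    metadata:
      labels:
        app: hgweb
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: lexbox/hg-storage   # hypothetical node label
                    operator: In
                    values: ["true"]
      containers:
        - name: hgweb
          image: example/hgweb:latest        # placeholder image
          volumeMounts:
            - name: repos
              mountPath: /var/hg/repos       # placeholder path
      volumes:
        - name: repos
          persistentVolumeClaim:
            claimName: hg-repos              # the RWO PVC
```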
|
Note that in addition to node affinity, there's also pod affinity: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity. That says "I don't care about the node labels, but I want the hgweb pod on the same node as the lexbox pod". |
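For illustration, a minimal pod-spec fragment expressing exactly that (the `app: lexbox` label is an assumption about how the lexbox pod is labeled):

```yaml
# Sketch: schedule this pod onto whatever node already runs a lexbox pod.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: lexbox                      # assumed lexbox pod label
        topologyKey: kubernetes.io/hostname  # "same node"
```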
This was to solve a performance issue; we've handled that in other ways, so we don't need to do this anymore. |
We can see that performance took a hit after our release yesterday:
Apparently @hahn-kev talked to TechOps, and the hg pods and volumes were on different nodes.
So, that's the working theory for the performance regression. hgweb presumably uses a ton of file-system reads when it updates its "repo index". So, being on a different node almost certainly makes a noticeable difference.
It's not NEARLY as bad as it was before, so we're not panicking, but there's room for improvement. And after seeing how good it can be, I find it hard to be satisfied with the current situation.
There are several options here:
- nodeAffinity: prefer certain node(s) based on labels
- nodeSelector: force certain node(s) based on labels
- nodeName: force node selection based on name

nodeAffinity sounds like the simplest decent idea. It's a bit dissatisfying, because we don't really care what node things land on; we just want them to be on the same node. In that case, Local Persistent Volumes may be more suitable.
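For comparison, a Local Persistent Volume bakes the node constraint into the PV itself, so any pod bound to its claim is automatically scheduled onto the node that physically holds the disk. A hedged sketch (capacity, path, and node name are all placeholders):

```yaml
# Sketch of a Local Persistent Volume. The nodeAffinity here is part of
# the PV spec itself, so consumers follow the disk without needing their
# own affinity rules. All concrete values are made up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hg-local-pv
spec:
  capacity:
    storage: 100Gi             # placeholder size
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/hg        # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1     # placeholder node name
```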