cluster-up, kind, common: Enable TopologyManager for kind-sriov #1347
/retest-required |
/retest-required |
Changed topology manager policy to |
/retest-required |
/test check-up-kind-sriov |
@brianmcarey @ormergi It's probably failing because of #1348
I don't see why #1348 would cause this PR to fail. If anything, I hope we can get #1348 merged as soon as possible so that the lane can go back to being required.
/test check-up-kind-sriov |
/test check-up-kind-sriov EDIT: |
Hi Nir thanks for fixing this!
It seems the PR mentioned in the commit message (kubevirt/kubevirt#13685), which tests this change against kubevirt/kubevirt, is outdated.
Could you please verify it again?
Regarding the commit message, could you please mention why enabling TopologyManager makes the mentioned test meaningful?
I think you can drop the mention that it was tested against kubevirt/kubevirt; you can comment about it in the PR instead.
Please consider wrapping the commit message body lines; they are very long.
Other than that LGTM
SRIOV tests check topology alignment. Currently, in kind/sriov, kubelet does not attempt to align resources.

In the SRIOV alignment test, we call the function hardware.LookupDeviceVCPUAffinity(). This function returns a slice of aligned CPUs, i.e. the CPU numbers assigned to the guest that share a NUMA node with the SRIOV VFs that are also passed to the guest. In other words: if an app is running in the guest and using a network interface, that NIC is physically close to its CPU, and traffic doesn't have to cross the system bus to reach it.

Back to LookupDeviceVCPUAffinity(): if it finds that there is no alignment, it doesn't err; it just returns an empty slice. As the test is currently written, an empty slice (i.e. no alignment) is fine: the test simply validates that the guest knows there's no alignment, by checking the cloud-init metadata file in the guest.

This change adds a topology manager policy of single-numa-node, which forces alignment. If alignment is not achieved, scheduling will fail. We will also assert in the test that the alignment slice is not empty.

Add topology manager[1] to kubelet config and set its policy to single-numa-node. Together with cpu-manager policy=static, which we already set, kubelet will reject a pod that it is unable to align.

[1] https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/

Signed-off-by: Nir Dothan <ndothan@redhat.com>
This change addresses @ormergi's comment. In addition, changed the TopologyManager policy from |
LGTM
/approve
thanks @nirdothan
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brianmcarey The full list of commands accepted by this bot can be found here. The pull request process is described here
[1e31064 cluster-up, kind, common: Enable TopologyManager for kind-sriov](kubevirt/kubevirtci#1347) [00a9d12 Kubelet drop in](kubevirt/kubevirtci#1299) [6b7d14b build(deps): bump golang.org/x/net in /cluster-provision/gocli](kubevirt/kubevirtci#1364) ```release-note NONE ``` Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
[997cbff k8s-provider: Remove KUBEVIRT_CPU_MANAGER_POLICY](kubevirt/kubevirtci#1289) [c2e547b fix: gocli startup in s390x architecture](kubevirt/kubevirtci#1354) [1e31064 cluster-up, kind, common: Enable TopologyManager for kind-sriov](kubevirt/kubevirtci#1347) [00a9d12 Kubelet drop in](kubevirt/kubevirtci#1299) [6b7d14b build(deps): bump golang.org/x/net in /cluster-provision/gocli](kubevirt/kubevirtci#1364) ```release-note NONE ``` Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
What this PR does / why we need it:
SR-IOV tests check topology alignment. Currently, in kind/sriov, kubelet does not attempt to align resources.
In the SR-IOV alignment test, we call the function `hardware.LookupDeviceVCPUAffinity()`. This function returns a slice of aligned CPUs, i.e. the CPU numbers assigned to the guest that share a NUMA node with the SR-IOV VFs that are also passed to the guest.
In other words: if an app is running in the guest and using a network interface, that NIC is physically close to its CPU, and traffic doesn't have to cross the system bus to reach it.
Back to `LookupDeviceVCPUAffinity()`: if it finds that there is no alignment, it doesn't err; it just returns an empty slice. As the test is currently written, an empty slice (i.e. no alignment) is fine: the test simply validates that the guest knows there's no alignment, by checking the cloud-init metadata file in the guest.
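To make the "aligned CPUs" idea concrete, here is a minimal Go sketch of the lookup behavior described above. It is not the actual `hardware.LookupDeviceVCPUAffinity()` implementation: the function name `alignedCPUs`, its signature, and the CPU-to-NUMA map are hypothetical and exist only for illustration. The point it shows is that "no alignment" yields an empty slice rather than an error.

```go
package main

import (
	"fmt"
	"sort"
)

// alignedCPUs is a hypothetical helper (not the kubevirt function).
// cpuNUMA maps a guest CPU number to the NUMA node it is pinned to.
// It returns the CPUs that share a NUMA node with the device, sorted.
// When nothing is aligned it returns an empty slice and no error.
func alignedCPUs(cpuNUMA map[int]int, deviceNUMA int) []int {
	aligned := []int{}
	for cpu, node := range cpuNUMA {
		if node == deviceNUMA {
			aligned = append(aligned, cpu)
		}
	}
	sort.Ints(aligned) // map iteration order is random, so sort for stable output
	return aligned
}

func main() {
	// Assumed layout: CPUs 0-1 on NUMA node 0, CPUs 2-3 on NUMA node 1.
	cpus := map[int]int{0: 0, 1: 0, 2: 1, 3: 1}

	fmt.Println(alignedCPUs(cpus, 1))           // VF on node 1: prints [2 3]
	fmt.Println(len(alignedCPUs(cpus, 2)) == 0) // VF on node 2: prints true
}
```

A test that only compares this slice with what the guest reports passes in both cases, which is why an additional non-empty assertion is needed to prove alignment actually happened.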
This change adds a topology manager policy of single-numa-node, which forces alignment. If alignment is not achieved, scheduling will fail.
We will also assert in the test that the alignment slice is not empty.
Add topology manager[1] to the kubelet config and set its policy to single-numa-node.
Together with cpu-manager policy=static, which we already set, kubelet will reject a pod that it is unable to align.
[1] https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
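The effect of the single-numa-node policy can be pictured with a simplified Go model. This is only a sketch under stated assumptions: the real Topology Manager merges bitmask hints from its hint providers, whereas here each resource simply lists the NUMA nodes that could satisfy its request on their own. The function name `admitSingleNUMANode` and the resource names are hypothetical.

```go
package main

import "fmt"

// admitSingleNUMANode is a simplified, hypothetical model of the
// single-numa-node policy. hints maps a resource name to the NUMA nodes
// that can each satisfy that resource's request alone. The pod is admitted
// only if one node appears in every resource's list; the lowest such node
// is returned. The model assumes at least one resource is present.
func admitSingleNUMANode(hints map[string][]int) (int, bool) {
	counts := map[int]int{}
	for _, nodes := range hints {
		seen := map[int]bool{}
		for _, n := range nodes {
			if !seen[n] { // count each node once per resource
				seen[n] = true
				counts[n]++
			}
		}
	}
	best, found := -1, false
	for n, c := range counts {
		if c == len(hints) && (!found || n < best) {
			best, found = n, true
		}
	}
	return best, found
}

func main() {
	// CPUs are available on nodes 0 and 1, the VF only on node 1: admit on node 1.
	node, ok := admitSingleNUMANode(map[string][]int{"cpu": {0, 1}, "sriov-vf": {1}})
	fmt.Println(node, ok) // prints: 1 true

	// CPUs only on node 0, the VF only on node 1: no common node, so reject.
	_, ok = admitSingleNUMANode(map[string][]int{"cpu": {0}, "sriov-vf": {1}})
	fmt.Println(ok) // prints: false
}
```

In the second case the pod is rejected instead of running unaligned, so a non-empty alignment slice in the test becomes a meaningful check.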
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: