Can't detect/add Mellanox ConnectX-6 VFs via the plugin on my OpenShift (on OpenStack) installation #572
Comments
The PCI address in the config is not right. Your config: "pciAddresses": ["0000:00:06.0"] |
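For reference, the lspci output later in this issue shows the VFs at 05:00.0 and 06:00.0, so the fully qualified addresses the selector would need (a sketch based on that output, not a confirmed fix from the thread) look like:

```json
{
  "selectors": {
    "pciAddresses": ["0000:05:00.0", "0000:06:00.0"]
  }
}
```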
Hi - I corrected that error in the ConfigMap, but the end result was still the same: 0 devices added. See the updated output log below
|
One more step for the virtual env: can you remove the other selectors
from the configmap? Please leave only the pciAddress |
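As a sketch of what the maintainer is asking for (addresses taken from the lspci output in this issue), one resource entry reduced to only the pciAddresses selector would look like:

```json
{
  "resourceList": [
    {
      "resourceName": "sriov_client_side",
      "resourcePrefix": "mellanox",
      "selectors": {
        "pciAddresses": ["0000:05:00.0"]
      }
    }
  ]
}
```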
I tried that, but with the same end result unfortunately. Logs below for that attempt
|
Noticed the below line, so I brought the interface down before restarting the device plugin pod.
That seems to allow it to discover them OK.
|
Any idea why it didn't like the more specific filters? We were able to use these with our Intel-based cards. |
Added the vendors and devices attributes back in and that worked too, so it seems it didn't like the netdevice driver. We use vfio-pci for our Intel cards, and the OpenShift documentation had pointed us at setting netdevice for Mellanox cards - just for background on why we had used that
|
That is because in this case, where the device plugin runs on a VM where only the VFs exist (and not the PF), it's not a netdevice. Please check the shiftonstack documentation; the OpenShift documentation is for bare metal, where the VFs for Mellanox devices should be netdevice |
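A simplified sketch (not the plugin's actual code) of why the `drivers` selector filtered everything out: selector fields are ANDed together, the VFs here are bound to `mlx5_core` (per the lspci output), and "netdevice" is a deviceType value rather than a kernel driver name, so nothing can ever match it:

```python
# Hypothetical, simplified model of SR-IOV device plugin selector matching.
# A device joins the pool only if every selector field matches (logical AND).

def matches(device: dict, selectors: dict) -> bool:
    for key, wanted in selectors.items():
        if device.get(key) not in wanted:
            return False
    return True

# One of the VFs from the lspci output in this issue.
vf = {"vendors": "15b3", "devices": "101e",
      "drivers": "mlx5_core", "pciAddresses": "0000:05:00.0"}

# "netdevice" is not a kernel driver name, so this selector excludes the VF.
bad = {"vendors": ["15b3"], "devices": ["101e"],
       "drivers": ["netdevice"], "pciAddresses": ["0000:05:00.0"]}

# Dropping the drivers field lets the remaining selectors match.
good = {"vendors": ["15b3"], "devices": ["101e"],
        "pciAddresses": ["0000:05:00.0"]}

print(matches(vf, bad))   # False
print(matches(vf, good))  # True
```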
Let me know if I can close this issue :) |
Yes, please go ahead and close. Thanks for the help |
What happened?
I have configured the plugin to look for my Mellanox ConnectX-6 VFs on my nodes - they are there and appear to be detected on the node, but they are never added to the resource pools for some reason
What did you expect to happen?
VFs pulled into the respective pools so they can be used in my pods
What are the minimal steps needed to reproduce the bug?
Make Mellanox ConnectX-6 VFs available on one or more of your OpenShift nodes and configure the plugin to try to find them
Anything else we need to know?
lspci output from node
05:00.0 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
Subsystem: Mellanox Technologies Device [15b3:0012]
Physical Slot: 0-4
Flags: bus master, fast devsel, latency 0
Memory at fba00000 (64-bit, prefetchable) [size=1M]
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [9c] MSI-X: Enable+ Count=12 Masked-
Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
06:00.0 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
Subsystem: Mellanox Technologies Device [15b3:0012]
Physical Slot: 0-5
Flags: bus master, fast devsel, latency 0
Memory at fb800000 (64-bit, prefetchable) [size=1M]
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [9c] MSI-X: Enable+ Count=12 Masked-
Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
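The selector addresses must match this output in its fully qualified form; a small sketch (my own illustration, not part of the plugin) extracting them from lspci-style text:

```python
import re

# Trimmed lspci lines from the output above.
lspci = """\
05:00.0 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
06:00.0 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
"""

# lspci prints short addresses (bus:device.function); the plugin config
# expects the domain-qualified form, so prepend the default "0000" domain.
addrs = ["0000:" + m.group(1)
         for m in re.finditer(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s",
                              lspci, re.M)]
print(addrs)  # ['0000:05:00.0', '0000:06:00.0']
```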
Component Versions
Config Files
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriov_client_side",
          "resourcePrefix": "mellanox",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["netdevice"],
            "pciAddresses": ["0000:00:05.0"]
          }
        },
        {
          "resourceName": "sriov_server_side",
          "resourcePrefix": "mellanox",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["netdevice"],
            "pciAddresses": ["0000:00:06.0"]
          }
        }
      ]
    }
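A quick sanity check (my own, not part of the report) that the embedded config.json parses and yields the two expected resource names:

```python
import json

# The config.json payload from the ConfigMap above, trimmed to the fields
# needed for the name check.
config_json = """
{
  "resourceList": [
    {"resourceName": "sriov_client_side", "resourcePrefix": "mellanox",
     "selectors": {"pciAddresses": ["0000:00:05.0"]}},
    {"resourceName": "sriov_server_side", "resourcePrefix": "mellanox",
     "selectors": {"pciAddresses": ["0000:00:06.0"]}}
  ]
}
"""

cfg = json.loads(config_json)
names = [f'{r["resourcePrefix"]}/{r["resourceName"]}'
         for r in cfg["resourceList"]]
print(names)  # ['mellanox/sriov_client_side', 'mellanox/sriov_server_side']
```

These are the same two names the device plugin validates in its log ("validating resource name mellanox/sriov_client_side").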
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
{"cniVersion":"0.4.0","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxsize":100,"logfile-maxbackups":5,"logfile-maxage":5}
CNI config (Try '/etc/cni/net.d/')
{ "cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus", "namespaceIsolation": true, "globalNamespaces": "default,openshift-multus,openshift-sriov-network-operator", "logLevel": "verbose", "binDir": "/opt/multus/bin", "readinessindicatorfile": "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf", "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ {"cniVersion":"0.4.0","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxsize":100,"logfile-maxbackups":5,"logfile-maxage":5} ] }
Kubernetes deployment type (Bare Metal, Kubeadm, etc.)
Openshift 4.12.42
Kubeconfig file
SR-IOV Network Custom Resource Definition
Logs
SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
I0710 11:28:06.727499 1 manager.go:57] Using Kubelet Plugin Registry Mode
I0710 11:28:06.727846 1 main.go:46] resource manager reading configs
I0710 11:28:06.727909 1 manager.go:86] raw ResourceList: {
"resourceList": [
{
"resourceName": "sriov_client_side",
"resourcePrefix": "mellanox",
"selectors": {
"vendors": ["15b3"],
"devices": ["101e"],
"drivers": ["netdevice"],
"pciAddresses": ["0000:00:05.0"]
}
},
{
"resourceName": "sriov_server_side",
"resourcePrefix": "mellanox",
"selectors": {
"vendors": ["15b3"],
"devices": ["101e"],
"drivers": ["netdevice"],
"pciAddresses": ["0000:00:06.0"]
}
}
]
}
I0710 11:28:06.728042 1 factory.go:211] *types.NetDeviceSelectors for resource sriov_client_side is [0xc00042a900]
I0710 11:28:06.728085 1 factory.go:211] *types.NetDeviceSelectors for resource sriov_server_side is [0xc00042ac60]
I0710 11:28:06.728092 1 manager.go:106] unmarshalled ResourceList: [{ResourcePrefix:mellanox ResourceName:sriov_client_side DeviceType:netDevice ExcludeTopology:false Selectors:0xc000400eb8 AdditionalInfo:map[] SelectorObjs:[0xc00042a900]} {ResourcePrefix:mellanox ResourceName:sriov_server_side DeviceType:netDevice ExcludeTopology:false Selectors:0xc000400ed0 AdditionalInfo:map[] SelectorObjs:[0xc00042ac60]}]
I0710 11:28:06.728152 1 manager.go:217] validating resource name "mellanox/sriov_client_side"
I0710 11:28:06.728203 1 manager.go:217] validating resource name "mellanox/sriov_server_side"
I0710 11:28:06.728210 1 main.go:62] Discovering host devices
I0710 11:28:06.845790 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:03:00.0 02 Red Hat, Inc. Virtio 1.0 network device
I0710 11:28:06.845883 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:05:00.0 02 Mellanox Technolo... ConnectX Family mlx5Gen Virtual Function
I0710 11:28:06.845894 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:06:00.0 02 Mellanox Technolo... ConnectX Family mlx5Gen Virtual Function
I0710 11:28:06.845901 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:03:00.0 02 Red Hat, Inc. Virtio 1.0 network device
I0710 11:28:06.845942 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:05:00.0 02 Mellanox Technolo... ConnectX Family mlx5Gen Virtual Function
I0710 11:28:06.846313 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:06:00.0 02 Mellanox Technolo... ConnectX Family mlx5Gen Virtual Function
I0710 11:28:06.846510 1 main.go:68] Initializing resource servers
I0710 11:28:06.846526 1 manager.go:117] number of config: 2
I0710 11:28:06.846544 1 manager.go:121] Creating new ResourcePool: sriov_client_side
I0710 11:28:06.846548 1 manager.go:122] DeviceType: netDevice
I0710 11:28:06.847037 1 manager.go:138] initServers(): selector index 0 will register 0 devices
I0710 11:28:06.847055 1 manager.go:142] no devices in device pool, skipping creating resource server for sriov_client_side
I0710 11:28:06.847061 1 manager.go:121] Creating new ResourcePool: sriov_server_side
I0710 11:28:06.847066 1 manager.go:122] DeviceType: netDevice
I0710 11:28:06.847495 1 manager.go:138] initServers(): selector index 0 will register 0 devices
I0710 11:28:06.847512 1 manager.go:142] no devices in device pool, skipping creating resource server for sriov_server_side
I0710 11:28:06.847518 1 main.go:74] Starting all servers...
I0710 11:28:06.847523 1 main.go:79] All servers started.
I0710 11:28:06.847529 1 main.go:80] Listening for term signals
Multus logs (If enabled. Try '/var/log/multus.log' )
2024-07-10T11:04:25+00:00 [cnibincopy] Successfully moved files in /host/opt/cni/bin/upgrade_f3bb1262-de44-46c1-8d11-2b04b60ac649 to /host/opt/cni/bin/
2024-07-10T11:04:25+00:00 WARN: {unknown parameter "-"}
2024-07-10T11:04:25+00:00 Entrypoint skipped copying Multus binary.
2024-07-10T11:04:25+00:00 Generating Multus configuration file using files in /host/var/run/multus/cni/net.d...
2024-07-10T11:04:25+00:00 Attempting to find master plugin configuration, attempt 0
2024-07-10T11:04:29+00:00 Using MASTER_PLUGIN: 10-ovn-kubernetes.conf
2024-07-10T11:04:29+00:00 Nested capabilities string:
2024-07-10T11:04:29+00:00 Using /host/var/run/multus/cni/net.d/10-ovn-kubernetes.conf as a source to generate the Multus configuration
2024-07-10T11:04:29+00:00 Config file created @ /host/etc/cni/net.d/00-multus.conf
{ "cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus", "namespaceIsolation": true, "globalNamespaces": "default,openshift-multus,openshift-sriov-network-operator", "logLevel": "verbose", "binDir": "/opt/multus/bin", "readinessindicatorfile": "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf", "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ {"cniVersion":"0.4.0","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxsize":100,"logfile-maxbackups":5,"logfile-maxage":5} ] }
2024-07-10T11:04:29+00:00 Entering watch loop...
Kubelet logs (journalctl -u kubelet)