mellanox SRIOV demo pod cannot be created #66

Open
jason-gideon opened this issue Nov 22, 2022 · 3 comments

@jason-gideon

jason-gideon commented Nov 22, 2022

I tried to create a pod with an SR-IOV network device (a Mellanox InfiniBand VF), but the pod is stuck in ContainerCreating. I configured 4 VFs on the host's IB interface and am running the SR-IOV device plugin pod and the Multus CNI meta-plugin, but the SR-IOV demo pod fails with the error below.

Multus image

./multus-daemonset-thick-plugin.yml:125: image: ghcr.io/k8snetworkplumbingwg/multus-cni:v3.9.2-thick-amd64

ERROR

n-MacBookPro:~/20-k8s-rdma-sriov/ib-sriov-cni/deployment/examples$ kubectl describe po my-test-pod-fnjk7
Name:         my-test-pod-fnjk7
Namespace:    default
Priority:     0
Node:         s-113-2-35/10.113.2.35
Start Time:   Tue, 22 Nov 2022 20:22:33 +0800
Labels:       <none>
Annotations:  cni.projectcalico.org/containerID: 848157aeb2b3549aa8e2fce419c8353989ecb98ad62b1c6513f46423492f6cfd
              cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
              k8s.v1.cni.cncf.io/networks: [{"name": "ib-sriov-network"}]
Status:       Pending
IP:
IPs:          <none>
Containers:
  my-test-ctr:
    Container ID:
    Image:         mellanox/rping-test
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sleep 1000000

    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      mellanox.com/mlnx_sriov_rdma_ib:  1
    Requests:
      mellanox.com/mlnx_sriov_rdma_ib:  1
    Environment:                        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2clfq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-2clfq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               21s                default-scheduler  Successfully assigned default/my-test-pod-fnjk7 to s-113-2-35
  Normal   AddedInterface          21s                multus             Add eth0 [10.42.0.21/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  21s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be-net1]
  Normal   AddedInterface          20s                multus             Add eth0 [10.42.0.22/32] from k8s-pod-network
  Normal   AddedInterface          19s                multus             Add eth0 [10.42.0.23/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  19s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c-net1]
  Warning  FailedCreatePodSandBox  18s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2-net1]
  Normal   AddedInterface          18s                multus             Add eth0 [10.42.0.24/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  17s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9-net1]
  Normal   AddedInterface          17s                multus             Add eth0 [10.42.0.25/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  16s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1-net1]
  Normal   AddedInterface          16s                multus             Add eth0 [10.42.0.26/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  15s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365-net1]
  Normal   AddedInterface          15s                multus             Add eth0 [10.42.0.27/32] from k8s-pod-network
  Normal   AddedInterface          14s                multus             Add eth0 [10.42.0.28/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  14s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97-net1]
  Warning  FailedCreatePodSandBox  13s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0-net1]
  Normal   AddedInterface          12s                multus             Add eth0 [10.42.0.29/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  12s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36-net1]
  Normal   AddedInterface          11s                multus             Add eth0 [10.42.0.30/32] from k8s-pod-network

The device plugin does detect the SR-IOV network devices on the host (node s-113-2-35 in my experiment); the node's allocatable resources are:

-MacBookPro:~/20-k8s-rdma-sriov/multus-cni/deployments$ kubectl get node s-113-2-35 -o json | jq '.status.allocatable'
{
  "cpu": "128",
  "ephemeral-storage": "5169411933432",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "mellanox.com/mlnx_sriov_rdma_ib": "4",
  "memory": "528110968Ki",
  "pods": "110"
}
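To run the same check from a script, a minimal Python sketch (the function name is hypothetical) can parse the JSON printed by kubectl get node <node> -o json and read the extended resource from status.allocatable. The sample string is trimmed from the output above; extended resources are reported as quantity strings, so the sketch assumes a plain integer count such as "4".

```python
import json


def sriov_allocatable(node_json: str, resource: str) -> int:
    """Return a node's allocatable count for an extended resource (0 if absent)."""
    node = json.loads(node_json)
    allocatable = node.get("status", {}).get("allocatable", {})
    # Extended resources are reported as quantity strings such as "4".
    return int(allocatable.get(resource, "0"))


if __name__ == "__main__":
    # Trimmed from `kubectl get node s-113-2-35 -o json` above.
    sample = '{"status": {"allocatable": {"cpu": "128", "mellanox.com/mlnx_sriov_rdma_ib": "4"}}}'
    print(sriov_allocatable(sample, "mellanox.com/mlnx_sriov_rdma_ib"))  # 4
```

A count of 4 matching the 4 configured VFs shows scheduling and allocation are fine; the failure happens later, inside the CNI.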

NetworkAttachmentDefinition (NAD)

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ib-sriov-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mlnx_sriov_rdma_ib
spec:
  config: '{
  "type": "ib-sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "192.168.217.1"
  }
}'
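Because spec.config is a JSON document embedded as a YAML string, a quoting mistake there only surfaces at pod creation. A minimal Python sketch (helper name hypothetical) that parses the string copied from the NAD above and checks that the IPAM gateway lies inside the subnet. It also makes visible that the network name in the kubelet errors (sriov-network) matches the config's name field, not the NAD's metadata.name (ib-sriov-network).

```python
import ipaddress
import json

# spec.config from the NAD above, copied verbatim.
NAD_CONFIG = """{
  "type": "ib-sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "192.168.217.1"
  }
}"""


def summarize_cni_config(raw: str) -> dict:
    """Parse an embedded CNI config and report the fields that matter for debugging."""
    conf = json.loads(raw)  # raises json.JSONDecodeError on a malformed string
    ipam = conf.get("ipam", {})
    gateway_ok = None
    if "subnet" in ipam and "gateway" in ipam:
        gateway_ok = ipaddress.ip_address(ipam["gateway"]) in ipaddress.ip_network(ipam["subnet"])
    return {
        "plugin": conf["type"],        # CNI binary that Multus delegates to
        "network_name": conf["name"],  # the name shown in the kubelet errors
        "ipam": ipam.get("type"),
        "gateway_in_subnet": gateway_ok,
    }


if __name__ == "__main__":
    print(summarize_cni_config(NAD_CONFIG))
```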

SR-IOV device plugin ConfigMap (sriovdp-config)

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourcePrefix": "mellanox.com",
                "resourceName": "mlnx_sriov_rdma_ib",
                "selectors": {
                    "isRdma": true,
                    "vendors": ["15b3"],
                    "devices": ["101c"],
                    "drivers": ["mlx5_core"]
                }
            }
        ]
    }
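The NAD's k8s.v1.cni.cncf.io/resourceName annotation is what ties the network to a device pool, so it has to equal <resourcePrefix>/<resourceName> from this ConfigMap. A minimal Python sketch of that consistency check, assuming every entry sets resourcePrefix explicitly as this one does; the JSON is copied from above and the function names are hypothetical.

```python
import json

# config.json from the sriovdp-config ConfigMap above.
DP_CONFIG_JSON = """{
    "resourceList": [{
            "resourcePrefix": "mellanox.com",
            "resourceName": "mlnx_sriov_rdma_ib",
            "selectors": {
                "isRdma": true,
                "vendors": ["15b3"],
                "devices": ["101c"],
                "drivers": ["mlx5_core"]
            }
        }
    ]
}"""

# Value of k8s.v1.cni.cncf.io/resourceName on the NAD.
NAD_RESOURCE_ANNOTATION = "mellanox.com/mlnx_sriov_rdma_ib"


def advertised_resources(dp_config_json: str) -> set:
    """Full resource names the device plugin advertises: '<prefix>/<name>'."""
    conf = json.loads(dp_config_json)
    return {f'{r["resourcePrefix"]}/{r["resourceName"]}' for r in conf["resourceList"]}


def annotation_matches(dp_config_json: str, annotation: str) -> bool:
    """True if the NAD annotation names a resource the device plugin advertises."""
    return annotation in advertised_resources(dp_config_json)


if __name__ == "__main__":
    print(annotation_matches(DP_CONFIG_JSON, NAD_RESOURCE_ANNOTATION))  # True
```

Here the names match, which is consistent with the pod getting a VF allocated; the problem lies elsewhere.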

SR-IOV device plugin logs

n-MacBookPro:~/20-k8s-rdma-sriov/multus-cni/deployments$ kubectl -n kube-system logs kube-sriov-device-plugin-amd64-bpwlk
I1122 11:59:59.507695       1 manager.go:51] Using Kubelet Plugin Registry Mode
I1122 11:59:59.508691       1 main.go:44] resource manager reading configs
I1122 11:59:59.508739       1 manager.go:79] raw ResourceList: {
    "resourceList": [{
            "resourcePrefix": "mellanox.com",
            "resourceName": "mlnx_sriov_rdma_ib",
            "selectors": {
                "isRdma": true,
                "vendors": ["15b3"],
                "devices": ["101c"],
                "drivers": ["mlx5_core"]
            }
        }
    ]
}
I1122 11:59:59.508875       1 factory.go:166] net device selector for resource mlnx_sriov_rdma_ib is &{DeviceSelectors:{Vendors:[15b3] Devices:[101c] Drivers:[mlx5_core] PciAddresses:[]} PfNames:[] RootDevices:[] LinkTypes:[] DDPProfiles:[] IsRdma:true NeedVhostNet:false}
I1122 11:59:59.508902       1 manager.go:99] unmarshalled ResourceList: [{ResourcePrefix:mellanox.com ResourceName:mlnx_sriov_rdma_ib DeviceType:netDevice Selectors:0xc00000cd38 SelectorObj:0xc000375380}]
I1122 11:59:59.508960       1 manager.go:200] validating resource name "mellanox.com/mlnx_sriov_rdma_ib"
I1122 11:59:59.508968       1 main.go:60] Discovering host devices
I1122 11:59:59.589424       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c2:00.0 02              Intel Corporation    Ethernet Controller X710 for 10GbE SFP+
I1122 11:59:59.589938       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c2:00.1 02              Intel Corporation    Ethernet Controller X710 for 10GbE SFP+
I1122 11:59:59.590256       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.0 02              Mellanox Technolo... MT28908 Family [ConnectX-6]
I1122 11:59:59.591462       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.1 02              Mellanox Technolo... MT28908 Family [ConnectX-6]
I1122 11:59:59.591704       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.2 02              Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.591894       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.3 02              Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592053       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.4 02              Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592203       1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.5 02              Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592383       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:01:00.0     12              unknown              unknown
I1122 11:59:59.592392       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:22:00.0     12              unknown              unknown
I1122 11:59:59.592397       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:41:00.0     12              unknown              unknown
I1122 11:59:59.592403       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:61:00.0     12              unknown              unknown
I1122 11:59:59.592407       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:81:00.0     12              unknown              unknown
I1122 11:59:59.592412       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:a1:00.0     12              unknown              unknown
I1122 11:59:59.592417       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:c1:00.0     12              unknown              unknown
I1122 11:59:59.592421       1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:e1:00.0     12              unknown              unknown
I1122 11:59:59.592429       1 main.go:66] Initializing resource servers
I1122 11:59:59.592731       1 manager.go:105] number of config: 1
I1122 11:59:59.592739       1 manager.go:109]
I1122 11:59:59.592742       1 manager.go:110] Creating new ResourcePool: mlnx_sriov_rdma_ib
I1122 11:59:59.592746       1 manager.go:111] DeviceType: netDevice
W1122 11:59:59.592779       1 pciNetDevice.go:55] RDMA resources for 0000:c2:00.0 not found. Are RDMA modules loaded?
I1122 11:59:59.593104       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c2:00.0. error getting devlink device attributes for net device 0000:c2:00.0 no such device
W1122 11:59:59.593215       1 pciNetDevice.go:55] RDMA resources for 0000:c2:00.1 not found. Are RDMA modules loaded?
I1122 11:59:59.593362       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c2:00.1. error getting devlink device attributes for net device 0000:c2:00.1 no such device
I1122 11:59:59.594005       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.1. <nil>
I1122 11:59:59.596385       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.2. <nil>
I1122 11:59:59.597465       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.3. <nil>
I1122 11:59:59.598273       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.4. <nil>
I1122 11:59:59.599262       1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.5. <nil>
I1122 11:59:59.599408       1 factory.go:106] device added: [pciAddr: 0000:c3:00.2, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599417       1 factory.go:106] device added: [pciAddr: 0000:c3:00.3, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599423       1 factory.go:106] device added: [pciAddr: 0000:c3:00.4, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599428       1 factory.go:106] device added: [pciAddr: 0000:c3:00.5, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599446       1 manager.go:139] New resource server is created for mlnx_sriov_rdma_ib ResourcePool
I1122 11:59:59.599454       1 main.go:72] Starting all servers...
I1122 11:59:59.599885       1 server.go:199] starting mlnx_sriov_rdma_ib device plugin endpoint at: mellanox.com_mlnx_sriov_rdma_ib.sock
I1122 11:59:59.602783       1 server.go:226] mlnx_sriov_rdma_ib device plugin endpoint started serving
I1122 11:59:59.602805       1 main.go:77] All servers started.
I1122 11:59:59.602811       1 main.go:78] Listening for term signals
I1122 12:00:00.175755       1 server.go:110] Plugin: mellanox.com_mlnx_sriov_rdma_ib.sock gets registered successfully at Kubelet
I1122 12:00:00.175875       1 server.go:134] ListAndWatch(mlnx_sriov_rdma_ib) invoked
I1122 12:00:00.175890       1 server.go:142] ListAndWatch(mlnx_sriov_rdma_ib): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:c3:00.4,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.5,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.3,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},},}
I1122 12:04:42.983933       1 server.go:119] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:c3:00.3],},},}
I1122 12:04:42.984024       1 netResourcePool.go:51] GetDeviceSpecs(): for devices: [0000:c3:00.3]
I1122 12:04:42.984044       1 pool_stub.go:97] GetEnvs(): for devices: [0000:c3:00.3]
I1122 12:04:42.984052       1 pool_stub.go:113] GetMounts(): for devices: [0000:c3:00.3]
I1122 12:04:42.984059       1 server.go:128] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_MELLANOX_COM_MLNX_SRIOV_RDMA_IB: 0000:c3:00.3,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/infiniband/issm3,HostPath:/dev/infiniband/issm3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/umad3,HostPath:/dev/infiniband/umad3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/uverbs3,HostPath:/dev/infiniband/uverbs3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/rdma_cm,HostPath:/dev/infiniband/rdma_cm,Permissions:rwm,},},Annotations:map[string]string{},},},}
I1122 12:22:33.340229       1 server.go:119] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:c3:00.4],},},}
I1122 12:22:33.340326       1 netResourcePool.go:51] GetDeviceSpecs(): for devices: [0000:c3:00.4]
I1122 12:22:33.340347       1 pool_stub.go:97] GetEnvs(): for devices: [0000:c3:00.4]
I1122 12:22:33.340355       1 pool_stub.go:113] GetMounts(): for devices: [0000:c3:00.4]
I1122 12:22:33.340362       1 server.go:128] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_MELLANOX_COM_MLNX_SRIOV_RDMA_IB: 0000:c3:00.4,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/infiniband/issm4,HostPath:/dev/infiniband/issm4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/umad4,HostPath:/dev/infiniband/umad4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/uverbs4,HostPath:/dev/infiniband/uverbs4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/rdma_cm,HostPath:/dev/infiniband/rdma_cm,Permissions:rwm,},},Annotations:map[string]string{},},},}
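The Allocate() responses above show the device plugin passing the allocated VF's PCI address to the container through an environment variable derived from the resource name. A minimal Python sketch that reproduces the naming seen in this log (PCIDEVICE_ plus the resource name upper-cased, with non-alphanumeric characters replaced by underscores) and reads the address back. It is inferred from the log output, not taken from the plugin's source, and the comma separation for several devices is an assumption.

```python
import os
import re
from typing import List, Mapping


def pcidevice_env_name(resource: str) -> str:
    """Env var name as observed in the log: PCIDEVICE_ + resource, non-alphanumerics -> '_', upper-cased."""
    return "PCIDEVICE_" + re.sub(r"[^A-Za-z0-9]", "_", resource).upper()


def allocated_vfs(env: Mapping[str, str], resource: str) -> List[str]:
    """PCI addresses handed to this container (assumes comma separation for several VFs)."""
    value = env.get(pcidevice_env_name(resource), "")
    return [addr.strip() for addr in value.split(",") if addr.strip()]


if __name__ == "__main__":
    resource = "mellanox.com/mlnx_sriov_rdma_ib"
    print(pcidevice_env_name(resource))  # PCIDEVICE_MELLANOX_COM_MLNX_SRIOV_RDMA_IB
    # Inside the pod this lists the allocated VF; elsewhere it prints [].
    print(allocated_vfs(os.environ, resource))
```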

@jason-gideon

jason-gideon commented Nov 22, 2022

I printed the GUID, and it is all zeros. How can I fix this?

n-MacBookPro:~/20-k8s-rdma-sriov/ib-sriov-cni/deployment/examples$ kubectl describe pod my-test-pod
Name:         my-test-pod
Namespace:    default
Priority:     0
Node:         s-113-2-35/10.113.2.35
Start Time:   Tue, 22 Nov 2022 22:02:12 +0800
Labels:       <none>
Annotations:  cni.projectcalico.org/containerID: dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd
              cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
              k8s.v1.cni.cncf.io/networks: [{"name": "ib-sriov-network"}]
Status:       Pending
IP:
IPs:          <none>
Containers:
  my-test-ctr:
    Container ID:
    Image:         mellanox/rping-test
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sleep 1000000

    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      mellanox.com/mlnx_sriov_rdma_ib:  1
    Requests:
      mellanox.com/mlnx_sriov_rdma_ib:  1
    Environment:                        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw2sr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-jw2sr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age        From               Message
  ----     ------                  ----       ----               -------
  Normal   Scheduled               <invalid>  default-scheduler  Successfully assigned default/my-test-pod to s-113-2-35
  Normal   AddedInterface          <invalid>  multus             Add eth0 [10.42.0.219/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  <invalid>  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to set up pod "my-test-pod_default" network: [default/my-test-pod/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid, HardwareAddr:00:00:00:e7:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00, guid:00:00:00:00:00:00:00:00", failed to clean up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to teardown pod "my-test-pod_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd-net1]
  Normal   SandboxChanged          <invalid>  kubelet            Pod sandbox changed, it will be killed and re-created.
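The HardwareAddr in the error above is a 20-byte IPoIB link-layer address. Assuming the standard layout from RFC 4391 (4 bytes of flags and queue pair number followed by the 16-byte port GID, whose last 8 bytes are the port GUID), a minimal Python sketch shows why the GUID comes out as all zeros. It does not reproduce ib-sriov-cni's own validation; the helper names are hypothetical, and read_hwaddr only works when run on the node.

```python
from pathlib import Path


def read_hwaddr(ifname: str) -> str:
    """Read an interface's link-layer address from sysfs (run on the node itself)."""
    return Path(f"/sys/class/net/{ifname}/address").read_text().strip()


def guid_from_ipoib_hwaddr(hwaddr: str) -> str:
    """Return the port GUID: the last 8 bytes of a 20-byte IPoIB hardware address."""
    octets = hwaddr.strip().lower().split(":")
    if len(octets) != 20:
        raise ValueError(f"expected 20 bytes for an IPoIB address, got {len(octets)}")
    # Bytes 0-3: flags + queue pair number; bytes 4-11: subnet prefix; bytes 12-19: GUID.
    return ":".join(octets[12:])


def guid_is_assigned(guid: str) -> bool:
    """Treat an all-zero GUID as unassigned, which is what the error above reports."""
    return any(int(octet, 16) != 0 for octet in guid.split(":"))


if __name__ == "__main__":
    # HardwareAddr copied from the FailedCreatePodSandBox event above.
    hw = "00:00:00:e7:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
    guid = guid_from_ipoib_hwaddr(hw)
    print(guid, guid_is_assigned(guid))  # 00:00:00:00:00:00:00:00 False
    # On the node: guid_from_ipoib_hwaddr(read_hwaddr("ib2"))
```

The subnet prefix (fe:80:...) is present, but the GUID half is empty, so the VF has no GUID assigned yet.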

@zhutong196

I ran into the same problem. You first need to configure the VF's node GUID and port GUID, then run ibdev2netdev -v to check that the VF's status is Up; after that, the VF can be used normally.
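As a sketch of the GUID step, assuming an iproute2 build that supports the node_guid and port_guid options of ip link set ... vf, the Python below only prints the commands, one unique non-zero GUID per VF. The PF name ib0 and the base GUID are hypothetical placeholders, reusing one value for both node and port GUID is a simplification, and some mlx5 setups may additionally need the VF unbound and rebound to its driver before the new GUID shows up; check the vendor documentation for your driver version.

```python
def vf_guid(base: int, vf: int) -> str:
    """Format base + vf as a colon-separated 8-byte GUID."""
    value = base + vf
    # An all-zero GUID is exactly what the CNI rejected, so refuse it.
    if not 0 < value < 2**64:
        raise ValueError("GUID must be a non-zero 64-bit value")
    return ":".join(f"{b:02x}" for b in value.to_bytes(8, "big"))


def guid_commands(pf: str, num_vfs: int, base: int) -> list:
    """Build (but do not run) ip link commands that assign a GUID to each VF."""
    cmds = []
    for vf in range(num_vfs):
        guid = vf_guid(base, vf)
        cmds.append(f"ip link set dev {pf} vf {vf} node_guid {guid}")
        cmds.append(f"ip link set dev {pf} vf {vf} port_guid {guid}")
    return cmds


if __name__ == "__main__":
    # Hypothetical PF name and base GUID; 4 VFs as in the original report.
    for cmd in guid_commands("ib0", 4, 0x1122334455667700):
        print(cmd)
```

After running the printed commands as root on the node, ibdev2netdev -v should show the VFs' state, as described above.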

@cyclinder

Hey @zhutong196, could you tell me how to configure the VF node GUID and port GUID?
