ngc_rdma_test.sh failed on IB_WRITE_BW stage #108

Open
vtlrazin opened this issue Apr 9, 2024 · 1 comment

vtlrazin commented Apr 9, 2024

ngc_rdma_test.sh clx-host-109 mlx5_3,mlx5_4 clx-host-108 mlx5_3,mlx5_4

INFO: Each device can use up to 28 cores (may include core 0)
INFO: Each device can use up to 28 cores (may include core 0)
INFO: run ib_write_bw server on clx-host-108: sudo taskset -c 28 ib_write_bw -d mlx5_3 -s 65536 -D 30 -p 10000 -F --report_gbit -b -q 2 --output=bandwidth
INFO: run ib_write_bw server on clx-host-108: sudo taskset -c 84 ib_write_bw -d mlx5_4 -s 65536 -D 30 -p 10001 -F --report_gbit -b -q 2 --output=bandwidth
INFO: run ib_write_bw client on clx-host-109: sudo taskset -c 28 ib_write_bw -d mlx5_3 -D 30 clx-host-108 -s 65536 -p 10000 -F --report_gbit -b -q 2 --out_json --out_json_file=/tmp/perftest_mlx5_3.json &
INFO: run ib_write_bw client on clx-host-109: sudo taskset -c 84 ib_write_bw -d mlx5_4 -D 30 clx-host-108 -s 65536 -p 10001 -F --report_gbit -b -q 2 --out_json --out_json_file=/tmp/perftest_mlx5_4.json &
WARNING: BW peak won't be measured in this run.

                RDMA_Write Bidirectional BW Test

Dual-port : OFF Device : mlx5_3
Number of qps : 2 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet

local address: LID 0000 QPN 0x01c2 PSN 0x50e20e RKey 0x0060bd VAddr 0x0075252bfaa000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:02
local address: LID 0000 QPN 0x01c3 PSN 0x383de0 RKey 0x0060bd VAddr 0x0075252bfba000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:02
remote address: LID 0000 QPN 0x02a3 PSN 0x6a0372 RKey 0x0060bd VAddr 0x00701124590000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:01
remote address: LID 0000 QPN 0x02a4 PSN 0x1b4d74 RKey 0x0060bd VAddr 0x007011245a0000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:01

#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
Completion with error at client
Failed status 12: wr_id 1 syndrom 0x81
scnt=256, ccnt=0
Failed to complete run_iter_bw function successfully
Completion with error at client
Failed status 12: wr_id 1 syndrom 0x81
scnt=256, ccnt=0
Failed to complete run_iter_bw function successfully
WARNING: BW peak won't be measured in this run.

                RDMA_Write Bidirectional BW Test

Dual-port : OFF Device : mlx5_4
Number of qps : 2 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet

local address: LID 0000 QPN 0x02c2 PSN 0x497b66 RKey 0x0420bd VAddr 0x0077839e8d9000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:102:02
local address: LID 0000 QPN 0x02c3 PSN 0xf69858 RKey 0x0420bd VAddr 0x0077839e8e9000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:102:02
remote address: LID 0000 QPN 0x03a3 PSN 0x7a7406 RKey 0x0420bd VAddr 0x007ff8e41df000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:102:01
remote address: LID 0000 QPN 0x03a4 PSN 0xb03478 RKey 0x0420bd VAddr 0x007ff8e41ef000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:102:01

#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
Completion with error at client
Failed status 12: wr_id 1 syndrom 0x81
scnt=256, ccnt=0

Failed to complete run_iter_bw function successfully
Completion with error at client
Failed status 12: wr_id 1 syndrom 0x81
scnt=256, ccnt=0

Failed to complete run_iter_bw function successfully
awk: fatal: cannot open file `/tmp/perftest_mlx5_3.json' for reading: No such file or directory
Device mlx5_3 reached Gb/s (max possible: 400 Gb/s)
Device mlx5_3 didn't reach pass bw rate of 360 Gb/s
awk: fatal: cannot open file `/tmp/perftest_mlx5_4.json' for reading: No such file or directory
Device mlx5_4 reached Gb/s (max possible: 400 Gb/s)
Device mlx5_4 didn't reach pass bw rate of 360 Gb/s
ib_write_bw - Failed for devices: mlx5_3 mlx5_4 <-> mlx5_3 mlx5_4
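
For anyone reading the log above: the numeric value in `Failed status 12` appears to correspond to `IBV_WC_RETRY_EXC_ERR` (transport retry counter exceeded) in libibverbs' `enum ibv_wc_status`. Below is a minimal sketch of a decoder for such lines. The helper names `wc_status_name` and `decode_perftest_line` are hypothetical and are not part of `ngc_rdma_test.sh` or perftest; only the first 14 enum values are listed, and the vendor-specific syndrome (`0x81`) is deliberately not interpreted.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric work-completion status to the
# corresponding name in libibverbs' enum ibv_wc_status (values 0-13 only).
wc_status_name() {
  case "$1" in
    0)  echo IBV_WC_SUCCESS ;;
    1)  echo IBV_WC_LOC_LEN_ERR ;;
    2)  echo IBV_WC_LOC_QP_OP_ERR ;;
    3)  echo IBV_WC_LOC_EEC_OP_ERR ;;
    4)  echo IBV_WC_LOC_PROT_ERR ;;
    5)  echo IBV_WC_WR_FLUSH_ERR ;;
    6)  echo IBV_WC_MW_BIND_ERR ;;
    7)  echo IBV_WC_BAD_RESP_ERR ;;
    8)  echo IBV_WC_LOC_ACCESS_ERR ;;
    9)  echo IBV_WC_REM_INV_REQ_ERR ;;
    10) echo IBV_WC_REM_ACCESS_ERR ;;
    11) echo IBV_WC_REM_OP_ERR ;;
    12) echo IBV_WC_RETRY_EXC_ERR ;;
    13) echo IBV_WC_RNR_RETRY_EXC_ERR ;;
    *)  echo "UNKNOWN($1)" ;;
  esac
}

# Hypothetical helper: pull the status number out of a perftest error line
# such as "Failed status 12: wr_id 1 syndrom 0x81" and print its enum name.
# Returns non-zero if the line does not contain a "Failed status N:" field.
decode_perftest_line() {
  status=$(printf '%s\n' "$1" | sed -n 's/.*Failed status \([0-9][0-9]*\):.*/\1/p')
  [ -n "$status" ] || return 1
  wc_status_name "$status"
}

# Example: decode the error line from the log above.
decode_perftest_line 'Failed status 12: wr_id 1 syndrom 0x81'
```

Since `ccnt=0` in the log, no write completed at all before the retry limit was hit, so the missing `/tmp/perftest_*.json` files are most likely a consequence of the failed run rather than a separate problem.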

Collaborator

blochl commented Apr 10, 2024

@vtlrazin, please format the output properly, as a code block. It is unreadable as posted.
