I ran the benchmark program on an RTX 4090 workstation and got abnormally high bandwidth numbers:
Pytorch version : 1.14.0a0+44dac51
CUDA version : 12.0
GPU : NVIDIA GeForce RTX 4090

Matrix Multiplication:
                 n=128     n=512    n=2048    n=8192
torch.float32    1.048    29.653    82.788    86.676
torch.float16    1.304    46.890   167.112   158.596

Memory Bandwidth:
            65536    262144    1048576    4194304
TFLOPS      0.025     0.099      0.324      0.484
GB/s      196.343   792.595   2590.594   3868.374
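The measured memory bandwidth is 3868 GB/s, whereas the theoretical memory bandwidth I found for the 4090 is around 1000 GB/s.
When I run the same benchmark program on an A800 server, the results look normal: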
Pytorch version : 2.0.0a0+1767026
CUDA version : 12.1
GPU : NVIDIA A800 80GB PCIe

Matrix Multiplication:
                 n=128     n=512    n=2048    n=8192
torch.float32    0.464    25.947    82.386   105.973
torch.float16    0.343    31.456   192.540   215.333

Memory Bandwidth:
            65536    262144    1048576    4194304
TFLOPS      0.009     0.036      0.143      0.216
GB/s       72.026   288.159   1143.973   1727.486
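Here the memory bandwidth of 1727 GB/s is below the theoretical limit of 1935 GB/s.
As a result, the 4090's measured memory bandwidth comes out far higher than the A800's, and in my actual training runs the 4090 was also faster. Is such a high bandwidth normal for the 4090? If not, what could be causing it?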
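To make the discrepancy concrete, here is a minimal sketch that divides each measured peak by the theoretical figure quoted in this post. The numbers are copied from the output above; the name `peak_ratio` and the dictionary layout are illustrative only and are not part of the benchmark program.

```python
# Measured values come from the benchmark output in this post;
# the "quoted peak" values are the theoretical figures the poster cites
# (about 1000 GB/s for the RTX 4090, 1935 GB/s for the A800).
measurements = {
    "RTX 4090": {"measured_gbps": 3868.374, "quoted_peak_gbps": 1000.0},
    "A800 80GB PCIe": {"measured_gbps": 1727.486, "quoted_peak_gbps": 1935.0},
}


def peak_ratio(measured_gbps: float, quoted_peak_gbps: float) -> float:
    """Return measured bandwidth as a fraction of the quoted theoretical peak."""
    return measured_gbps / quoted_peak_gbps


for gpu, m in measurements.items():
    ratio = peak_ratio(m["measured_gbps"], m["quoted_peak_gbps"])
    status = "exceeds" if ratio > 1.0 else "is within"
    print(f"{gpu}: {ratio:.2f}x of quoted peak ({status} the quoted peak)")
```

A ratio above 1.0 means the benchmark is reporting more throughput than the quoted memory specification allows, which is what the question is asking about.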
I ran into the same problem (4090ADOC):
Pytorch version : 2.0.1
CUDA version : 11.8
GPU : NVIDIA GeForce RTX 4090

Matrix Multiplication:
                 n=128     n=512    n=2048    n=8192
torch.float32    0.251    22.135    51.615    50.243
torch.float16    0.278    33.785   158.224   163.479

Memory Bandwidth:
            65536    262144    1048576    4194304
TFLOPS      0.004     0.041      0.165      0.390
GB/s        30.48    327.36    1320.35    3117.19
Same question:
Pytorch version : 2.3.1+cu121
CUDA version : 12.1
GPU : NVIDIA GeForce RTX 4090

Matrix Multiplication:
                 n=128     n=512    n=2048    n=8192
torch.float32    0.271    16.075    48.697    52.251
torch.float16    0.255    16.687   165.275   168.725

Memory Bandwidth:
            65536    262144    1048576    4194304    8388608    16777216
TFLOPS      0.006     0.023      0.091      0.375      0.495       0.115
GB/s       46.407   181.009    731.115   2996.519   3956.074     919.237
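This last run includes two larger sizes, so it is worth checking which sizes report more than the roughly 1000 GB/s theoretical figure quoted earlier in this thread. The sketch below does only that comparison; the helper name `sizes_above_peak` is hypothetical, and the 1000 GB/s constant is the poster's approximate figure, not an exact specification.

```python
# Sizes and GB/s values copied from the "Memory Bandwidth" output above.
sizes = [65536, 262144, 1048576, 4194304, 8388608, 16777216]
gbps = [46.407, 181.009, 731.115, 2996.519, 3956.074, 919.237]

# Approximate theoretical peak quoted in the original post (an assumption,
# not an exact specification value).
QUOTED_PEAK_GBPS = 1000.0


def sizes_above_peak(sizes, gbps, peak):
    """Return the problem sizes whose reported bandwidth exceeds the peak."""
    return [n for n, bw in zip(sizes, gbps) if bw > peak]


above = sizes_above_peak(sizes, gbps, QUOTED_PEAK_GBPS)
print("Sizes reporting more than the quoted peak:", above)
```

In this run only the 4194304 and 8388608 sizes exceed the quoted peak, while the largest size (16777216) falls back below it. That pattern would be consistent with the mid-sized buffers being served from on-chip cache rather than from device memory, but nothing in this thread confirms that explanation.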