A CPU memory registration (MR) operation that falls between two GPU MR operations becomes about 20 times slower. If target_list is mapped together with CPU send/recv buffers, the registration takes about 60 us. With GPU buffers, it takes 1.4 ms. target_list is always mapped to CPU memory, so such a large difference is unexpected.
With the registration cache, this is not an issue for our system, but I am documenting it for others who might see similar behavior. I am also curious why this happens, but there are few resources available to look into it further.
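For readers unfamiliar with the idea, a registration cache can be sketched roughly as follows. This is only an illustration of the general technique, not the cache used in our system: the class `RegistrationCache`, the helper `fake_register`, and the `(address, length)` key are all hypothetical, and `fake_register` stands in for whatever real registration call is in use (for example `ibv_reg_mr` with the verbs API, which is an assumption here).

```python
from typing import Callable, Dict, Tuple


class RegistrationCache:
    """Hypothetical cache of registration handles keyed by (address, length)."""

    def __init__(self, register: Callable[[int, int], object]):
        self._register = register  # the expensive registration call
        self._handles: Dict[Tuple[int, int], object] = {}
        self.misses = 0  # number of times the expensive call was made

    def get(self, addr: int, length: int) -> object:
        key = (addr, length)
        handle = self._handles.get(key)
        if handle is None:
            # Pay the registration cost only on the first request.
            handle = self._register(addr, length)
            self._handles[key] = handle
            self.misses += 1
        return handle


def fake_register(addr: int, length: int) -> object:
    # Stand-in for a real memory registration call (assumption).
    return ("mr", addr, length)


cache = RegistrationCache(fake_register)
first = cache.get(0x1000, 4096)   # registers the region
second = cache.get(0x1000, 4096)  # served from the cache
```

Because repeated requests for the same region reuse the stored handle, the slow registration is paid once per buffer rather than on every transfer. A real cache would also need to invalidate entries when a buffer is freed or remapped, which this sketch omits.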
Furthermore, the data transfer mechanism used by GPUDirect is different from the one used by cudaMemcpyAsync() (and, of course, cudaMemcpy()).
GPUDirect uses a PCIe facility that general data transfer mechanisms do not use. See http://docs.nvidia.com/cuda/gpudirect-rdma/