Mystery with memory registration #1

Open
sangmank opened this issue May 6, 2015 · 2 comments

@sangmank
Contributor

sangmank commented May 6, 2015

A CPU memory registration (MR) operation performed between two GPU MR operations is about 20 times slower than usual. When target_list is registered together with CPU send/recv buffers, its registration takes about 60us. When it is registered together with GPU buffers, it takes about 1.4ms. target_list always resides in CPU memory, so a difference this large is unexpected.
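
A minimal sketch of how per-registration timings like these could be collected, so that the same buffer can be timed in different registration orders. Everything here is an assumption for illustration: `time_registrations` and `fake_register` are hypothetical names, and `fake_register` is a stand-in that does nothing. In real code each step would call the actual registration routine (for example `ibv_reg_mr` from libibverbs).

```python
import time


def time_registrations(steps):
    """Run (label, register_fn) pairs in order and return [(label, seconds)]."""
    timings = []
    for label, register_fn in steps:
        start = time.perf_counter()
        register_fn()
        timings.append((label, time.perf_counter() - start))
    return timings


def fake_register():
    # Hypothetical stand-in: a real test would register a buffer here.
    pass


# Same CPU buffer (target_list), timed between CPU neighbours and GPU neighbours.
cpu_order = [
    ("send_buf (CPU)", fake_register),
    ("target_list (CPU)", fake_register),
    ("recv_buf (CPU)", fake_register),
]
gpu_order = [
    ("send_buf (GPU)", fake_register),
    ("target_list (CPU)", fake_register),
    ("recv_buf (GPU)", fake_register),
]

for name, steps in (("CPU buffers", cpu_order), ("GPU buffers", gpu_order)):
    for label, seconds in time_registrations(steps):
        print(f"{name}: {label}: {seconds * 1e6:.1f} us")
```

Timing each call separately, rather than the whole sequence, is what makes it possible to attribute the slowdown to the target_list registration itself.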

With the registration cache, this should not be an issue for our system, but I am documenting it for other people who might see something similar. I am also curious why this happens, but there are few resources available for looking into it further.
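
For readers unfamiliar with the idea, here is a minimal sketch of a registration cache. It is not the cache used in this project: `RegistrationCache` and `fake_register_mr` are hypothetical names, and keying on an exact (address, length) match is a simplifying assumption. Production caches also have to handle overlapping ranges and invalidation when memory is freed.

```python
class RegistrationCache:
    """Remember registered regions so each (addr, length) is registered once."""

    def __init__(self, register_fn):
        self._register_fn = register_fn
        self._regions = {}

    def get(self, addr, length):
        key = (addr, length)
        if key not in self._regions:
            # Cache miss: pay the full registration cost exactly once.
            self._regions[key] = self._register_fn(addr, length)
        return self._regions[key]


calls = []


def fake_register_mr(addr, length):
    # Hypothetical stand-in for the real (expensive) registration call.
    calls.append((addr, length))
    return {"addr": addr, "length": length}


cache = RegistrationCache(fake_register_mr)
first = cache.get(0x1000, 4096)   # miss: registers the region
second = cache.get(0x1000, 4096)  # hit: reuses the existing registration
print(len(calls))
```

A cache of this kind only removes repeated registrations of the same region. The first registration of each region still pays the full cost.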

@sangmank
Contributor Author

sangmank commented May 6, 2015

Even with the registration cache, the timing shows that the MR of target_list takes about 1.4ms. That becomes the major overhead for the connections. :(

@rmatsumiya

Hi,
GPUDirect is very effective, but it has limitations.

First, data transfer using GPUDirect from a GPU to a NIC is much slower than from a NIC to a GPU.
NVIDIA recognizes this problem, but it does not seem to be completely fixed.
http://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms/
https://github.com/NVIDIA/gdrcopy
Therefore, some systems use GPUDirect only for transfers from a NIC to a GPU (e.g. MVAPICH [Potluri et al. ICPP '13]).

Furthermore, GPUDirect transfers data through a different mechanism than cudaMemcpyAsync() (and, of course, cudaMemcpy()).
GPUDirect uses a PCIe facility that general data transfer mechanisms do not use.
http://docs.nvidia.com/cuda/gpudirect-rdma/

I think these limitations caused your problem.
