Q about CUDA API:cudaHostAlloc #373
-
I'm trying to use cudaHostAlloc to allocate space shared between the GPU and CPU. Does anyone know the difference between cudaHostAllocMapped and cudaHostAllocDefault? The documentation says that with cudaHostAllocDefault you need cudaMemcpy to copy data from host to device, but I did not use cudaMemcpy and CUDA still computed the right result. What is the reason? Thank you. (We transfer the dst to a cv::Mat result and get the right result.)
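To make the two flags concrete, here is a minimal sketch (buffer names and sizes are my own, not from the question) contrasting the two paths: with cudaHostAllocDefault the kernel works on a separate device buffer and an explicit cudaMemcpy is needed, while with cudaHostAllocMapped the same pinned allocation is mapped into the device address space and the kernel can touch it directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 256;
    const size_t bytes = n * sizeof(float);

    // Path 1: cudaHostAllocDefault -- pinned host memory, but the kernel
    // operates on a separate device buffer, so explicit copies are needed.
    float *h_buf, *d_buf;
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);
    cudaMalloc(&d_buf, bytes);
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);

    // Path 2: cudaHostAllocMapped -- the same pinned allocation is mapped
    // into the device address space; the kernel reads/writes it in place.
    float *h_mapped, *d_alias;
    cudaHostAlloc(&h_mapped, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_alias, h_mapped, 0);
    for (int i = 0; i < n; ++i) h_mapped[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(d_alias, n);
    cudaDeviceSynchronize();  // no copy back: the host sees the result directly
    printf("%f %f\n", h_buf[0], h_mapped[0]);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaFreeHost(h_mapped);
    return 0;
}
```

Note that on platforms with unified virtual addressing, pinned memory allocated with cudaHostAllocDefault may also be directly accessible from the device, which is consistent with the behavior described in the question.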
-
`cudaHostAllocMapped` is automatically set if you only use CUDA runtime APIs (or CUDA driver APIs with explicit use of the primary context), since the primary context has this flag set by default on devices supporting address mapping.

With address mapping, CUDA kernels can directly access host pinned memory without an extra copy to device memory, which explains what you saw. This is possible because the memory is pinned/page-locked: the OS guarantees that no page swapping happens during the lifetime of the allocation while the kernel is doing its work, so reading and writing this memory from the device is safe.

There are a few potential issues (or benefits, depending on your problem needs).
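The primary-context behavior described above can be checked explicitly. This is a minimal sketch (device index 0 assumed): query whether the device supports mapped pinned memory, and, in code that creates its own context, request mapping before the context comes up. With the runtime API's primary context this flag is normally already set, so the call below is only needed when you manage the flags yourself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // 1 if the device can map host pinned memory into its address space
    printf("canMapHostMemory: %d\n", prop.canMapHostMemory);

    // Must be called before the context is created; returns an error
    // (cudaErrorSetOnActiveProcess) if the context is already active.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceMapHost);
    printf("cudaSetDeviceFlags: %s\n", cudaGetErrorString(err));
    return 0;
}
```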