-
Thrust does not expose block size or shared-memory allocations to users; these are hardcoded for each algorithm and architecture. You'll need to write a custom CUDA kernel outside of Thrust if you want this level of control.
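For illustration, here is a minimal sketch of what such a custom kernel gives you: both the block size and the dynamic shared-memory size are explicit launch parameters (the kernel name, sizes, and the scaling operation below are hypothetical, just standing in for whatever the functor would do):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block stages its slice of `in` through
// dynamically sized shared memory before writing a scaled copy to `out`.
__global__ void scale_kernel(const float* in, float* out, int n, float factor)
{
    extern __shared__ float tile[];  // size chosen at launch time
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = in[i];
        __syncthreads();
        out[i] = tile[threadIdx.x] * factor;
    }
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    // Both knobs Thrust hides are explicit here: threads per block,
    // and the dynamic shared-memory size (third launch parameter).
    const int block = 256;
    const int grid  = (n + block - 1) / block;
    const size_t shmem = block * sizeof(float);
    scale_kernel<<<grid, block, shmem>>>(in, out, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The third argument inside `<<< >>>` is the per-block dynamic shared-memory size in bytes, which `extern __shared__` picks up inside the kernel.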
-
Is there a way to customize the kernel launch parameters for Thrust algorithms? `thrust::for_each` always launches 512 CUDA threads per block. Is this something the user can customize for performance tuning?

Also related to launch parameters, but possibly a new topic entirely: is it possible to use shared memory in a functor passed to `thrust::for_each`, and if so, is dynamic shared memory possible and how is its size specified?