performance issues on Nvidia #189

Open
ntessore opened this issue Jun 25, 2015 · 1 comment

@ntessore
Contributor

Lensed is much slower on Nvidia GPUs (at least on Mac) than on the nominally less powerful Intel and AMD cards. With the basic profiling capabilities of #187, it should be possible to at least get an idea of where the time is spent on the individual platforms.

@ntessore ntessore added the bug label Jun 25, 2015
@ntessore
Contributor Author

After testing, the time spent in the individual OpenCL functions appears to be the same as on the AMD/Intel GPUs, except for a slight overhead for memory transfers (the AMD and Intel cards I tested on are integrated chips, where memory transfers are zero-cost).

Furthermore, I have found that the extra time is spent neither on the CPU nor on the GPU. The OpenCL implementation simply waits on a semaphore for no apparent reason. This happens when the blocking clEnqueueMapBuffer calls are made. Especially strange is that it also happens when there is nothing in the queue to block on.

Finally, I tried rewriting the code to use pinned memory as Nvidia does in their OpenCL SDK examples, using a fixed mapped memory segment and clEnqueue{Read|Write}Buffer from/to it. The same hang occurs. This is mysterious.
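
For anyone who wants to reproduce the "neither CPU nor GPU" observation, one way to quantify it is to compare the host wall-clock time around a blocking call with the timestamps OpenCL reports for that command's event. The sketch below is only an illustration and not code from Lensed: the `cmd_times` and `cmd_breakdown` types and the `break_down` function are hypothetical names, and plain `uint64_t` stands in for `cl_ulong`. It assumes the four timestamps were read with `clGetEventProfilingInfo` on a queue created with `CL_QUEUE_PROFILING_ENABLE`, and that the wall-clock duration was measured on the host in nanoseconds.

```c
#include <stdint.h>

/* Hypothetical container for the four timestamps (nanoseconds) that
   clGetEventProfilingInfo reports for a single command. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} cmd_times;

/* Hypothetical result: where the time of one blocking call went. */
typedef struct {
    uint64_t in_queue;     /* waiting in the host-side queue  */
    uint64_t pending;      /* submitted, but not yet started  */
    uint64_t exec;         /* actually executing              */
    uint64_t unaccounted;  /* wall time not covered by events */
} cmd_breakdown;

/* Split the wall-clock duration of a blocking call (wall_ns) into the
   phases reported by the event, plus whatever the event does not explain. */
cmd_breakdown break_down(cmd_times t, uint64_t wall_ns)
{
    cmd_breakdown b;
    uint64_t event_span = t.end - t.queued;

    b.in_queue = t.submit - t.queued;
    b.pending  = t.start - t.submit;
    b.exec     = t.end - t.start;

    /* Clamp at zero: host and device clocks are not synchronised, so
       small negative differences are measurement noise, not signal. */
    b.unaccounted = wall_ns > event_span ? wall_ns - event_span : 0;
    return b;
}
```

If `unaccounted` dominates for the blocking clEnqueueMapBuffer calls while `exec` stays comparable to the AMD/Intel numbers, that would be consistent with the time being lost inside the OpenCL runtime rather than in the transfer itself.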
