Add caffe with CuDNN[R4] to benchmark. #90
I do not think this will give us a lot more data points, but I am happy to do it. Getting a Caffe install right is always a bit of a tightrope balancing act; I'll do it in a few days.
Thanks!
I've run the Caffe numbers here: It is strange, because the Caffe numbers look quite off. The only thing I can think of right now is that Torch enables the cuDNN autotuner (via a caching mechanism on sizes / strides), and I suspect that Caffe does not enable it and just uses the cuDNN heuristics, which do not always give the best performance. In fact, I now suspect that maybe TF also does not enable the autotuner. The only network where Caffe looks close to Torch is GoogleNet, and it seems to have serious perf regressions on the other three (though both are using the same code, i.e. CuDNN R4 + CuBLAS 7.5). Should I add these numbers to the readme?
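The caching mechanism mentioned above can be sketched in a few lines. This is a hypothetical illustration, not Torch's actual cuDNN bindings: on the first call for a given (sizes, strides) key, every candidate algorithm is timed with a real pass and the fastest is cached; later calls with the same configuration skip the benchmarking entirely. The names `pick_algorithm` and `_algo_cache` are made up for this sketch.

```python
import time

# Hypothetical autotuner cache, keyed on tensor sizes / strides.
_algo_cache = {}

def pick_algorithm(key, candidates, run):
    """key: hashable (sizes, strides) tuple; candidates: algorithm ids;
    run: callable(algo) that executes one forward pass with that algorithm."""
    if key in _algo_cache:              # cache hit: no re-benchmarking
        return _algo_cache[key]
    best_algo, best_time = None, float("inf")
    for algo in candidates:
        start = time.perf_counter()
        run(algo)                       # time one real pass with this algorithm
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_algo, best_time = algo, elapsed
    _algo_cache[key] = best_algo        # remember the winner for this shape
    return best_algo
```

A heuristic-only framework skips the timing loop and asks the library for a guess, which is exactly where the performance gap described above can come from.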
Adding them with a slight warning containing your second paragraph seems like a good thing to do... better than keeping the 'native' bench IMO. Thanks for the great work.
OK, so quick remarks:
@beniz definitely up for a PR to make it up to date. The missing ReLUs are definitely an oversight and have to be added.
I recently looked into the performance of Caffe when bringing our framework Leaf up to speed, and I can confirm that the biggest speed hit comes from not using the autotuner. Caffe also loses a bit of time (IIRC 2-3ms) because it reshapes its layers on every forward pass, during which it reallocates some cuDNN descriptors.