Is your setup of Caffe-Greentea optimal? #70

Open

NH89 opened this issue Dec 3, 2015 · 3 comments

Comments


NH89 commented Dec 3, 2015

I see you are getting markedly slow results with Caffe-Greentea. Which backends are you using, and do you know if they are the best available?

In @naibaf7 (Fabian Tschopp)'s tech report (http://arxiv.org/pdf/1509.03371.pdf), Table 6.10 shows a 20x variation in performance depending on which manufacturer's libraries are used.


naibaf7 commented Dec 3, 2015

@NH89
Greentea/OpenCL is really slow for CNNs with batched data because of overhead and inefficiency in the matrix-matrix multiplications used for convolutions, especially when those matrices are small. This benchmark also uses the ViennaCL library; AMD's clBLAS library could be a bit faster and can be selected at compile time.
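
A minimal NumPy sketch of the usual im2col + GEMM lowering of a convolution (illustrative only, not the actual Greentea kernels; all layer shapes below are made up) shows how small the per-image multiplication is for a typical batched classification layer, which is where the per-call overhead of a generic BLAS like ViennaCL or clBLAS starts to dominate:

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll k x k patches of x (C, H, W) into columns of shape (C*k*k, H_out*W_out)."""
    C, H, W = x.shape
    H_out = (H - k) // stride + 1
    W_out = (W - k) // stride + 1
    cols = np.empty((C * k * k, H_out * W_out), dtype=x.dtype)
    col = 0
    for i in range(H_out):
        for j in range(W_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, col] = patch.ravel()
            col += 1
    return cols, H_out, W_out

def conv_gemm(x, weights, stride=1):
    """Convolution of one image as a single GEMM: (M, C*k*k) x (C*k*k, H_out*W_out)."""
    M, C, k, _ = weights.shape                    # M output channels, k x k kernels
    cols, H_out, W_out = im2col(x, k, stride)
    out = weights.reshape(M, C * k * k) @ cols    # this is the GEMM the BLAS backend sees
    return out.reshape(M, H_out, W_out)

# A late layer of a classification net: small spatial extent means a small GEMM.
x = np.random.randn(256, 13, 13).astype(np.float32)
w = np.random.randn(384, 256, 3, 3).astype(np.float32)
print(conv_gemm(x, w).shape)   # (384, 11, 11); the GEMM is only 384 x 2304 x 121 per image
```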

However, to be really up to speed, vendor- and hardware-specific convolution libraries such as cuDNN are needed.

AMD has an OpenCL branch (https://github.com/amd/OpenCL-caffe) that instead unrolls the whole batch into one large matrix-matrix multiplication. That is very memory inefficient compared to cuDNN, but almost as fast (see the sketch below).
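
A rough sketch of that batch-unrolling idea, reusing `im2col` and the filter tensor `w` from the sketch above (again only illustrative, not AMD's actual implementation): the im2col buffers of all images are concatenated so the BLAS sees one large GEMM instead of many small ones, at the cost of keeping every im2col buffer in memory at once:

```python
def conv_gemm_batched(xs, weights, stride=1):
    """Unroll the whole batch and run one large GEMM instead of N small ones."""
    M, C, k, _ = weights.shape
    cols_list, out_shapes = [], []
    for x in xs:                                    # xs: batch of (C, H, W) images
        cols, H_out, W_out = im2col(x, k, stride)
        cols_list.append(cols)
        out_shapes.append((H_out, W_out))
    big_cols = np.concatenate(cols_list, axis=1)    # (C*k*k, N*H_out*W_out): N im2col buffers at once
    out = weights.reshape(M, C * k * k) @ big_cols  # one big GEMM for the whole batch
    # Split the result back into per-image feature maps.
    maps, offset = [], 0
    for h, w_ in out_shapes:
        maps.append(out[:, offset:offset + h * w_].reshape(M, h, w_))
        offset += h * w_
    return maps

xs = np.random.randn(64, 256, 13, 13).astype(np.float32)
ys = conv_gemm_batched(xs, w)
print(len(ys), ys[0].shape)   # 64 images, each (384, 11, 11); the GEMM is 384 x 2304 x (64*121)
```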

The same technical report also shows that interleaved, pixelwise classification workloads (which produce large matrix-matrix multiplications, and therefore higher efficiency, without any batching) are comparably fast to CUDA.
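
A back-of-the-envelope comparison (the numbers are illustrative, not taken from the report): with one label per pixel, the GEMM gets one column per output pixel, so a single image already produces a large, efficient multiplication without any batching:

```python
# GEMM columns = number of output pixels per image.
cols_classification = 11 * 11        # late layer of a classification net (see first sketch)
cols_pixelwise = 512 * 512           # dense per-pixel labels on a 512 x 512 image
print(cols_pixelwise / cols_classification)   # over 2000x more columns from a single image
```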


NH89 commented Dec 3, 2015

@naibaf7 Thanks, you saved me from making an expensive error :-) Thank you also for creating Greentea.


naibaf7 commented Dec 3, 2015

@NH89

No problem. The OpenCL approaches will probably catch up with the CUDA solutions during Q2/Q3 next year, as major development efforts are underway at both AMD and Intel.

For my projects in biomedical image segmentation, though, the OpenCL solution is already competitive in speed.
