
Add caffe with CuDNN[R4] to benchmark. #90

Open
cesarsalgado opened this issue Feb 29, 2016 · 7 comments
@cesarsalgado

No description provided.

@soumith
Owner

soumith commented Feb 29, 2016

i do not think this will give us a lot more data points, but i am happy to do it. Caffe install is always a bit of a tightrope balancing act to get right, i'll do it in a few days.

@cesarsalgado
Author

Thanks!

@soumith
Owner

soumith commented Feb 29, 2016

I've run the Caffe numbers here:
6f718db

It is strange, because the Caffe numbers look quite far off:

  • AlexNet: 128 ms vs 81 ms (Torch-fp32)
  • Overfeat: 430 ms vs 268 ms (Torch-fp32)
  • VGG-A: 680 ms vs 529 ms (Torch-fp32)
  • GoogleNet: 484 ms vs 470 ms (Torch-fp32)

The only thing I can think of right now is that Torch enables the cuDNN autotuner (via a caching mechanism keyed on sizes/strides), and I suspect that Caffe does not enable it and just uses the cuDNN heuristics, which do not always pick the best-performing algorithm.

In fact, now I am suspecting that maybe TF also does not enable autotuner.

The only network where Caffe looks close to Torch is GoogleNet; it seems to have serious perf regressions on the other three (even though both are using the same underlying code, i.e. cuDNN R4 + cuBLAS 7.5).
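The caching mechanism described above can be sketched in a few lines. This is a hypothetical toy model, not Torch's actual implementation: benchmark every candidate algorithm the first time a (sizes, strides) combination is seen, cache the winner, and reuse it on subsequent calls with the same key.

```python
import time

def autotuned(algorithms):
    """Toy model of cuDNN autotuning keyed on sizes/strides: benchmark
    once per (shape, stride) key, cache the fastest algorithm, reuse it."""
    cache = {}  # (shape, stride) -> fastest algorithm

    def bench(algo, data):
        start = time.perf_counter()
        algo(data)
        return time.perf_counter() - start

    def run(shape, stride, data):
        key = (shape, stride)
        if key not in cache:
            timed = [(bench(algo, data), algo) for algo in algorithms]
            cache[key] = min(timed, key=lambda t: t[0])[1]
        return cache[key](data)

    return run

# Two interchangeable "algorithms": a fast and a slow way to sum a list.
def fast_sum(xs):
    return sum(xs)

def slow_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

run = autotuned([slow_sum, fast_sum])
print(run((4,), (1,), [1, 2, 3, 4]))  # prints 10
```

A heuristic-based path (what Caffe presumably uses) would instead pick an algorithm from rules of thumb without ever timing the candidates, which is cheaper up front but can leave performance on the table for shapes where the heuristic guesses wrong.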

Should I add these numbers to the readme?
Considering how sensitive the benchmarks have become, I would want someone from the Caffe ecosystem to take a quick look at the prototxt files and see if there are any recently added settings I should use.

@beniz
Contributor

beniz commented Feb 29, 2016

Adding them with a slight warning containing your second paragraph seems like a good thing to do; better than sticking with the 'native' bench IMO. Thanks for the great work.
I can take a look at the Caffe bench and prototxt files a bit later in the day if that helps.

@beniz
Contributor

beniz commented Feb 29, 2016

OK, so quick remarks:

  • the .prototxt files are in the old format; not that it matters much, I believe, but I could PR an update if you're interested
  • alexnet.prototxt is missing the ReLUs between fc6, fc7, and fc8; not sure whether this is on purpose?
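For reference, the missing ReLU layers mentioned above would look roughly like this in the old (V1) prototxt format the files currently use; the layer names here are illustrative, not taken from the repo:

```protobuf
layers {
  name: "relu6"
  type: RELU
  bottom: "fc6"
  top: "fc6"
}
layers {
  name: "relu7"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
```

Writing `bottom` and `top` to the same blob makes the ReLU operate in place, which is the usual pattern after fully connected layers in Caffe AlexNet definitions.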

@soumith
Owner

soumith commented Feb 29, 2016

@beniz definitely up for a PR to bring it up to date. The missing ReLUs are definitely an oversight and have to be added.

@hobofan

hobofan commented Feb 29, 2016

I recently looked into the performance of Caffe when bringing our framework Leaf up to speed, and I can confirm that the biggest speed hit comes from not using the autotuner. Caffe is also losing a bit of time (IIRC 2-3 ms) because it reshapes its layers on every forward pass, reallocating some cuDNN descriptors in the process.
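The reshape overhead described above can be sketched as a toy model (hypothetical code, not Caffe's actual implementation): a layer that holds a "descriptor" (stand-in for a cuDNN tensor descriptor) and either rebuilds it on every forward pass, or only when the input shape actually changes.

```python
class CudnnLayer:
    """Toy model of per-forward descriptor reallocation vs. caching."""

    def __init__(self):
        self.descriptor = None
        self._last_shape = None
        self.rebuilds = 0  # count of (expensive) descriptor allocations

    def _build_descriptor(self, shape):
        self.rebuilds += 1
        return {"shape": shape}  # stand-in for a real cuDNN descriptor

    def forward_reshape_always(self, shape):
        # Reshape (and rebuild the descriptor) on every call.
        self.descriptor = self._build_descriptor(shape)

    def forward_cached(self, shape):
        # Rebuild only when the input shape actually changed.
        if shape != self._last_shape:
            self.descriptor = self._build_descriptor(shape)
            self._last_shape = shape

layer = CudnnLayer()
for _ in range(100):
    layer.forward_cached((128, 3, 224, 224))
print(layer.rebuilds)  # prints 1: same shape, descriptor reused
```

With a fixed batch size the cached variant allocates once for the whole run, while the reshape-every-time variant pays the allocation cost on every iteration, which is where the 2-3 ms per pass would come from.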
