Query about the new update regarding inference on CPU and GPU #5
Comments
@kaivu1999 that's a really good discussion you brought up; it has been of interest since the release of BMXNet v1.
@simonmaurer I think in the XNOR-Net paper they mention 32x memory savings and a 52x computation speedup.
We use cuDNN for training because it is highly optimized and much faster than our preliminary CUDA implementation of the xnor kernel. I believe an optimized xnor CUDA kernel would indeed improve the speed a lot, but we are not experts in that area, so for now we leave this to the community. Regarding the 52x speedup mentioned in the XNOR-Net paper: I think this number is also based on the xnor-gemm function alone, not the whole convolution layer. And they only compared against a naive implementation of a dot engine, without even reporting a comparison with CBLAS (ATLAS or others). I had seen their code in Darknet before they removed it years ago (they launched the startup XNOR.ai and therefore removed the code from Darknet).
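To make concrete why such numbers are usually quoted for the GEMM kernel alone: the inner loop of an xnor-gemm reduces a dot product of {-1,+1} vectors to an XNOR plus a popcount over bit-packed words. A minimal sketch of that idea (pure Python for clarity; the function name `xnor_dot` and the bit-encoding convention are my own illustration, not BMXNet's actual API):

```python
def xnor_dot(a_words, b_words, bits_per_word=64):
    """Dot product of two {-1,+1} vectors packed as bits (1 -> +1, 0 -> -1).

    Each element of a_words/b_words is an integer packing bits_per_word
    values. Where bits agree the elementwise product is +1, where they
    differ it is -1, so:
        dot = agreements - disagreements = 2 * popcount(XNOR) - total_bits
    """
    mask = (1 << bits_per_word) - 1              # keep XNOR within word width
    agreements = 0
    for a, b in zip(a_words, b_words):
        agreements += bin(~(a ^ b) & mask).count("1")  # popcount of XNOR
    total_bits = len(a_words) * bits_per_word
    return 2 * agreements - total_bits
```

With this packing, 64 multiply-accumulates collapse into one XNOR and one popcount instruction, which is where the headline speedup figures come from; the surrounding convolution layer (im2col, packing, scaling) is not included in that ratio.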
Thank you very much. Also, I got these numbers for BMXNet v1, considering only inference.
The blue line represents the experiment where I tried to use the accelerated layers as much as possible. I accelerated the last layer (not recommended for accuracy), and for NIN I had to approximate some of the intermediate layers so that the input channels are a multiple of 64, since the accelerated layers (QActi, QConv, QFullyC) seem to support only input sizes in multiples of 64 (see the packing sketch below). I also tried a network consisting of just convolution/activation layer pairs: in a network with 8 such pairs, with 7 of them accelerated in the binary version, I get a speedup of 6.8x at a batch size of 256. On a similar note, I would really like to know about the update.
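For context on the multiple-of-64 constraint mentioned above: binary layers pack activations 64 per machine word before the xnor-gemm, so a channel count that is not a multiple of 64 has no clean packing. A rough sketch of that packing step (the helper name `pack_signs` is hypothetical, not actual BMXNet code):

```python
def pack_signs(values, bits_per_word=64):
    """Pack a list of real activations into words of sign bits (>= 0 -> 1).

    The packed width is len(values) / bits_per_word, which is why the
    quantized layers expect the input channel count to be a multiple of 64.
    """
    assert len(values) % bits_per_word == 0, "channels must be a multiple of 64"
    words = []
    for start in range(0, len(values), bits_per_word):
        word = 0
        for offset, v in enumerate(values[start:start + bits_per_word]):
            if v >= 0:
                word |= 1 << offset              # sign bit: non-negative -> 1
        words.append(word)
    return words
```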
Description
I am mainly interested in the speedup I can get on CPU and GPU, especially for inference.
According to the answer by @yanghaojin above,
I have tried BMXNet v1 for the same and get a CPU speedup of about 1.4x - 1.7x on my PC for some models, but also a decrease in speed in some cases.
I used: Ubuntu 16.04 (64-bit) on an Intel(R) Core™ i5-8250U CPU @ 1.60GHz (supports SSE4.2).
Can you please elaborate on the update of 21st May 2019, the one described in the changelog, with respect to speedup?
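For reference, inference speedups of this kind are typically measured by timing plain forward passes of the full-precision and binary models on identical inputs. A minimal MXNet/Gluon timing sketch (the `benchmark` helper, shapes, and run counts are illustrative assumptions, not the exact setup used for the numbers above):

```python
import time
import mxnet as mx

def benchmark(net, batch_size=256, input_shape=(3, 32, 32),
              runs=20, ctx=mx.cpu()):
    """Average seconds per forward pass of a Gluon block on random data.

    Assumes net is already initialized (net.initialize() or loaded params).
    """
    x = mx.nd.random.uniform(shape=(batch_size,) + input_shape, ctx=ctx)
    net(x).wait_to_read()            # warm-up run, excluded from timing
    start = time.time()
    for _ in range(runs):
        net(x).wait_to_read()        # block until the async forward finishes
    return (time.time() - start) / runs

# Speedup = full-precision time / binary time for the same architecture,
# e.g. speedup = benchmark(fp32_net) / benchmark(binary_net)
```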