Numba is a just-in-time, type-specializing function compiler for accelerating numerically focused Python. It is typically enabled by applying a decorator to a Python function, and it can compile your code for the CPU or the GPU. Under the hood, it uses LLVM to compile Python functions just in time. CuPy is a NumPy-like library accelerated with CUDA. Its syntax is very similar to NumPy's, and in most cases you can directly replace the NumPy import with CuPy. It allows us to write custom kernels in CUDA and can be used easily alongside Numba CUDA functions. The deep learning library Chainer uses CuPy as its backend.
In XNOR convolution, both the filters and the inputs to the convolutional layers are binary. By approximating the convolution operations with XNOR and bit-counting operations, we can gain massive speed-ups and memory savings. Even though this seems straightforward in theory, in practice an efficient implementation of bit-packing, approximation techniques, and training mechanisms is required to achieve sufficient accuracy and speed on conventional hardware. The AI startup XNOR.AI (recently acquired by Apple) popularized these techniques in their 2016 paper, XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.
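To make the approximation concrete, here is a small illustrative sketch (not the XNOR-Net implementation itself): the dot product of two {-1, +1} vectors, each packed into the bits of a Python integer, reduces to one XNOR followed by a popcount:

```python
def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors with entries in {-1, +1},
    each packed into an integer (bit 1 -> +1, bit 0 -> -1)."""
    agree = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # XNOR: bit is 1 where entries match
    matches = bin(agree).count("1")              # popcount (bit-counting)
    return 2 * matches - n                       # matches minus mismatches


# [+1, -1, +1] . [+1, +1, -1] = 1 - 1 - 1 = -1
print(xnor_dot(0b101, 0b011, 3))  # -1
```

Since a convolution is ultimately a sum of such dot products, replacing the multiply-accumulate loop with this XNOR/popcount pattern is what yields the speed-up, provided the operands are bit-packed efficiently first.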
In the IPython notebook, we implement a basic convolution in plain Python and subsequently improve its speed using Numba and other optimization techniques. Finally, we compare and benchmark the various CPU and GPU techniques in terms of execution speed. The notebook can be run directly on Google Colaboratory using a GPU runtime, without any additional installation of libraries or packages.
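For reference, the kind of baseline the notebook starts from looks roughly like this naive "valid"-mode 2D convolution (strictly, cross-correlation, as deep learning libraries define convolution) in plain Python/NumPy; the function name is illustrative:

```python
import numpy as np


def conv2d_naive(image, kernel):
    """Naive 'valid' 2D cross-correlation: slide the kernel over the image
    and take an elementwise multiply-and-sum at every position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out


image = np.arange(9.0).reshape(3, 3)
kernel = np.ones((2, 2))
print(conv2d_naive(image, kernel))  # each output is the sum of a 2x2 window
```

The Python-level double loop is exactly the part that Numba's JIT compilation (or an im2col reformulation into a single matrix multiply) can accelerate.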
Note: The benchmarks depend heavily on the hardware and library versions used for experimentation.
- Python, Numba
- Cupy, Numpy
- CUDA
- Boost Python With GPU
- Numba: Talks and Tutorials
- ContinuumIO: GTC 2020 Numba
- Create CUDA kernels from Python using Numba and CuPy
- Numba Cuda: Fast Matrix Multiplication
- Python: Popcount Benchmark
- Understanding Binary Neural Networks
- Binary Neural Networks
- Python: Packing Bitlist to UINT64
- Python: Im2col Implementation
- Numba: Automatic Parallelization
- Numba: Using Stencils
- Binarization of Low Level Operations in DNN
- Stanford: CS231n Im2Col Assignment
- Introduction to CUDA in Python
- Intel: Accelerating Neural Networks With Binary Arithmetic
- Here’s How to Use CuPy to Make Numpy Over 10X Faster