J port of ashinkarov/cnn-in-apl, which is described in this paper:
Artjoms Šinkarovs, Robert Bernecky, and Sven-Bodo Scholz. 2019. Convolutional neural networks in APL. In Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY 2019). Association for Computing Machinery, New York, NY, USA, 69–79. DOI:https://doi.org/10.1145/3315454.3329960
Ensure you have J (tested on J901) and the `format/printf` and `stats/base` addons installed. Then, run `download-mnist.sh`. If you can't run the script, download the MNIST files yourself, extract them, and place them in `input`. Finally, run `cnn.ijs` to train and test the CNN.
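If you prefer installing the addons from inside a J session, the bundled package manager can do it; a sketch (the `jpkg` verb comes from the standard `pacman` addon):

```j
NB. Install the required addons via J's package manager
load 'pacman'
'install' jpkg 'format/printf'
'install' jpkg 'stats/base'
```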
In `main`, you can customize training by tweaking `epochs`, `trainings` (number of training examples), `tests` (number of test examples), `rate`, and `momentum`. The APL version also has a `batchsize` variable, but it's just for show: the CNN is trained using stochastic gradient descent, not batch gradient descent.
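For example, a quick run on a subset of MNIST might use settings like these (the values are illustrative, not the defaults in `cnn.ijs`):

```j
NB. Illustrative hyperparameters inside main (example values only)
epochs    =. 1
trainings =. 10000    NB. number of training examples
tests     =. 10000    NB. number of test examples
rate      =. 0.05     NB. learning rate
momentum  =. 0.9      NB. momentum coefficient for the Nesterov update
```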
For reproducibility, the RNG seed is explicitly set to `16807`, which appears to be the default as of J901. You can change it if you want to get different results.
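In J the seed is read and written with the `9!:0` and `9!:1` foreigns, so overriding it looks roughly like this:

```j
9!:0 ''        NB. query the current RNG seed/state
9!:1 ] 12345   NB. set a different seed (any integer) before training
```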
This is a mostly faithful translation of the GitHub version (not the paper version, which uses different variable names) with a few enhancements:
- Initialization of weights following Zhang's paper (section 1.1)
- Standardization of images
- Shuffling of training data at the start of each epoch
- Nesterov accelerated gradient
These changes increase the accuracy from 76.23% to 93.27%. There are many other opportunities for improvement, but the code is so slow that it's a drag to test changes. For example, the accuracy can be improved to 97.47% by training on all 60k images for 1 epoch, but that takes about 22 minutes on my laptop (a speed of 45 images/s).
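As an illustration of the standardization step, shifting and scaling the pixel values to zero mean and unit variance can be written as a short verb; this is a sketch of the idea, not necessarily the exact code in `cnn.ijs`:

```j
NB. Standardize an array of images to zero mean and unit variance
NB. (the small epsilon guards against division by zero)
standardize =: 3 : 0
  m =. (+/ % #) , y                NB. mean over all pixel values
  s =. %: (+/ % #) *: (, y) - m    NB. standard deviation
  (y - m) % s + 1e_8
)
```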
The other notable difference is that while the APL version avoids the stencil operator `⌺`, we don't avoid J's subarrays `;._3`: J doesn't add padding like Dyalog does, so there's no performance penalty.
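For reference, this is how the `;._3` cut produces the overlapping windows a convolution needs; only windows that fit are generated, so nothing is padded:

```j
NB. 3x3 sliding windows over a 5x5 matrix: movement 1 1, window size 3 3
win =: 1 1 ,: 3 3
$ win ];._3 i. 5 5    NB. shape 3 3 3 3: a 3x3 grid of 3x3 tiles
```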