A Chainer implementation of self-supervised jigsaw CNNs. The original authors have published their own Caffe implementation.
The jigsaw CNN learns a representation by reassembling an image from its patches.
This is achieved by:
- randomly cropping a square from the image.
- segmenting the crop into 9 patches (with more random crops).
- permuting the patches.
- predicting which permutation was applied to the patches.

The aim is to learn about structure, colour and texture without labels; a minimal sketch of this pipeline is below.
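The sketch uses plain NumPy; the 225-pixel crop, 75-pixel grid cells and 64-pixel patches are assumptions borrowed from the original paper, and `make_jigsaw_example` is a hypothetical helper rather than this repository's API.

```python
import numpy as np

def make_jigsaw_example(image, permutations, rng):
    """image: HxWxC uint8 array; permutations: (P, 9) array of patch orders."""
    # Randomly crop a 225x225 square from the image.
    y = rng.randint(image.shape[0] - 225 + 1)
    x = rng.randint(image.shape[1] - 225 + 1)
    crop = image[y:y + 225, x:x + 225]

    # Segment the crop into a 3x3 grid of 75x75 cells, then take a random
    # 64x64 patch from each cell (the "more random crops" above).
    patches = []
    for row in range(3):
        for col in range(3):
            cell = crop[row * 75:(row + 1) * 75, col * 75:(col + 1) * 75]
            py, px = rng.randint(75 - 64 + 1, size=2)
            patches.append(cell[py:py + 64, px:px + 64])

    # Permute the patches; the index of the permutation is the label the
    # network has to predict.
    label = rng.randint(len(permutations))
    shuffled = [patches[i] for i in permutations[label]]
    return np.stack(shuffled), label
```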
```sh
python -m jigsaw.train --gpu 3 "/path/to/train/*.jpg" "/path/to/test/*.jpg"
```
Note that the path globs must be quoted or the shell will expand them. Images are
automatically rescaled, cropped and turned into patches at runtime; check `--help`
for more details. Training on the CPU is not supported, so you must specify a GPU ID.
This is what the first-layer filters look like after 350k batches. They look good but need some more fine-tuning.
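If you want to eyeball the filters yourself, here is a quick matplotlib sketch; it assumes the first convolution is exposed as `model.conv1`, which is a guess at this repo's naming.

```python
import matplotlib.pyplot as plt

def show_filters(model):
    # Call model.to_cpu() first if the model lives on the GPU.
    W = model.conv1.W.data          # shape: (out_channels, 3, kh, kw)
    fig, axes = plt.subplots(8, 12, figsize=(12, 8))
    for ax in axes.ravel():
        ax.axis('off')
    for ax, w in zip(axes.ravel(), W):
        w = (w - w.min()) / (w.max() - w.min() + 1e-8)  # rescale to [0, 1]
        ax.imshow(w.transpose(1, 2, 0))
    plt.show()
```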
To identify an n-permutation we only need n - 1 of its elements, so I've made the task
harder by randomly zeroing one of the patches (i.e. dropout for patches). Permutations are
generated in a different manner than specified in the paper, but the average Hamming
distance is almost the same, at 0.873 (see scripts/perm-gen.py).
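For reference, a common way to build such a permutation set is a greedy farthest-point search over all 9! permutations, repeatedly keeping the candidate with the largest minimum Hamming distance to the set so far. The sketch below is illustrative and is not necessarily what scripts/perm-gen.py does.

```python
import itertools
import numpy as np

def generate_permutations(n_perms=100, n_elements=9, seed=0):
    rng = np.random.RandomState(seed)
    all_perms = np.array(list(itertools.permutations(range(n_elements))))
    # Track each candidate's Hamming distance to its nearest chosen perm.
    min_dist = np.full(len(all_perms), n_elements + 1)
    idx = rng.randint(len(all_perms))
    chosen = []
    for _ in range(n_perms):
        chosen.append(all_perms[idx])
        min_dist = np.minimum(min_dist, (all_perms != all_perms[idx]).sum(axis=1))
        idx = int(min_dist.argmax())  # candidate farthest from the set so far
    return np.array(chosen)
```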
The architecture we use to generate patch representations is closer to ZFNet than AlexNet; a sketch of what that might look like follows.
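This is an illustrative Chainer sketch of a ZFNet-flavoured patch encoder (7x7 stride-2 first layer rather than AlexNet's 11x11 stride-4). The layer sizes here are assumptions, not a transcription of the model in this repository.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class PatchEncoder(chainer.Chain):
    """ZFNet-style convolutional stack producing one vector per patch."""

    def __init__(self, out_dim=512):
        super(PatchEncoder, self).__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(3, 96, ksize=7, stride=2, pad=1)
            self.conv2 = L.Convolution2D(96, 256, ksize=5, stride=2, pad=2)
            self.conv3 = L.Convolution2D(256, 384, ksize=3, pad=1)
            self.conv4 = L.Convolution2D(384, 384, ksize=3, pad=1)
            self.conv5 = L.Convolution2D(384, 256, ksize=3, pad=1)
            self.fc = L.Linear(None, out_dim)  # input size inferred lazily

    def __call__(self, x):
        h = F.max_pooling_2d(F.relu(self.conv1(x)), ksize=3, stride=2)
        h = F.max_pooling_2d(F.relu(self.conv2(h)), ksize=3, stride=2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.max_pooling_2d(F.relu(self.conv5(h)), ksize=3, stride=2)
        return F.relu(self.fc(h))
```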
Training could be made faster by precalculating batches.
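A sketch of that idea, reusing the hypothetical `make_jigsaw_example` helper from above to dump ready-made batches to disk so the trainer skips the decode/crop work:

```python
import numpy as np

def precompute_batches(images, permutations, n_batches, batch_size, rng, prefix):
    for b in range(n_batches):
        xs, ys = [], []
        for _ in range(batch_size):
            img = images[rng.randint(len(images))]
            patches, label = make_jigsaw_example(img, permutations, rng)
            xs.append(patches)
            ys.append(label)
        # One shard per batch; the trainer can then just np.load these.
        np.savez('{}-{:05d}.npz'.format(prefix, b), x=np.stack(xs), y=np.array(ys))
```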