Image recognition program using MNIST labelled images.
A list of images is converted into an array of pixel-brightness vectors p_i, where each array index i corresponds to a different image.
Each image p_i has a corresponding label q_i, a unit vector of dimension n, where n is the number of labels. Each q_i is one of n mutually orthogonal vectors, one per label.
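The encoding above can be sketched in a few lines. This is only an illustration: the helper names flatten_image and one_hot are hypothetical, and it assumes the n orthogonal label vectors are the standard basis vectors (a 1 in the position of the label, 0 elsewhere), which the text does not state explicitly.

```python
def flatten_image(image):
    """Flatten a 2-D grid of pixel brightnesses (a list of rows) into one vector p."""
    return [float(pixel) for row in image for pixel in row]


def one_hot(label, n_labels):
    """Return the unit vector q for `label`: 1.0 at index `label`, 0.0 elsewhere.

    Assumption: the orthogonal label vectors are the standard basis vectors.
    """
    if not 0 <= label < n_labels:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(n_labels)]


# A tiny 2x2 grid standing in for a 28x28 MNIST digit.
p = flatten_image([[0, 255], [128, 64]])
q = one_hot(3, 10)
```

With this choice, q_i.x simply picks out component number label_i of any vector x.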
Now that we have a way to interpret images and labels as vectors, we need a function that takes 'image' p to 'label' q.
One option is q = W.p where W is a matrix,
but let's define:
norm(x) ≡ x/|x|,
ξ(M,R; p_i) ≡ norm(e^(norm(R.e^(M.p_i)))),
where R and M are matrices.
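A minimal sketch of ξ in plain Python follows. It assumes that e raised to a vector means the elementwise exponential, and the helper names matvec and vexp, the matrix sizes and the numbers are made up for illustration. Pixel values would probably need scaling to a small range first so that the exponential does not overflow.

```python
import math


def matvec(A, x):
    """Matrix-vector product A.x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    """Scale x to unit Euclidean length: x / |x|."""
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]


def vexp(x):
    """Elementwise exponential (the assumed meaning of e^x for a vector x)."""
    return [math.exp(v) for v in x]


def xi(M, R, p):
    """xi(M,R; p) = norm(e^(norm(R . e^(M . p))))."""
    hidden = vexp(matvec(M, p))
    return norm(vexp(norm(matvec(R, hidden))))


# Hypothetical sizes: 3 pixels -> 2 intermediate components -> 2 labels.
M = [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1]]
R = [[1.0, -1.0], [-0.5, 2.0]]
out = xi(M, R, [0.0, 0.5, 1.0])
```

Under this reading the output is always a unit vector with strictly positive components, so its dot product with a one-hot label lies strictly between 0 and 1.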
ξ has the desired properties:
- it is differentiable;
- it also maps between vectors;
- it has more independent (non-degenerate) parameters than W.
For some values of R and M, ξ will map images p_i to vectors very close to q_i. This happens when q_i.ξ(M,R; p_i) ≈ 1.
So the goal is to maximise f = Σ_i f_i, where f_i = log(q_i.ξ(M,R; p_i)), by varying R and M.
I calculated expressions for ∂f_i/∂M and ∂f_i/∂R, and this program uses them to perform many small gradient-descent steps on −f_i, one image at a time.
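The per-image update can be sketched as below. The analytic derivatives are not given here, so this sketch swaps in central finite differences, which are much slower but need no derivation. It again assumes elementwise exponentials. The names f_i, numerical_grad and ascent_step, the step size and all the numbers are illustrative assumptions, not the program's actual code.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]


def xi(M, R, p):
    # Assumes e^(...) acts elementwise on vectors.
    hidden = [math.exp(v) for v in matvec(M, p)]
    return norm([math.exp(v) for v in norm(matvec(R, hidden))])


def f_i(M, R, p, q):
    """Per-image objective log(q . xi(M,R; p)); never positive, since q . xi <= 1."""
    return math.log(sum(a * b for a, b in zip(q, xi(M, R, p))))


def numerical_grad(M, R, p, q, A, h=1e-6):
    """Central-difference estimate of d f_i / dA, where A is the M or R object itself."""
    grad = [[0.0] * len(row) for row in A]
    for r, row in enumerate(A):
        for c in range(len(row)):
            original = row[c]
            row[c] = original + h
            up = f_i(M, R, p, q)
            row[c] = original - h
            down = f_i(M, R, p, q)
            row[c] = original  # restore the entry before moving on
            grad[r][c] = (up - down) / (2 * h)
    return grad


def ascent_step(M, R, p, q, rate=0.05):
    """One small step that increases f_i for a single image; updates M and R in place."""
    grad_M = numerical_grad(M, R, p, q, M)
    grad_R = numerical_grad(M, R, p, q, R)
    for A, grad in ((M, grad_M), (R, grad_R)):
        for row, grad_row in zip(A, grad):
            for c, g in enumerate(grad_row):
                row[c] += rate * g


# Made-up example: one 3-pixel "image" whose label is the first of two classes.
M = [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1]]
R = [[1.0, -1.0], [-0.5, 2.0]]
p, q = [0.0, 0.5, 1.0], [1.0, 0.0]
before = f_i(M, R, p, q)
for _ in range(20):
    ascent_step(M, R, p, q)
after = f_i(M, R, p, q)
```

Stepping uphill in f_i is the same as gradient descent on −f_i. Looping over all training images and repeating gives the kind of per-image scheme described above.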
Having now optimised R and M, this program applies ξ to new test images and can label them with 98.7% accuracy.
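One plausible way to turn the output of ξ into a label and score it is sketched below. It assumes the label vectors are the standard basis vectors, so the label k that maximises q_k.ξ is the index of the largest component. The function names and the sample vectors are hypothetical stand-ins, not results from the program.

```python
def predict(output):
    """Index of the largest component of `output`.

    With standard-basis label vectors q_k (an assumption), this is the
    label k that maximises q_k . output.
    """
    return max(range(len(output)), key=lambda k: output[k])


def accuracy(outputs, true_labels):
    """Fraction of outputs whose predicted label matches the true label."""
    correct = sum(predict(o) == t for o, t in zip(outputs, true_labels))
    return correct / len(true_labels)


# Made-up vectors standing in for xi applied to three test images.
outputs = [[0.9, 0.3, 0.3], [0.2, 0.95, 0.2], [0.6, 0.5, 0.6]]
score = accuracy(outputs, [0, 1, 2])
```

Ties are broken in favour of the lowest index here, which is an arbitrary choice.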