Anton,
This is very nice. One remark is that I've had some success using the Fourier Sine or Cosine transform (I'll use FCT in referring to this below) as a preprocessor. The idea is to use it on the image matrix, then extract a submatrix of say the lowest 10x10 components and use only that in the SVD phase. This offers three advantages.
(1) It improves speed. The FCT is fast enough that it is not a bottleneck, and by reducing size of the input to SVD that becomes much faster.
(2) It cuts down the memory footprint. FCT does not really take much memory since it can be done one at a time. SVD is done on the entire training ensemble and on a large matrix it will thus require more memory relative to a smaller one.
(3) The extra level of dimension reduction via FCT improves the outcome. Depending on fine tuning of parameter settings (how many FCT components, how many singular values, how many lookup values to use in a majority rules classifier) it can hit 97.8% correct recognition. Also if we look for the 5 nearest neighbors, my recollection is that nearly always at least one will be the correct digit.
Details of all this will be in the SYNASC 2016 proceedings (final version was submitted yesterday, and I think it comes out in January or so). The code for the MNIST example is not very long so I'll include it below. Since the images are coarse (28x28) I also sharpen them. That turns out to be the biggest speed bottleneck actually (takes several minutes whereas the rest totals less than a minute). Also I use the fourth Fourier Sine transform. I can get similar results with the second Cosine transform, if memory serves correctly.
keep = 31;
dn = 10;
dst = 4;
Clear[nearestImages, processInput];
nearestImages[ilist_, vals_, dn_, dnum_, keep_] :=
Module[
{idata, images = ilist, dcts, top,
topvecs, uu, ww, vv, udotw, norms},
idata = Map[ImageData, images];
dcts = Map[FourierDST[# - Mean[Flatten[#]], dnum] &, idata];
top = dcts[[All, 1 ;; dn, 1 ;; dn]];
topvecs = Map[Flatten, top];
topvecs = Map[# - Mean[#] &, topvecs];
{uu, ww, vv} =
SingularValueDecomposition[topvecs, keep];
udotw = uu.ww;
norms = Map[Sqrt[#.#] &, udotw];
udotw = udotw/norms;
udotw = Join[udotw, Transpose[{Log[norms]}], 2];
{Nearest[udotw -> vals, Method -> "KDTree"], vv}]
processInput[ilist_, vv_, dn_, dnum_] :=
Module[
{idata, images = ilist, dcts, top,
topvecs, tdotv, norms},
idata = Map[ImageData, images];
dcts = Map[FourierDST[# - Mean[Flatten[#]], dnum] &, idata];
top = dcts[[All, 1 ;; dn, 1 ;; dn]];
topvecs = Map[Flatten, top];
topvecs = Map[# - Mean[#] &, topvecs];
tdotv = topvecs.vv;
norms = Map[Sqrt[#.#] &, tdotv];
tdotv = tdotv/norms;
tdotv = Join[tdotv, Transpose[{Log[norms]}], 2];
tdotv]
trainingBytes = Import[
"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz" , "Byte"];
trainingImages = Map[Image[Partition[#, 28]] &, Partition[Drop[trainingBytes, 16], 28^2]];
testBytes = Import[
"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz" , "Byte"];
testImages = Map[Image[Partition[#, 28]] &, Partition[Drop[testBytes, 16], 28^2]];
trainingLabels = Drop[Import[
"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz", "Byte"], 8];
testLabels = Drop[Import[
"http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz", "Byte"], 8];
func = Sharpen[#, 12] &;
trainingImages = Map[func, trainingImages];
testImages = Map[func, testImages];
guesses[nf_, tvecs_, n_] := Module[
{nbrs, counts},
nbrs = Map[nf[#, n] &, tvecs];
counts = Map[Tally, nbrs];
counts =
Map[Reverse, Map[SortBy[#[[2]] &], counts]];
counts[[All, 1, 1]]]
correct[guess_, actual_] /;
Length[guess] == Length[actual] :=
Count[guess - actual, 0]
correct[__] := $Failed
AbsoluteTiming[{nf, vv} =
nearestImages[trainingImages,
trainingLabels, dn, dst, keep];
testvecs =
processInput[testImages, vv, dn, dst];]
(* Out[121]= {4.248808, Null} *)
AbsoluteTiming[
guessed = guesses[nf, testvecs, 5];]
correct[guessed, testLabels]
(* Out[122]= {13.071579, Null}
Out[123]= 9781 *)