Regarding the difference between the two inversion methods: they agree only up to the 5th decimal, so something might be going on there, but I have no clue what.
In[93]:= (1 - trainingData[[1, 1]]) ===
ColorNegate[trainingData[[1, 1]]]
ImageData[1. - trainingData[[1, 1]]] ===
ImageData[ColorNegate[trainingData[[1, 1]]]]
(Round[ImageData[1. - trainingData[[1, 1]]], 10.^-5]) === (Round[
ImageData[ColorNegate[trainingData[[1, 1]]]], 10.^-5])
Out[93]= False
Out[94]= False
Out[95]= True
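One way to narrow this down (just a sketch, using a throwaway variable img; I have not chased the actual cause) is to look at the image's stored pixel type and at the size of the discrepancy. My guess is that ColorNegate works on the stored pixel type while 1. - image works in machine precision, but I have not verified that.
img = trainingData[[1, 1]];
ImageType[img] (* stored pixel type, e.g. "Byte" or "Real32" *)
Max[Abs[ImageData[1. - img] - ImageData[ColorNegate[img]]]] (* largest per-pixel difference between the two inversions *)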
Regarding the inversion of the grayscale images: although the Image head in Mathematica can be useful, it is very, very slow. If you want speed, treat images as what they are, arrays of numbers; that is what any other language does as well. As you can see below, this is almost 100x faster than the 1 - image you had and about 5x faster than ColorNegate.
In[58]:= AbsoluteTiming[
itrain1 = Map[ColorNegate[First[#]] -> Last[#] &, trainingData];
itest1 = Map[ColorNegate[First[#]] -> Last[#] &, testData];
]
Out[58]= {6.07861, Null}
In[59]:= AbsoluteTiming[
itrain2 = Map[1 - ImageData[First[#]] -> Last[#] &, trainingData];
itest2 = Map[1 - ImageData[First[#]] -> Last[#] &, testData];
]
Out[59]= {1.08891, Null}
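For reference, here is the 1 - image variant from the question, which the ~100x figure refers to (a sketch with names itrain3/itest3 of my own; timing not reproduced here):
AbsoluteTiming[
 itrain3 = Map[(1 - First[#]) -> Last[#] &, trainingData];
 itest3 = Map[(1 - First[#]) -> Last[#] &, testData];
 ]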
The same holds for training the network: NetEncoder is slow. I have the strong feeling this is done on the CPU instead of the GPU. My CPU was number-crunching some other stuff (~75% CPU usage) while running these examples, and the difference is remarkable.
(* n: the net with an Image NetEncoder on the input *)
n = NetChain[{FlattenLayer[], 64, Ramp, 10, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", Range[0, 9]}],
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]
(* n2: the same net, but taking a raw 28x28 array as input *)
n2 = NetChain[{FlattenLayer[], 64, Ramp, 10, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], "Input" -> {28, 28}]
Training is then more than 10x faster when not using Image + NetEncoder, which is very important when training large problems. So without Image the inversion is about 5x faster and the training more than 10x faster. Although Mathematica can make your life easier with all kinds of encoders, treat data as what it is: numbers in arrays!
In[128]:=
t2 = NetTrain[n, itrain1, All, BatchSize -> 500,
MaxTrainingRounds -> 4, TargetDevice -> "GPU"];
t2["TotalTrainingTime"]
ClassifierMeasurements[t2["TrainedNet"], itest1, "Accuracy"]
Out[129]= 29.547
Out[130]= 0.9449
In[116]:=
t3 = NetTrain[n2, itrain2, All, BatchSize -> 500,
MaxTrainingRounds -> 4, TargetDevice -> "GPU"];
t3["TotalTrainingTime"]
ClassifierMeasurements[t3["TrainedNet"], itest2, "Accuracy"]
Out[117]= 2.12835
Out[118]= 0.9449
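To check that it is really the encoding step that eats the time rather than the network itself, one can time the NetEncoder on its own (a sketch; the variable enc and the 5000-image sample are just for illustration and were not part of the timings above):
enc = NetEncoder[{"Image", {28, 28}, "Grayscale"}];
AbsoluteTiming[enc[Keys[trainingData[[;; 5000]]]];] (* encode 5000 images outside of NetTrain *)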
As to why the inverted images behave differently from the normal images: if you train on the inverted images, most of the data is 0 and only very few values are nonzero. The network trains a function to describe the data, and that function is less complicated if the image is sparse (mostly 0).
In[141]:= (*Original data*)
Count[N@Flatten[ImageData[trainingData[[1, 1]]]], 0.]
(*inverted data*)
Count[Flatten[ImageData[1. - trainingData[[1, 1]]]], 0.]
Out[141]= 2
Out[142]= 608
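These counts are for a single digit; to check that it is representative, one could average the zero counts over a sample of the training set (a sketch; the helper zeros and the sample size of 1000 are my own choices):
zeros[vals_] := Count[N@Flatten[vals], 0.]
sample = RandomSample[trainingData, 1000];
Mean[N@Map[zeros[ImageData[First[#]]] &, sample]] (* original: zero pixels per image *)
Mean[N@Map[zeros[1. - ImageData[First[#]]] &, sample]] (* inverted: zero pixels per image *)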
To test this hypothesis we can make the original data sparser and see if this improves the training. We make it sparser by forcing more values to 0: subtracting 0.5 and clipping to the range {0, 1} sends every pixel below 0.5 to exactly 0.
In[170]:= AbsoluteTiming[
itrain0 = Map[ImageData[First[#]] -> Last[#] &, trainingData];
itest0 = Map[ImageData[First[#]] -> Last[#] &, testData];
]
AbsoluteTiming[
itrain0S =
Map[Clip[(ImageData[First[#]] - 0.5), {0, 1}] -> Last[#] &,
trainingData];
itest0S =
Map[Clip[(ImageData[First[#]] - 0.5), {0, 1}] -> Last[#] &,
testData];
]
(*Original data*)
Count[N@Flatten[Clip[ImageData[trainingData[[1, 1]]], {0, 1}]], 0.]
(*Sparse original data*)
Count[N@Flatten[
Clip[ImageData[trainingData[[1, 1]]] - 0.5, {0, 1}]], 0.]
Out[170]= {1.23638, Null}
Out[171]= {1.81911, Null}
Out[172]= 2
Out[173]= 125
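For a quick visual sanity check of what the clipping does to a digit (just illustration, not part of the benchmark), one can view the shifted-and-clipped array as an image again; note the clipped values lie in [0, 0.5], so the digit looks dimmer:
{Image[ImageData[trainingData[[1, 1]]]],
 Image[Clip[ImageData[trainingData[[1, 1]]] - 0.5, {0, 1}]]}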
So let's see if it helps.
In[144]:=
t0 = NetTrain[n2, itrain0, All, BatchSize -> 500,
MaxTrainingRounds -> 4, TargetDevice -> "GPU"];
t0["TotalTrainingTime"]
ClassifierMeasurements[t0["TrainedNet"], itest0, "Accuracy"]
Out[145]= 3.93681
Out[146]= 0.9053
In[167]:=
t0S = NetTrain[n2, itrain0S, All, BatchSize -> 500,
MaxTrainingRounds -> 4, TargetDevice -> "GPU"];
t0S["TotalTrainingTime"]
ClassifierMeasurements[t0S["TrainedNet"], itest0S, "Accuracy"]
Out[168]= 2.6613
Out[169]= 0.9109
So the less information the images contain, the less is needed to describe the data, and the easier it becomes for a NN to figure out what is going on. Indeed, the sparser version trains a bit faster and scores slightly better here.
I attached the notebook so others can test this.