I tried a simple NN on the MNIST digit data:
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
n = NetChain[{FlattenLayer[], 64, Ramp, 10, SoftmaxLayer[]},
"Output" -> NetDecoder[{"Class", Range[0, 9]}],
"Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]
Trained it for a short time:
In[5]:= AbsoluteTiming[
t1 = NetTrain[n, trainingData, BatchSize -> 100,
MaxTrainingRounds -> 4]]
Out[5]= {18.2776, NetChain[ <> ]}
Checked the accuracy:
In[6]:= ClassifierMeasurements[t1, testData, "Accuracy"]
Out[6]= 0.9176
Nothing unusual here, it seems. But then I trained the same NN with the same settings under Keras/TensorFlow. Results:
Wall time: 7.58 s
10000/10000 [==============================] - 0s 30us/sample - loss: 0.1159 - acc: 0.9666
The time difference seems plausible, because I trained on a CPU under MMA and on a TPU under Keras, but where does the difference in accuracy come from? 4.9 percentage points is quite a lot.
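For reference, the Keras model I compared against looked essentially like this (a sketch: the layer sizes mirror the NetChain above, but the optimizer and loss settings here are my reconstruction and may not match NetTrain's internal defaults exactly):

```python
from tensorflow import keras

# Keras counterpart of the NetChain above:
# Flatten -> Dense(64, relu) -> Dense(10, softmax).
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),                        # FlattenLayer[]
    keras.layers.Dense(64, activation="relu"),     # 64, Ramp
    keras.layers.Dense(10, activation="softmax"),  # 10, SoftmaxLayer[]
])
# Optimizer/loss are assumptions, not taken from NetTrain.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Same training budget as the NetTrain call above:
# (x_train, y_train), _ = keras.datasets.mnist.load_data()
# model.fit(x_train / 255.0, y_train, batch_size=100, epochs=4)
```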
Image of the digits in MMA vs. image in Keras:
The background in MMA is white (== 1), whereas in Keras it is black (== 0). The latter is also the form of the original data from http://yann.lecun.com/exdb/mnist/.
Re-inverting is easy but takes some time:
itrain = MapAt[1 - # &, trainingData, {All, 1}];
itest = MapAt[1 - # &, testData, {All, 1}];
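For comparison, the same convention flip on the Python/NumPy side (a sketch on a toy array, not the real dataset; it just illustrates the `1 - x` inversion, assuming pixels are already scaled to [0, 1]):

```python
import numpy as np

# Toy 28x28 "digit" in the MMA convention: white background == 1,
# darker strokes < 1. (A stand-in for a real MNIST image.)
img = np.ones((28, 28), dtype=np.float32)
img[10:18, 12:16] = 0.2  # a dark vertical stroke

# Flip to the Keras / LeCun convention: black background == 0.
inv = 1.0 - img

# Values stay in [0, 1], and the flip is its own inverse.
assert 0.0 <= inv.min() and inv.max() <= 1.0
assert np.allclose(1.0 - inv, img)
```

After the flip, the background pixels are exactly 0, matching what Keras sees when the raw MNIST files are loaded and divided by 255.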
Trained the NN again with the re-inverted data:
In[10]:= AbsoluteTiming[
t2 = NetTrain[n, itrain, BatchSize -> 100, MaxTrainingRounds -> 4]]
Out[10]= {9.76811, NetChain[ <> ]}
Checked the accuracy:
In[11]:= ClassifierMeasurements[t2, itest, "Accuracy"]
Out[11]= 0.9636
Not only has the training time halved, but the accuracy has also improved by over 4 percentage points and is now similar to the Keras result!
Does anyone have an explanation for why the inversion has such a big impact?