Inverted MNIST Digit Data

Posted 5 years ago

I tried a simple NN on the MNIST Digit Data:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

n = NetChain[{FlattenLayer[], 64, Ramp, 10, SoftmaxLayer[]}, 
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], 
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]

I trained it for a short time:

In[5]:= AbsoluteTiming[
 t1 = NetTrain[n, trainingData, BatchSize -> 100, 
   MaxTrainingRounds -> 4]]

Out[5]= {18.2776, NetChain[ <> ]}

Checked the accuracy:

In[6]:= ClassifierMeasurements[t1, testData, "Accuracy"]

Out[6]= 0.9176

Nothing unusual here, it seems. But then I tried the same NN with the same training setup in Keras/TensorFlow. Results:

Wall time: 7.58 s

10000/10000 [==============================] - 0s 30us/sample - loss: 0.1159 - acc: 0.9666

The time difference seems plausible, because I trained on a CPU under MMA and on a TPU under Keras, but where does the difference in accuracy come from? 4.9 percentage points is quite a lot.

[Images: the digits as displayed in MMA vs. in Keras]

The background in MMA is white (== 1), whereas in Keras it is black (== 0). The latter is also the form of the original data from http://yann.lecun.com/exdb/mnist/.
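To double-check the polarity, here is a minimal sketch using the trainingData defined above (the exact value varies per image): the mean pixel value of a curated image is close to 1, while for black-background data like the original MNIST files it would be close to 0.

(* close to 1 for the white-background curated images *)
Mean[Flatten[ImageData[First[trainingData[[1]]]]]]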

Re-inverting is easy but takes some time:

(* invert the image on the left-hand side of every Image -> class Rule *)
itrain = MapAt[1 - # &, trainingData, {All, 1}];
itest = MapAt[1 - # &, testData, {All, 1}];

Train the NN again with the reinverted data:

In[10]:= AbsoluteTiming[
 t2 = NetTrain[n, itrain, BatchSize -> 100, MaxTrainingRounds -> 4]]

Out[10]= {9.76811, NetChain[ <> ]}

Check accuracy:

In[11]:= ClassifierMeasurements[t2, itest, "Accuracy"]

Out[11]= 0.9636

Not only has the training time halved, but the accuracy has also improved by over 4 percentage points and is now similar to the Keras result!

Does anyone have an explanation for why the inversion has such a big impact?

POSTED BY: gus s
7 Replies

What I got from the discussion is that batch normalization scales a batch to zero mean and unit variance, so the scaling of the original data is irrelevant. Whether an image has a 0 background and a 1 foreground or vice versa, or even a 100 background and a 200 foreground, batch normalization learns how to scale the data such that the network always behaves the same as long as the information in the images is similar.
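A minimal numeric sketch of that argument (just an illustration, not part of the nets above; batch is an arbitrary made-up sample): standardizing to zero mean and unit variance turns an inversion into a pure sign flip, which the next layer's weights can absorb.

batch = RandomReal[1, 100];
{Mean[Standardize[batch]], StandardDeviation[Standardize[batch]]}  (* ≈ {0, 1} *)
Max@Abs[Standardize[1 - batch] + Standardize[batch]]               (* ≈ 0: inversion only flips the sign *)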

POSTED BY: Martijn Froeling
Posted 5 years ago

Thanks a lot. Very useful stuff.

In addition to the big effect of the Image NetEncoder that you highlighted, the initial FlattenLayer also seems to be ~10% slower than a direct call to Flatten. Putting this together, I get on my machine:

In[4]:= nn = 
 NetChain[{64, Ramp, 10, SoftmaxLayer[]}, 
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], "Input" -> 28*28]

Out[4]= NetChain[ <> ]

In[5]:= AbsoluteTiming[
 itrain3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ 
   trainingData;
 itest3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ testData;]

Out[5]= {1.59009, Null}

In[6]:= AbsoluteTiming[
 tn = NetTrain[nn, itrain3, BatchSize -> 100, MaxTrainingRounds -> 4]]

Out[6]= {5.67012, NetChain[ <> ]}

In[7]:= ClassifierMeasurements[tn, itest3, "Accuracy"]

Out[7]= 0.9652

Training is now faster than under Keras/TensorFlow, and the accuracy is about even. I like that, because I prefer working in MMA to Python/IPython/Jupyter/Keras.

Regarding the inversion, I was at first puzzled that a bijective mapping, which leaves the entropy unchanged, could have any effect at all. But it seems plausible that zero plays a special role by making terms vanish and simplifying the computation. So maybe Wolfram should change the curated MNIST data to its original form with a zero background, which would make things comparable to other systems. Also, if the Image NetEncoder cannot be made faster, at least a "Possible Issues" section in the NetEncoder documentation would be nice.
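To make the special role of zero a bit more concrete, here is a rough sketch on one image (pix is just an illustrative local variable; the exact fractions vary from digit to digit): with a black background most of the 784 inputs are (near) zero, while with the curated white background almost none are.

pix = Flatten[ImageData[First[trainingData[[1]]]]];
N[Count[pix, x_ /; x < 0.01]/Length[pix]]  (* near-zero pixels in the white-background image: few *)
N[Count[pix, x_ /; x > 0.99]/Length[pix]]  (* pixels that become zero after inversion: most of the image *)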

POSTED BY: gus s

I talked to someone with more knowledge on this topic than I have, and after a nice grin he told me that this is why batch normalization exists.

In[4]:= nn = 
  NetChain[{64, Ramp, 10, SoftmaxLayer[]}, 
   "Output" -> NetDecoder[{"Class", Range[0, 9]}], "Input" -> 28*28];
(* the same net with a batch-normalization layer in front *)
nnn = NetChain[{BatchNormalizationLayer[], 64, Ramp, 10, 
    SoftmaxLayer[]}, "Output" -> NetDecoder[{"Class", Range[0, 9]}], 
   "Input" -> 28*28];

(* itrain3/itest3: inverted data (black background, as in the original MNIST) *)
AbsoluteTiming[
 itrain3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ 
   trainingData;
 itest3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ testData;
 ]

(* itrain2/itest2: data as curated (white background) *)
AbsoluteTiming[
 itrain2 = (Flatten@ImageData@First@#) -> Last@# & /@ trainingData;
 itest2 = (Flatten@ImageData@First@#) -> Last@# & /@ testData;
 ]

Out[6]= {1.10475, Null}

Out[7]= {0.812343, Null}

In[8]:= tr2 = 
  NetTrain[nn, itrain2, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr2["TotalTrainingTime"]
ClassifierMeasurements[tr2["TrainedNet"], itest2, "Accuracy"]

Out[9]= 10.1913

Out[10]= 0.9352

In[11]:= tr2n = 
  NetTrain[nnn, itrain2, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr2n["TotalTrainingTime"]
ClassifierMeasurements[tr2n["TrainedNet"], itest2, "Accuracy"]

Out[12]= 10.2112

Out[13]= 0.9709

In[14]:= tr3 = 
  NetTrain[nn, itrain3, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr3["TotalTrainingTime"]
ClassifierMeasurements[tr3["TrainedNet"], itest3, "Accuracy"]

Out[15]= 10.0675

Out[16]= 0.9732

In[17]:= tr3n = 
  NetTrain[nnn, itrain3, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr3n["TotalTrainingTime"]
ClassifierMeasurements[tr3n["TrainedNet"], itest3, "Accuracy"]

Out[18]= 10.2946

Out[19]= 0.9734
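Putting the four runs next to each other (test accuracy from the outputs above):

- nn (no batch normalization) on itrain2 (white background): 0.9352
- nnn (with batch normalization) on itrain2 (white background): 0.9709
- nn (no batch normalization) on itrain3 (inverted, black background): 0.9732
- nnn (with batch normalization) on itrain3 (inverted, black background): 0.9734

So the batch-normalization layer recovers essentially the same accuracy on the white-background data as the inversion does.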
POSTED BY: Martijn Froeling

I don't have an explanation, but color inversion might be faster with this:

Map[ColorNegate[First[#]] -> Last[#] &, trainingData]
POSTED BY: Arnoud Buzing