Inverted MNIST Digit Data

Posted 5 years ago

I tried a simple NN on the MNIST Digit Data:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

n = NetChain[{FlattenLayer[], 64, Ramp, 10, SoftmaxLayer[]}, 
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], 
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]

I trained it for a short time:

In[5]:= AbsoluteTiming[
 t1 = NetTrain[n, trainingData, BatchSize -> 100, 
   MaxTrainingRounds -> 4]]

Out[5]= {18.2776, NetChain[ <> ]}

Checked the accuracy:

In[6]:= ClassifierMeasurements[t1, testData, "Accuracy"]

Out[6]= 0.9176

Nothing unusual here, it seems. But then I tried the same NN with the same training setup in Keras/TensorFlow. Results:

Wall time: 7.58 s

10000/10000 [==============================] - 0s 30us/sample - loss: 0.1159 - acc: 0.9666

The time difference seems plausible, because I trained on a CPU under MMA and on a TPU under Keras, but where does the difference in accuracy come from? 4.9 percentage points is quite a lot.

[Images: the digits as displayed in MMA vs. in Keras]

The background in MMA is white (== 1), whereas in Keras it is black (== 0). The latter is also the form of the original data from http://yann.lecun.com/exdb/mnist/.
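To double-check the polarity, here is a minimal sketch using the trainingData defined above (the exact value varies per image): the mean pixel value of a curated image is close to 1, while for black-background data like the original MNIST files it would be close to 0.

(* close to 1 for the white-background curated images *)
Mean[Flatten[ImageData[First[trainingData[[1]]]]]]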

Re-inverting is easy but takes some time:

(* invert the image on the left-hand side of every Image -> class Rule *)
itrain = MapAt[1 - # &, trainingData, {All, 1}];
itest = MapAt[1 - # &, testData, {All, 1}];

Train the NN again with the reinverted data:

In[10]:= AbsoluteTiming[
 t2 = NetTrain[n, itrain, BatchSize -> 100, MaxTrainingRounds -> 4]]

Out[10]= {9.76811, NetChain[ <> ]}

Check accuracy:

In[11]:= ClassifierMeasurements[t2, itest, "Accuracy"]

Out[11]= 0.9636

Not only has the training time halved, but the accuracy has also improved by over 4 percentage points and is now similar to the Keras result!

Does anyone have an explanation for why the inversion has such a big impact?

POSTED BY: gus s
7 Replies

What I got from the discussion is that batch normalization scales a batch to zero mean and unit variance, so the scaling of the original data is irrelevant. Whether an image has a 0 background and a 1 foreground or vice versa, or even a 100 background and a 200 foreground, batch normalization learns how to scale the data such that the network always behaves the same as long as the information in the images is similar.
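A minimal numeric sketch of that argument (just an illustration, not part of the nets above; batch is an arbitrary made-up sample): standardizing to zero mean and unit variance turns an inversion into a pure sign flip, which the next layer's weights can absorb.

batch = RandomReal[1, 100];
{Mean[Standardize[batch]], StandardDeviation[Standardize[batch]]}  (* ≈ {0, 1} *)
Max@Abs[Standardize[1 - batch] + Standardize[batch]]               (* ≈ 0: inversion only flips the sign *)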

POSTED BY: Martijn Froeling
Posted 5 years ago

Thanks a lot. Very useful stuff.

In addition to the big effect of the Image NetEncoder that you highlighted, the initial FlattenLayer also seems to be ~10% slower than a direct call to Flatten. Putting this together, I get on my machine:

In[4]:= nn = 
 NetChain[{64, Ramp, 10, SoftmaxLayer[]}, 
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], "Input" -> 28*28]

Out[4]= NetChain[ <> ]

In[5]:= AbsoluteTiming[
 itrain3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ 
   trainingData;
 itest3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ testData;]

Out[5]= {1.59009, Null}

In[6]:= AbsoluteTiming[
 tn = NetTrain[nn, itrain3, BatchSize -> 100, MaxTrainingRounds -> 4]]

Out[6]= {5.67012, NetChain[ <> ]}

In[7]:= ClassifierMeasurements[tn, itest3, "Accuracy"]

Out[7]= 0.9652

Training is now faster than under Keras/TensorFlow, and the accuracy is about even. I like that, because I prefer working in MMA to Python/IPython/Jupyter/Keras.

Regarding the inversion, I was at first puzzled that a bijective mapping, which leaves the entropy unchanged, could have any effect at all. But it seems plausible that zero plays a special role by making terms vanish and simplifying the computation. So maybe Wolfram should change the curated MNIST data to its original form with a zero background, which would make things comparable to other systems. Also, if the Image NetEncoder cannot be made faster, at least a "Possible Issues" section in the NetEncoder documentation would be nice.
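To make the special role of zero a bit more concrete, here is a rough sketch on one image (pix is just an illustrative local variable; the exact fractions vary from digit to digit): with a black background most of the 784 inputs are (near) zero, while with the curated white background almost none are.

pix = Flatten[ImageData[First[trainingData[[1]]]]];
N[Count[pix, x_ /; x < 0.01]/Length[pix]]  (* near-zero pixels in the white-background image: few *)
N[Count[pix, x_ /; x > 0.99]/Length[pix]]  (* pixels that become zero after inversion: most of the image *)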

POSTED BY: gus s

I talked to someone with more knowledge on this topic than I have, and after a nice grin he told me that this is why batch normalization exists.

In[4]:= nn = 
  NetChain[{64, Ramp, 10, SoftmaxLayer[]}, 
   "Output" -> NetDecoder[{"Class", Range[0, 9]}], "Input" -> 28*28];
(* the same net with a batch-normalization layer in front *)
nnn = NetChain[{BatchNormalizationLayer[], 64, Ramp, 10, 
    SoftmaxLayer[]}, "Output" -> NetDecoder[{"Class", Range[0, 9]}], 
   "Input" -> 28*28];

(* itrain3/itest3: inverted data (black background, as in the original MNIST) *)
AbsoluteTiming[
 itrain3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ 
   trainingData;
 itest3 = (1. - Flatten@ImageData@First@#) -> Last@# & /@ testData;
 ]

(* itrain2/itest2: data as curated (white background) *)
AbsoluteTiming[
 itrain2 = (Flatten@ImageData@First@#) -> Last@# & /@ trainingData;
 itest2 = (Flatten@ImageData@First@#) -> Last@# & /@ testData;
 ]

Out[6]= {1.10475, Null}

Out[7]= {0.812343, Null}

In[8]:= tr2 = 
  NetTrain[nn, itrain2, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr2["TotalTrainingTime"]
ClassifierMeasurements[tr2["TrainedNet"], itest2, "Accuracy"]

Out[9]= 10.1913

Out[10]= 0.9352

In[11]:= tr2n = 
  NetTrain[nnn, itrain2, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr2n["TotalTrainingTime"]
ClassifierMeasurements[tr2n["TrainedNet"], itest2, "Accuracy"]

Out[12]= 10.2112

Out[13]= 0.9709

In[14]:= tr3 = 
  NetTrain[nn, itrain3, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr3["TotalTrainingTime"]
ClassifierMeasurements[tr3["TrainedNet"], itest3, "Accuracy"]

Out[15]= 10.0675

Out[16]= 0.9732

In[17]:= tr3n = 
  NetTrain[nnn, itrain3, All, BatchSize -> 500, 
   MaxTrainingRounds -> 20, TargetDevice -> "GPU"];
tr3n["TotalTrainingTime"]
ClassifierMeasurements[tr3n["TrainedNet"], itest3, "Accuracy"]

Out[18]= 10.2946

Out[19]= 0.9734
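Putting the four runs next to each other (test accuracy from the outputs above):

- nn (no batch normalization) on itrain2 (white background): 0.9352
- nnn (with batch normalization) on itrain2 (white background): 0.9709
- nn (no batch normalization) on itrain3 (inverted, black background): 0.9732
- nnn (with batch normalization) on itrain3 (inverted, black background): 0.9734

So the batch-normalization layer recovers essentially the same accuracy on the white-background data as the inversion does.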
POSTED BY: Martijn Froeling

I don't have an explanation, but color inversion might be faster with this:

Map[ColorNegate[First[#]] -> Last[#] &, trainingData]
POSTED BY: Arnoud Buzing