Can BatchNormalizationLayer be used without Gamma and Beta?

Posted 5 years ago

Is there a way to use BatchNormalizationLayer without having it learn its Gamma and Beta parameters? I'd like to have a BN learn its MovingMean and MovingVariance parameters during training, but leave its Gamma and Beta alone at 1 and 0 respectively. If during NetTrain I set the LearningRateMultipliers for either Gamma or Beta to None, then a BN still learns both its MovingMean and MovingVariance, as I expect. However, if I set the LearningRateMultipliers for both Gamma and Beta to None, then MovingMean and MovingVariance don't get learned. Instead, they maintain their initial values of 0 and 1 respectively. (I also tried setting the BNLayer's "Gamma"->None and "Beta"->None in analogy to the way ConvLayers let you specify "Biases"->None, but this gives an error message.)
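For concreteness, here's a rough sketch of the two things I tried (the layer name "BN" and the symbols net and trainingData are just placeholders):

(* freezing both parameters via LearningRateMultipliers: after this, MovingMean and MovingVariance stay at their initial 0 and 1 *)
NetTrain[net, trainingData,
 LearningRateMultipliers -> {{"BN", "Gamma"} -> None, {"BN", "Beta"} -> None}]

(* trying to drop the parameters at construction time, by analogy with ConvolutionLayer[n, s, "Biases" -> None]; this gives an error message *)
BatchNormalizationLayer["Gamma" -> None, "Beta" -> None]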

I'm still learning about neural networks, so I might be thinking about something wrong, but here's the reason I'm trying to do this:

The original Batch Normalization paper (Ioffe, Szegedy) placed the BN after the linear layer / convolution, but there has been discussion since then about placing the BN before the convolution (e.g., on reddit). My thinking is that if the BN comes before a convolution, then the conv can learn any needed scaling and bias, so the BN shouldn't need to scale or bias its own output, i.e., the BN can just leave its own Gamma=1 and Beta=0, and the conv can learn to expect zero-mean and unit-variance input. But I'd still want the BN to learn its activation statistics during training in this case. I.e., it should still learn its MovingMean and MovingVariance parameters so that the input statistics to the conv layer will remain stable.
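For instance, the arrangement I have in mind would look something like this (layer sizes are arbitrary):

(* BN first, so the following convolution sees roughly zero-mean, unit-variance input *)
NetChain[{BatchNormalizationLayer[], ConvolutionLayer[16, 3]}]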

Does anyone have any thoughts? Thanks

POSTED BY: Brad Chalfan
Posted 5 years ago

Update: I'm thinking this might be a bug in BatchNormalizationLayer?

This problem only seems to happen when a BN layer isn't preceded by any other layer with learnable parameters in the network. That was the case in the simple network I was using for testing when I wrote the original post above.
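For contrast, here's a hypothetical sketch (not the network from my original test) where the BN is preceded by a layer that does have learnable parameters; trainingData stands for image -> image training pairs like the fake data generated in the code further below:

(* per the observation above: when the BN is preceded by a layer with learnable
   parameters, its MovingMean/MovingVariance still seem to get updated even with
   both Gamma and Beta frozen via None *)
netConvFirst = NetInitialize@NetGraph[
   <|"conv" -> ConvolutionLayer[3, 1], "BN" -> BatchNormalizationLayer[]|>,
   {"conv" -> "BN"},
   "Input" -> NetEncoder[{"Image", {16, 9}, ColorSpace -> "RGB"}],
   "Output" -> NetDecoder[{"Image", ColorSpace -> "RGB"}]];
NetTrain[netConvFirst, trainingData, TimeGoal -> 5,
 LearningRateMultipliers -> {{"BN", "Gamma"} -> None, {"BN", "Beta"} -> None}]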

A workaround that seems to work: rather than setting the learning-rate multipliers for a BN layer's Gamma and Beta both to None, leave at least one of them at a very small value such as $MachineEpsilon. Here's some code that implements and tests this workaround:

Module[
 {trainingData, net, trainedNone, trainedSmall},

 (* fake some training data *)

 trainingData = 
  Table[RandomImage[1, {16, 9}, ColorSpace -> "RGB"] -> 
    RandomImage[1, {16, 9}, ColorSpace -> "RGB"], 256];

 (* create the network - just a batch norm followed by a conv *)

 net = NetInitialize[NetGraph[
    <|"BN" -> BatchNormalizationLayer[], 
     "*" -> ConvolutionLayer[3, 1]|>,
    {"BN" -> "*"},
    "Input" -> NetEncoder[{"Image", {16, 9}, ColorSpace -> "RGB"}],
    "Output" -> NetDecoder[{"Image", ColorSpace -> "RGB"}]
    ]];

 (* train with both of BN's learning-rate multipliers (Gamma and Beta) set to None *)

 trainedNone = NetTrain[net, trainingData, TimeGoal -> 5,
   LearningRateMultipliers -> {{"BN", "Gamma"} -> 
      None, {"BN", "Beta"} -> None}];

 (* train with Gamma's learning-rate multiplier still None, but Beta's set to a very small value *)

 trainedSmall = NetTrain[net, trainingData, TimeGoal -> 5,
   LearningRateMultipliers -> {{"BN", "Gamma"} -> 
      None, {"BN", "Beta"} -> $MachineEpsilon}];

 (* format results *)
 Dataset[<|
   "Gamma" -> <|
     "Initialized" -> net[["BN", "Gamma"]],
     "Trained: LR=None" -> trainedNone[["BN", "Gamma"]],
     "Trained: LR=$MachineEpsilon" -> trainedSmall[["BN", "Gamma"]],
     "Expected" -> 1.
     |>,
   "Beta" -> <|
     "Initialized" -> net[["BN", "Beta"]],
     "Trained: LR=None" -> trainedNone[["BN", "Beta"]],
     "Trained: LR=$MachineEpsilon" -> trainedSmall[["BN", "Beta"]],
     "Expected" -> 0.
     |>,
   "MovingMean" -> <|
     "Initialized" -> net[["BN", "MovingMean"]],
     "Trained: LR=None" -> trainedNone[["BN", "MovingMean"]],
     "Trained: LR=$MachineEpsilon" -> 
      trainedSmall[["BN", "MovingMean"]],
     "Expected" -> Mean[UniformDistribution[{0., 1.}]]
     |>,
   "MovingVariance" -> <|
     "Initialized" -> net[["BN", "MovingVariance"]],
     "Trained: LR=None" -> trainedNone[["BN", "MovingVariance"]],
     "Trained: LR=$MachineEpsilon" -> 
      trainedSmall[["BN", "MovingVariance"]],
     "Expected" -> Variance[UniformDistribution[{0., 1.}]]
     |>
   |>]
 ]

The output of this is:

[Attached image "test output": the Dataset of Gamma, Beta, MovingMean, and MovingVariance values produced by the code above]
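For reference, the "Expected" values for the moving statistics in the table above come from the fact that RandomImage[1, ...] produces pixel values uniformly distributed on [0, 1]:

Mean[UniformDistribution[{0., 1.}]]      (* 0.5 *)
Variance[UniformDistribution[{0., 1.}]]  (* 1/12, about 0.0833 *)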

POSTED BY: Brad Chalfan