GROUPS:

Staff Picks Image Processing Curated Data Wolfram Language Machine Learning Neural Networks

Automatically sliding a conv net onto a larger image

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

How to control the step size of the following conv net as it slides onto a larger image?

See also: https://mathematica.stackexchange.com/questions/144060/sliding-fullyconvolutional-net-over-larger-images/148033

As a toy example, I'd like to slide a digit classifier trained on 28x28 images over a larger image to classify each neighborhood. This is LeNet with its linear layers replaced by convolutional layers (a 4x4 convolution followed by a 1x1 convolution).

trainingData = ResourceData["MNIST", "TrainingData"];
testData = ResourceData["MNIST", "TestData"];

lenetModel = 
  NetModel["LeNet Trained on MNIST Data", 
   "UninitializedEvaluationNet"];

newlenet = NetExtract[lenetModel, All];
newlenet[[7]] = ConvolutionLayer[500, {4, 4}];
newlenet[[8]] = ElementwiseLayer[Ramp];
newlenet[[9]] = ConvolutionLayer[10, 1];
newlenet[[10]] = SoftmaxLayer[1];
newlenet[[11]] = PartLayer[{All, 1, 1}];

newlenet = 
 NetChain[newlenet, 
  "Input" -> 
   NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}]]

Now train it:

newtd = First@# -> UnitVector[10, Last@# + 1] & /@ trainingData;
newvd = First@# -> UnitVector[10, Last@# + 1] & /@ testData;

ng = NetGraph[
  <|"inference" -> newlenet,
   "loss" -> CrossEntropyLossLayer["Probabilities", "Input" -> 10]
   |>,
  {
   "inference" -> NetPort["loss", "Input"],
   NetPort["Target"] -> NetPort["loss", "Target"]
   }
  ]
tnew = NetTrain[ng, newtd, ValidationSet -> newvd, 
  TargetDevice -> "GPU"]

Now remove dimensions information (see stackexchange for the code definition of removeInputInformation):

removeInputInformation[layer_ConvolutionLayer] := 
 With[{k = NetExtract[layer, "OutputChannels"], 
   kernelSize = NetExtract[layer, "KernelSize"], 
   weights = NetExtract[layer, "Weights"], 
   biases = NetExtract[layer, "Biases"], 
   padding = NetExtract[layer, "PaddingSize"], 
   stride = NetExtract[layer, "Stride"], 
   dilation = NetExtract[layer, "Dilation"]}, 
  ConvolutionLayer[k, kernelSize, "Weights" -> weights, 
   "Biases" -> biases, "PaddingSize" -> padding, "Stride" -> stride, 
   "Dilation" -> dilation]]

removeInputInformation[layer_PoolingLayer] := 
 With[{f = NetExtract[layer, "Function"], 
   kernelSize = NetExtract[layer, "KernelSize"], 
   padding = NetExtract[layer, "PaddingSize"], 
   stride = NetExtract[layer, "Stride"]}, 
  PoolingLayer[kernelSize, stride, "PaddingSize" -> padding, 
   "Function" -> f]]

removeInputInformation[layer_ElementwiseLayer] := 
 With[{f = NetExtract[layer, "Function"]}, ElementwiseLayer[f]]

removeInputInformation[x_] := x

tmp = NetExtract[NetExtract[tnew, "inference"], All];
n3 = removeInputInformation /@ tmp[[1 ;; -3]];
AppendTo[n3, SoftmaxLayer[1]];
n3 = NetChain@n3;

And the network n3 slides onto any larger input. However, note that it seems to slide with steps of 4. How could I make it take steps of 1 instead?

In[358]:= n3[RandomReal[1, {1, 28*10, 28}]] // Dimensions

Out[358]= {10, 64, 1}

In[359]:= BlockMap[Length, Range[28*10], 28, 4] // Length

Out[359]= 64

POSTED BY: Matthias Odisio

11 Replies

Moderation Team, WOLFRAM

Posted 6 years ago

- Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: Moderation Team

Jérôme Louradour, Wolfram Research

Posted 6 years ago

Indeed, for this auto-encoder use case, some "smart" padding applied when needed could solve the issue. Support for more forms of padding is in the pipeline to improve the WL framework, and we already unlocked some things around padding in the last version. There is a good chance that the next version will offer automatic padding to a constraint of type "m + k * n", or another user-friendly solution for efficient support of multiple dynamic dimensions.

In the meantime, you can use the cheap solution of padding input images to fit a size of the form `61 + 16 n`. This can be done with `PaddingLayer`, for example. You need to prepend this layer to the network for a given image size. It's a bit awkward, but you will have no overhead with respect to the current situation, where the size inference and the unrolling of the net are done at top level each time you apply the network. Again, we will improve how things work for multiple variable dimensions in the next versions.
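The padding workaround can be sketched with a little arithmetic. This is only an illustration: `paddedLength` and `paddingSpec` are hypothetical helper names, the constants 61 and 16 are the ones quoted in this thread for the auto-encoder, and padding only at the end of each spatial dimension is an arbitrary choice.

```wolfram
(* Hypothetical helper: smallest length of the form 61 + 16 n (n >= 0)
   that is at least len. *)
paddedLength[len_Integer?Positive] := 61 + 16 Ceiling[Max[len - 61, 0]/16]

(* Hypothetical helper: padding specification for a {channels, height, width}
   input. Nothing is added on the channel dimension; the extra rows and
   columns go at the end of each spatial dimension (an assumption). *)
paddingSpec[{h_Integer, w_Integer}] :=
 {{0, 0}, {0, paddedLength[h] - h}, {0, paddedLength[w] - w}}

paddedLength[1570]          (* 1581 *)
paddingSpec[{1570, 1570}]   (* {{0, 0}, {0, 11}, {0, 11}} *)

(* Layer to prepend to the net for 1570 x 1570 images *)
pad = PaddingLayer[paddingSpec[{1570, 1570}]]
```

The output of the padded net is then larger than the original image and would presumably have to be cropped back to 1570 x 1570 afterwards.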

POSTED BY: Jérôme Louradour

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

Thanks, this is an acceptable workaround. May I ask how you derive this formula, `(61+16*n)`? By the way, a pair of parentheses is missing from the output size formula in the notes of `ref/ConvolutionLayer`. The Property example gives the correct result. And I take good note that "the future will be better."

POSTED BY: Matthias Odisio

Jérôme Louradour, Wolfram Research

Posted 6 years ago

"May I ask how you derive this formula, (61+16n)?"

The output length of a convolutional or pooling layer, for a given kernel size and stride, is the following function of the input length:

layerOutputLength[kernel_, stride_][inputLength_] := (inputLength - kernel)/stride + 1;

Inverting this gives the input length as a function of the output length:

layerInputLength[kernel_, stride_][outputLength_] := stride (outputLength - 1) + kernel;

Your auto-encoder applies four layers with kernel size 5 and stride 2 in sequence. So the input size corresponding to a length of 1 at the upper level (where the image dimension is the smallest) is obtained by applying `layerInputLength[5, 2]` four times, starting from `outputLength = 1`:

netInputLength[outputLength_] := Nest[layerInputLength[5, 2], outputLength, 4];

netInputLength[1]

Out[4]= 61

This is the minimal input length for which nothing is lost and the length is 1 in the "most narrow part" of the network.

The "global stride" is then just the product of all the strides, so `2^4 = 16`. This value can be checked by looking at how much bigger the input length must be to make the "most narrow part" one unit longer:

netInputLength[2] - netInputLength[1]
netInputLength[3] - netInputLength[2]
netInputLength[4] - netInputLength[3]

Out[5]= 16

Out[6]= 16

Out[7]= 16

You can check all these equations by drawing what happens on a piece of paper =)
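The same numbers also follow from a closed form. In the short derivation below, the symbols $L_j$ and $m$ are introduced only for this note: $L_0 = m$ is the length in the narrowest part of the network and $L_4$ is the corresponding input length.

```latex
\begin{aligned}
L_{j+1} &= 2\,(L_j - 1) + 5 = 2 L_j + 3, \qquad L_0 = m, \\
L_4 &= 2^4 m + 3\,(1 + 2 + 4 + 8) = 16\,m + 45, \\
L_4 &= 61 + 16\,(m - 1) = 61 + 16\,n, \qquad n = m - 1 \ge 0 .
\end{aligned}
```

Setting $m = 1$ recovers the minimal input length 61, and each extra unit of $m$ adds the global stride of 16.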

POSTED BY: Jérôme Louradour

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

Thanks for taking the time to elaborate these details. Merci !

POSTED BY: Matthias Odisio

Jérôme Louradour, Wolfram Research

Posted 6 years ago

Cool, an auto-encoder with a U-shape! So here the problem is different. There are constraints on the input size if the output is to have the same size after going through the deconvolutions: there cannot be any border effect. For instance, if you give an input of length 6 to a kernel of size 5 with stride 2, you are going to lose one input; that's what I call a border effect. Our framework allows this. But here you really need to reconstitute the same size after the deconvolutions, which you cannot do if you throw away some input features.

So your input has to be of size `61 + n * 16`, where n is a non-negative integer. You can see that the construction of "ddae" fails if you use a size that does not satisfy this constraint (such as 1570). The problem does not come from changing the dimensions. So try it with image sizes that are equal to 157 modulo 16 (and not lower than 61)!
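The constraint can be checked numerically before building a net. The sketch below is an illustration only: `lostInputs`, `encoderLengths` and `borderFreeQ` are hypothetical helper names, and the four kernel-5, stride-2 layers are taken from the description in this thread.

```wolfram
(* Number of trailing inputs that one layer never covers. *)
lostInputs[kernel_Integer, stride_Integer][len_Integer] := Mod[len - kernel, stride]

lostInputs[5, 2][6]   (* 1: the sixth input is dropped *)

(* Lengths after each of the four kernel-5, stride-2 layers, kept exact
   so that a fractional value reveals a border effect. *)
encoderLengths[len_Integer] := NestList[(# - 5)/2 + 1 &, len, 4]

(* True when every intermediate length is a positive integer. *)
borderFreeQ[len_Integer] := AllTrue[encoderLengths[len], IntegerQ[#] && # >= 1 &]

encoderLengths[157]   (* {157, 77, 37, 17, 7} *)
borderFreeQ[157]      (* True *)
borderFreeQ[1570]     (* False *)
```

The sizes accepted by `borderFreeQ` are exactly those of the form 61 + 16 n, for example 61, 77 and 93.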

POSTED BY: Jérôme Louradour

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

Thanks for this elaboration, Jerome. Are those constraints brought by an underlying third party implementation? It feels like a bug that `ConvolutionLayer` and `DeconvolutionLayer` do not interplay well. I know the documentation does not claim otherwise. From my end-user's perspective those so-called "border effects" should be taken care of by the framework.

POSTED BY: Matthias Odisio

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

Thanks Jerome. So it's not feasible to realistically reduce the stride.

I hope this similar topic will also interest you. What about sliding this denoising autoencoder?

size = 157;

n1 = 32;
k = 5;

conv2[n_] := 
  NetChain[{ConvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
    DropoutLayer[], ConvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"]}];

deconv2[n_] := 
  NetChain[{DeconvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
    DropoutLayer[], DeconvolutionLayer[n/2, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["SoftSign"]}];

sum[] := NetChain[{TotalLayer["Inputs" -> 2]}];

constantPowerLayer[] := NetChain[{
   ElementwiseLayer[Log@Clip[#, {$MachineEpsilon, 1}] &],
   ConvolutionLayer[1, 1, "Biases" -> None, "Weights" -> {{{{1}}}}],
   ElementwiseLayer[Exp]}]

ddae = NetGraph[
  <|
   "bugworkaround" -> ElementwiseLayer[# &],
   "c12" -> conv2[n1],
   "c34" -> conv2[2*n1],

   "d12" -> deconv2[2*n1],
   "d34" -> 
    NetChain[{DeconvolutionLayer[n1, k, "Stride" -> 2], 
      BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
      DeconvolutionLayer[1, k, "Stride" -> 2], 
      BatchNormalizationLayer[], ElementwiseLayer["SoftSign"]}],

   "sum1" -> sum[],
   "sum2" -> NetChain[{sum[], constantPowerLayer[]}],

   "loss" -> MeanSquaredLossLayer[]
   |>,
  {
   "bugworkaround" -> 
    "c12" -> 
     "c34" -> 
      "d12" -> "sum1" -> "d34" -> "sum2" -> NetPort["loss", "Input"],
   "bugworkaround" -> "sum2",
   NetPort["Noisy"] -> "bugworkaround",
   "c12" -> "sum1",
   NetPort["Target"] -> NetPort["loss", "Target"]
   },
  "Noisy" -> 
   NetEncoder[{"Image", {size, size}, ColorSpace -> "Grayscale"}],
  "Target" -> 
   NetEncoder[{"Image", {size, size}, ColorSpace -> "Grayscale"}]
  ]

trained = NetTake[NetInitialize@ddae, {"bugworkaround", "sum2"}]

Now I "automatize" the input dimensions:

n3 = NetReplacePart[trained, "Noisy" -> Automatic];

This new network works fine when given the same input dimensions, but fails with larger ones. Any idea how to fix this problem?

In[142]:= n3[RandomReal[1, {1, 157, 157}]] // Dimensions

Out[142]= {1, 157, 157}

In[143]:= n3[RandomReal[1, {1, 1570, 1570}]] // Dimensions

During evaluation of In[143]:= NetGraph::tyfail1: Inferred inconsistent value for output size of layer 4 of layer "d34".

Out[143]= {}

POSTED BY: Matthias Odisio

Giulio Alessandrini, Wolfram Research Inc.

Posted 6 years ago

We should probably advertise

NetReplacePart[net, "Input" -> Automatic]

For LeNet you can do

NetReplacePart[
 NetDrop[
  NetModel["LeNet Trained on MNIST Data", "EvaluationNet"], -5
 ], "Input" -> Automatic]

and get a net whose input dimensions are no longer fixed.

POSTED BY: Giulio Alessandrini

Matthias Odisio, Thermo Fisher Scientific

Posted 6 years ago

Ah yes. Thanks! This is used in my follow up question.

POSTED BY: Matthias Odisio

Jérôme Louradour, Wolfram Research

Posted 6 years ago

Hi Matthias,

The stride of 4 comes from the pooling layers:

In[49]:= Map[NetExtract[n3, {#, "Stride"}] &, {3, 6}]

Out[49]= {{2, 2}, {2, 2}}

(so on each dimension there is an "implicit" stride of 2 x 2 = 4).

You can easily make this stride bigger (by a multiplicative factor), for example by setting a stride > 1 in the top convolutional layer. But reducing the stride is a bit "awkward". You could get a stride of 1 by setting the stride to 1 in both pooling layers (that is to say, by removing them...). But then the model is not the same: if you remove (or change) one pooling layer, you "invalidate" the weights learned in the layers after it.

So the only solution I see, if you REALLY want a stride of 1, is to run the same network 16 times (!) on shifted inputs and interleave the results to reconstitute the output. You can save some computation by not recomputing what comes before the first pooling layer (i.e. the first convolution and its non-linearity).

If you have fixed-size images, there is a way to put everything into a single network, sharing layers with `NetInsertSharedArrays` and using `PartLayer` to shift the image representations when you need to.
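The arithmetic behind the stride of 4, and the idea of interleaving shifted runs, can be sketched in one dimension without any net. Everything here is illustrative: `layerOutputLength` (written with an explicit `Floor`), `layers` and `slide` are names introduced for this sketch, the kernel sizes of 5 for the first two convolutions and 2 for the pooling layers are assumed from the standard LeNet architecture, and `slide` only stands in for the net by reporting where each 28-wide window starts.

```wolfram
(* Output length of a layer without padding. *)
layerOutputLength[kernel_, stride_][len_] := Floor[(len - kernel)/stride] + 1

(* Assumed {kernel, stride} of the convolution and pooling layers of n3,
   per dimension: conv 5, pool 2/2, conv 5, pool 2/2, conv 4, conv 1. *)
layers = {{5, 1}, {2, 2}, {5, 1}, {2, 2}, {4, 1}, {1, 1}};

Fold[layerOutputLength[#2[[1]], #2[[2]]][#1] &, 280, layers]   (* 64 *)
Times @@ layers[[All, 2]]                                       (* 4 *)

(* 1D stand-in for the net: start position of each window, step 4. *)
slide[data_List] := BlockMap[First, data, 28, 4]

data = Range[43];

(* Run the same "net" on the input shifted by 0, 1, 2 and 3 positions... *)
shifted = Table[slide[Drop[data, offset]], {offset, 0, 3}];

(* ...and interleave the results to recover the step-1 positions. *)
Flatten[Transpose[shifted]]   (* {1, 2, ..., 16} *)
```

In two dimensions the same trick needs 4 shifts per dimension, hence the 16 runs mentioned above. The input length of 43 is chosen so that all four shifted runs return the same number of windows; other lengths would need the shorter runs to be padded before interleaving.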

POSTED BY: Jérôme Louradour
