Automatically sliding a conv net onto a larger image

POSTED BY: Matthias Odisio
11 Replies

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

Thanks, this is an acceptable workaround.

May I ask how you derive this formula, (61+16*n)?

By the way, a set of parentheses is missing in ref/ConvolutionLayer's notes for the output size formula. The Property example gives the correct result.

And, I take good note that "the future will be better."

POSTED BY: Matthias Odisio

May I ask how you derive this formula, (61+16*n)?

The output length of a convolutional or pooling layer, for a given kernel size and stride, is the following function of the input length:

layerOutputLength[kernel_, stride_][inputLength_] := (inputLength - kernel)/stride + 1;

By inverting this, you get the input length as a function of the output length:

layerInputLength[kernel_, stride_][outputLength_] := stride * (outputLength - 1) + kernel;
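
(As a quick check, not in the original reply but using the same definitions, composing the two functions recovers the starting length:)

layerOutputLength[5, 2][layerInputLength[5, 2][7]]
(* 7 *)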

There are 4 layers with kernel size 5 and stride 2 in your auto-encoder. So the input size corresponding to a length of 1 at the upper level (where the image dimension is the smallest) is obtained by applying layerInputLength[5, 2] four times, starting from outputLength = 1:

netInputLength[outputLength_] := Nest[
    layerInputLength[5, 2],
    outputLength,
    4
];

netInputLength[1]
Out[4]= 61

This is the minimal input length such that nothing is lost and the length is 1 in the "narrowest part" of the network.

Then the "global stride" is just the multiplication of all the strides, so 2^4 = 16.

This value of the global stride can be checked by looking at how much bigger the input length must be to make the "narrowest part" longer by 1:

netInputLength[2] - netInputLength[1]
netInputLength[3] - netInputLength[2]
netInputLength[4] - netInputLength[3]
Out[5]= 16
Out[6]= 16
Out[7]= 16

You can check all these equations by drawing what happens on a piece of paper =)

Thanks for taking the time to elaborate these details. Merci !

POSTED BY: Matthias Odisio

Cool, auto-encoder with a U-shape!

So here the problem is different. The thing is that there are constraints on the input size so that the same size can be recovered after going through the deconvolutions. There cannot be any border effect.

For instance, if you give an input of 6 to a kernel of 5 with stride 2, you are going to lose one input; that's what I call a border effect. Our framework allows this. But here, you really need to reconstitute the same size after the deconvolutions, which you cannot do if you throw away some input features.
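
(A minimal illustration of this, not from the original reply, using a 1-D ConvolutionLayer with a fixed input length of 6:)

layer = NetInitialize@ConvolutionLayer[1, {5}, "Stride" -> 2, "Input" -> {1, 6}];
Dimensions[layer[RandomReal[1, {1, 6}]]]
(* {1, 1} -- output length 1, so the 6th input value never contributes *)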

So your input has to be of size 61 + n * 16 where n is a non-negative integer.

And you can see that the construction of "ddae" fails if you use a size that does not satisfy this constraint (such as 1570). It's not a problem caused by changing the dimensions.

So try it with images whose size is congruent to 157 modulo 16 (and not smaller than 61)!
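
(A small helper, added here for illustration, to test this constraint on candidate sizes:)

validSizeQ[s_Integer] := s >= 61 && Divisible[s - 61, 16];
validSizeQ /@ {61, 157, 1570}
(* {True, True, False} *)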

Thanks for this elaboration, Jerome.

Are those constraints imposed by an underlying third-party implementation? It feels like a bug that ConvolutionLayer and DeconvolutionLayer do not interplay well. I know the documentation does not claim otherwise. From my end-user's perspective, those so-called "border effects" should be taken care of by the framework.

POSTED BY: Matthias Odisio

Thanks Jerome. So it's not realistically feasible to reduce the stride.

I hope this similar topic will also interest you. What about sliding this denoising autoencoder?

size = 157;

n1 = 32;
k = 5;

conv2[n_] := 
  NetChain[{ConvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
    DropoutLayer[], ConvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"]}];

deconv2[n_] := 
  NetChain[{DeconvolutionLayer[n, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
    DropoutLayer[], DeconvolutionLayer[n/2, k, "Stride" -> 2], 
    BatchNormalizationLayer[], ElementwiseLayer["SoftSign"]}];

sum[] := NetChain[{TotalLayer["Inputs" -> 2]}];

constantPowerLayer[] := NetChain[{
   ElementwiseLayer[Log@Clip[#, {$MachineEpsilon, 1}] &],
   ConvolutionLayer[1, 1, "Biases" -> None, "Weights" -> {{{{1}}}}],
   ElementwiseLayer[Exp]}]

ddae = NetGraph[
  <|
   "bugworkaround" -> ElementwiseLayer[# &],
   "c12" -> conv2[n1],
   "c34" -> conv2[2*n1],

   "d12" -> deconv2[2*n1],
   "d34" -> 
    NetChain[{DeconvolutionLayer[n1, k, "Stride" -> 2], 
      BatchNormalizationLayer[], ElementwiseLayer["ReLU"], 
      DeconvolutionLayer[1, k, "Stride" -> 2], 
      BatchNormalizationLayer[], ElementwiseLayer["SoftSign"]}],

   "sum1" -> sum[],
   "sum2" -> NetChain[{sum[], constantPowerLayer[]}],

   "loss" -> MeanSquaredLossLayer[]
   |>,
  {
   "bugworkaround" -> 
    "c12" -> 
     "c34" -> 
      "d12" -> "sum1" -> "d34" -> "sum2" -> NetPort["loss", "Input"],
   "bugworkaround" -> "sum2",
   NetPort["Noisy"] -> "bugworkaround",
   "c12" -> "sum1",
   NetPort["Target"] -> NetPort["loss", "Target"]
   },
  "Noisy" -> 
   NetEncoder[{"Image", {size, size}, ColorSpace -> "Grayscale"}],
  "Target" -> 
   NetEncoder[{"Image", {size, size}, ColorSpace -> "Grayscale"}]
  ]

trained = NetTake[NetInitialize@ddae, {"bugworkaround", "sum2"}]

Now I "automate" the input dimensions:

n3 = NetReplacePart[trained, "Noisy" -> Automatic];

This new network works fine when given the same dimensions, but fails with larger input dimensions. Any idea how to fix this problem?

In[142]:= n3[RandomReal[1, {1, 157, 157}]] // Dimensions

Out[142]= {1, 157, 157}

In[143]:= n3[RandomReal[1, {1, 1570, 1570}]] // Dimensions

During evaluation of In[143]:= NetGraph::tyfail1: Inferred inconsistent value for output size of layer 4 of layer "d34".

Out[143]= {}
POSTED BY: Matthias Odisio

We should probably advertise

NetReplacePart[net, "Input" -> Automatic]

For LeNet you can do

NetReplacePart[
    NetDrop[
        NetModel["LeNet Trained on MNIST Data", "EvaluationNet"],
        -5
    ],
    "Input" -> Automatic]

and get

(image: the resulting net, with no fixed input size)

Ah yes. Thanks! This is used in my follow up question.

POSTED BY: Matthias Odisio

Hi Matthias,

The stride of 4 comes from the pooling layers

In[49]:= Map[NetExtract[n3, {#, "Stride"}] &, {3, 6}]
Out[49]= {{2, 2}, {2, 2}}

(then on each dimension, there is an "implicit" stride of 2 x 2 = 4)

You can easily make this stride bigger (by a multiplicative factor), by setting a stride > 1 in the top convolutional layer for example.

But reducing the stride is a bit "awkward". You could get a stride of 1 by setting the stride to 1 in both pooling layers (that is to say, removing them...). But then the model is not the same: if you remove (or change) one pooling layer, you "invalidate" the weights that are learned after it.

So the only solution I see, if you REALLY want a stride of 1, is running the same network 16 times (!) and interleaving the results to reconstitute the output. You can save some computation by not recomputing what comes before the first pooling layer (i.e. the first convolution and its non-linearity). If you have fixed-size images, there is a way to put everything into a unique network, sharing layers with NetInsertSharedArrays, and using PartLayer to shift the image representations when you need to.
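
(A rough toy sketch of this "run it 16 times and interleave" idea; plain subsampling stands in for the strided part of the network, not the actual LeNet layers, and the input dimensions must be multiples of the global stride:)

s = 4;
coarse[a_] := a[[1 ;; ;; s, 1 ;; ;; s]];     (* stand-in for the strided layers *)

denseMap[a_?MatrixQ] := Module[{parts},
    (* one coarse pass per (dy, dx) shift of the input *)
    parts = Table[coarse[a[[1 + dy ;;, 1 + dx ;;]]], {dy, 0, s - 1}, {dx, 0, s - 1}];
    (* interleave the s^2 coarse grids back into a stride-1 grid *)
    Flatten[Transpose[parts, {2, 4, 1, 3}], {{1, 2}, {3, 4}}]
    ];

a = RandomReal[1, {8, 8}];
denseMap[a] == a
(* True: with plain subsampling as "coarse", interleaving reconstructs the input exactly *)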
