
More options for ResizeLayer [a feature request/bug report]

Posted 4 years ago

Currently, Mathematica's ResizeLayer serves its main purpose, resizing images, well. However, networks like U-Net need to resize not only input images but also the outputs of intermediate layers, with pixel-perfect precision so that all scale levels match. The U-Net hosted in the Wolfram Neural Net Repository achieves this with DeconvolutionLayers. For other tasks, where the dataset isn't as large, one could drastically reduce the parameter count, overfitting, and the checkerboard artifacts that deconvolution layers are infamous for by replacing the deconvolution layers with resize layers.

ResizeLayer[{Scaled[2],Scaled[2]}, Resampling->"Nearest"] 

works as expected, repeating each pixel 4 times, nothing special. However, a convolution layer is required afterwards to smooth the jagged edges produced by nearest-neighbor interpolation, which defeats the purpose. A much better alternative is linear interpolation, and this is where the subtle problem lies.

ResizeLayer[{Scaled[2], Scaled[2]}, Resampling->"Linear"]

DOESN'T actually scale its input by a factor of 2! Yes, the dimensions scale as expected, BUT the underlying image is upscaled by a non-integer factor slightly greater than 2, so that the leftmost (rightmost) pixel of the original image maps exactly to the leftmost (rightmost) pixel of the resized version. This is a nice, useful property, but satisfying it sacrifices the exact match between the scaling coefficient of the pixel count and the scaling coefficient of the underlying image encoded by those pixels.

The actual scaling coefficient of Scaled[2] in ResizeLayer turns out to be (2n-1)/(n-1):

Coefficient[InterpolatingPolynomial[{{1, 1}, {n, 2*n}}, i], i]//Simplify
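To get a feel for how far this is from 2, the closed form can be tabulated for a few input widths n. The helper name effectiveScale is made up for this illustration, and the widths 16 and 256 are arbitrary examples:

```mathematica
(* Effective scale of Scaled[2] under border-to-border mapping, from the formula above *)
effectiveScale[n_] := (2 n - 1)/(n - 1)

(* n = 8 matches the width of the test image used in this post; 16 and 256 are illustrative *)
Table[{n, effectiveScale[n], N[effectiveScale[n]]}, {n, {8, 16, 256}}]
(* {{8, 15/7, 2.14286}, {16, 31/15, 2.06667}, {256, 511/255, 2.00392}} *)
```

The deviation from 2 shrinks as n grows, so the relative error is largest on the coarsest levels of a U-Net, where the feature maps are smallest.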

Try this for a visual comparison:

(* 8x8 checkerboard test image with one-pixel squares *)
img = Image@Table[Mod[i + j, 2], {i, 8}, {j, 8}]
(* ImageResize respects the scaling coefficient, not the image borders. I like it. *)
correct = ImageResize[img, Scaled[2], Resampling -> "Linear"];
(* ResizeLayer maps borders to borders instead *)
wrong = ResizeLayer[{Scaled[2], Scaled[2]}, Resampling -> "Linear", "Input" -> NetEncoder[{"Image", ImageDimensions[img]}], "Output" -> NetDecoder["Image"]][img];
(* Magnify both results for a side-by-side comparison *)
Image[correct, Magnification -> 10]
Image[wrong, Magnification -> 10]

You can see that the checkerboard pattern in the "wrong" image isn't aligned to the pixel grid, contrary to what is expected. This seemingly insignificant difference can noticeably degrade U-Net's performance for the following reasons:

  1. A half-pixel error on U-Net's first scale level becomes a whole-pixel error on the second one, a 2-pixel error on the 3rd one and so on, up to the whole image on the deepest level.
  2. No sane number of convolution layers on top of one imprecise resizing layer can fix its error, because the task is to slightly shift pixels on the left in one direction and pixels on the right in the other, but convolution layers are translation-equivariant (they apply the same operation at every position) and thus unable to achieve this goal.
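
The half-pixel figure in point 1 can be checked with a small one-dimensional sketch. It assumes 1-based pixel indices with pixel centers at i - 1/2, and it assumes that the two behaviors correspond to the usual "align corners" and "half-pixel centers" sampling conventions, which I have not verified against the ResizeLayer or ImageResize internals. The names borderToBorder and exactScale are made up for this illustration:

```mathematica
(* Both functions return the source coordinate, in input-pixel indices,
   sampled by output pixel j when an input of width n is upscaled to 2n *)

(* Border-to-border: index 1 -> 1 and 2n -> n (inverse of the line through {1,1} and {n,2n}) *)
borderToBorder[j_, n_] := 1 + (j - 1) (n - 1)/(2 n - 1)

(* Exact 2x scale: output center j - 1/2 maps to input coordinate (j - 1/2)/2, i.e. index j/2 + 1/4 *)
exactScale[j_] := j/2 + 1/4

(* Misalignment in output pixels (hence the factor 2) for an input 8 pixels wide *)
Table[2 (borderToBorder[j, 8] - exactScale[j]), {j, {1, 8, 9, 16}}]
(* {1/2, 1/30, -1/30, -1/2} *)
```

Under these assumptions the offset is half an output pixel at each border, nearly zero in the middle, and of opposite sign on the two sides, which is the position-dependent shift that point 2 says convolutions cannot undo.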

I hereby ask you to implement an option for ResizeLayer so that, with linear resampling, the user can choose between exact border-to-border mapping (the current behavior) and an exact scaling coefficient (which U-Net variants require and which is currently impossible).

POSTED BY: Et Al