Attention to all Wolfram Community Neural Net experts. Please help me understand this important part of implementing a 3D neural net.
I've recently been rewriting a previous GAN of mine in the Wolfram Language to see whether I can get better performance. I was surprised to find that the standard DeconvolutionLayer does not have the same dimensional coverage as ConvolutionLayer: there is no option on DeconvolutionLayer to enter a {h, w, d}-sized kernel. My own use case is in the architecture/engineering field, but this seems like an even larger oversight for others doing 3D medical image analysis (which is becoming increasingly commonplace). Be that as it may, I'm now searching for a good workaround: something similar in functionality to Conv3DTranspose from TensorFlow 2 Keras.
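To make the asymmetry concrete (assuming a recent WL version), the same rank-3 kernel specification that ConvolutionLayer accepts is rejected by DeconvolutionLayer:

```wl
(* a 3D kernel is accepted by ConvolutionLayer... *)
ConvolutionLayer[16, {3, 3, 3}, "Input" -> {1, 25, 25, 25}]

(* ...but DeconvolutionLayer only supports kernels up to rank 2,
   so there is no direct analogue of Keras' Conv3DTranspose: *)
DeconvolutionLayer[16, {3, 3, 3}]  (* fails with an error *)
```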
There seems to be very little written online about this particular corner of Mathematica. The main reference I've found is a 2018 post by Martijn Froeling on the Wolfram Community forums. Instead of upscaling through a DeconvolutionLayer that holds the trainable parameters, his code works around the limitation with ResizeLayer plus a sequence of ReshapeLayer and TransposeLayer steps that I don't fully understand.
I've done my best to research each piece in the documentation and to parse the code's flow, but larger questions remain: why does this layer lead to that one, and why is it placed where it is? There's a gap in my knowledge that I'd like to fill before implementing a variation of this in my project.
Below I've started annotating the code as a place to start, but there must be others who can add more detail. Until there's an actual implementation of a 3D convolution transpose in the WL, more information on this approach is important for the community.
(* Reference implementation: a 2D upscaling layer via DeconvolutionLayer *)
DeconvLayer2D[n_, {dimInx_, dimIny_}] :=
 Block[{sc = 2},
  NetChain[{
    (* a 2x2 kernel with stride 2 doubles both spatial dimensions:
       {2 n, dimInx, dimIny} -> {n, 2 dimInx, 2 dimIny} *)
    DeconvolutionLayer[n, {sc, sc}, "Stride" -> {sc, sc},
     "Input" -> {sc n, dimInx, dimIny}]}]
  ]
(* Unknown why the resize layer is structured with Scaled[2]. Presumably \
the parameter-free ResizeLayer does the upscaling, and the 1x1 \
ConvolutionLayer at the end is where the trainable parameters are kept? *)
ResizeLayer2D[n_, {dimInx_, dimIny_}] :=
 Block[{sc = 2},
  NetChain[{
    ResizeLayer[{Scaled[sc], Scaled[sc]},
     "Input" -> {sc n, dimInx, dimIny}],  (* -> {2 n, 2 dimInx, 2 dimIny} *)
    ConvolutionLayer[n, 1]}]              (* -> {n, 2 dimInx, 2 dimIny} *)
  ]
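As a sanity check on my reading of the 2D versions (my own test, not from the original post), both constructions should map the same input shape to the same output shape, even though only the deconvolution variant has a trainable kernel larger than 1x1:

```wl
d = NetInitialize[DeconvLayer2D[16, {2, 4}]];
r = NetInitialize[ResizeLayer2D[16, {2, 4}]];
x = RandomReal[1, {32, 2, 4}];
Dimensions[d[x]]  (* {16, 4, 8} *)
Dimensions[r[x]]  (* {16, 4, 8} *)
```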
(* I suspect that since a 3D transposed convolution is not available, one \
must flatten, resize, and transpose twice to cover all {X, Y, Z} axes. Not \
sure where the training parameters are kept in this NetChain either; \
probably the convolution layer at the end? Shapes traced per layer: *)
ResizeLayer3D[n_, {dimInx_, dimIny_, dimInz_}] :=
 Block[{sc = 2},
  NetChain[{
    FlattenLayer[1, "Input" -> {n sc, dimInx, dimIny, dimInz}], (* {2 n dimInx, dimIny, dimInz} *)
    ResizeLayer[{Scaled[sc], Scaled[sc]}],   (* {2 n dimInx, 2 dimIny, 2 dimInz} *)
    ReshapeLayer[{n sc, dimInx, sc dimIny, sc dimInz}],
    TransposeLayer[2 <-> 3],                 (* {2 n, 2 dimIny, dimInx, 2 dimInz} *)
    FlattenLayer[1],                         (* {4 n dimIny, dimInx, 2 dimInz} *)
    ResizeLayer[{Scaled[sc], Scaled[1]}],    (* {4 n dimIny, 2 dimInx, 2 dimInz} *)
    ReshapeLayer[{n sc, sc dimIny, sc dimInx, sc dimInz}],
    TransposeLayer[2 <-> 3],                 (* {2 n, 2 dimInx, 2 dimIny, 2 dimInz} *)
    ConvolutionLayer[n, 1]}]                 (* {n, 2 dimInx, 2 dimIny, 2 dimInz} *)
  ]
{DeconvLayer2D[16, {2, 4}], ResizeLayer2D[16, {2, 4}], ResizeLayer3D[16, {2, 4, 6}]}
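The 3D version can be checked the same way; if I've traced the reshapes above correctly, every spatial dimension should double:

```wl
net = NetInitialize[ResizeLayer3D[16, {2, 4, 6}]];
Dimensions[net[RandomReal[1, {32, 2, 4, 6}]]]  (* {16, 4, 8, 12} *)
```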
Implementation
Let's say I want to do 3D convolutions over a random array of noise. How would I go about applying the above? It's beyond anything in my previous neural-net work in WL or Python.
noise = RandomChoice[{0.98, 0.02} -> {0, 1}, {25, 25, 25}];
Image3D[noise]
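For what it's worth, here is my first attempt at wiring the noise into ResizeLayer3D. This is a sketch only; the initial 3x3x3 ConvolutionLayer, the channel counts, and the final 1-channel projection are my own guesses, not from Froeling's post. The idea is to lift the noise to a channel dimension, upscale 25^3 to 50^3, and project back to one channel for Image3D:

```wl
noise = RandomChoice[{0.98, 0.02} -> {0, 1}, {25, 25, 25}];
net = NetInitialize@NetChain[{
    ConvolutionLayer[32, {3, 3, 3}, "PaddingSize" -> 1], (* {32, 25, 25, 25} *)
    ResizeLayer3D[16, {25, 25, 25}],                     (* {16, 50, 50, 50} *)
    ConvolutionLayer[1, 1]},                             (* {1, 50, 50, 50} *)
   "Input" -> {1, 25, 25, 25}];
Image3D[Rescale@First@net[{noise}]]
```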
Feature Request
Finally, I'd like to draw the community's attention to the need for symmetry between ConvolutionLayer and DeconvolutionLayer, similar to TensorFlow 2. For a framework that presents itself as industry-leading and easy to use, requiring the code tangle above instead of a readable, predictable function is contradictory. I hope this functionality is considered for inclusion in future updates to WL neural nets; in medical imaging it really could be lifesaving.
(Additional information is available on the original & unanswered Mathematica Stack Exchange question.)