I am designing a relatively large CNN with many convolution layers and residual blocks. Training this network always fails with
NetTrain::interr2: An unknown internal error occurred. Consult Internal`$LastInternalFailure for potential information.
where
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr
in node _copyto at 0-th output: expected [8], got [8,1,7]
However, NetInitialize and evaluation always work fine. I am relatively new to neural networks, so it may be that I missed something in the documentation.

There is a clear correlation: as soon as I use more than two residual blocks, training starts to fail. Here is my example.

Building blocks (tested):
conv[channelsIn_, channelsOut_, length_: 1024] :=
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length},
   PaddingSize -> 3];

batchnorm[channelsIn_, length_: 1024] :=
  BatchNormalizationLayer["Input" -> {channelsIn, length}];

relu[channelsIn_, length_: 1024] :=
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];

residual[channelsIn_, length_: 1024] :=
  NetGraph[
   {
    LinearLayer[{channelsIn, length}],
    ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}],
    ThreadingLayer[Plus, "Output" -> {channelsIn, length}, InputPorts -> 2]
   },
   {
    1 -> 2,
    {NetPort["Input"], 2} -> 3
   },
   "Input" -> {channelsIn, length}
  ];
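To show what I mean by "tested", here is a minimal sanity check of the blocks in isolation; the short length of 16 and the name testChain are just arbitrary choices for illustration, assuming the definitions above evaluate as posted:

testChain = NetInitialize@NetChain[
    {conv[1, 8, 16], batchnorm[8, 16], relu[8, 16], residual[8, 16]},
    "Input" -> {1, 16}];
(* a forward pass through all four block types works *)
Dimensions[testChain[RandomReal[1, {1, 16}]]]  (* {8, 16} *)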
Now assemble the full NetGraph, which causes the problem:
net = NetGraph[
   <|
    "inputBlock" -> NetChain[
      {
       conv[1, 8, 1024],
       batchnorm[8, 1024],
       relu[8, 1024],
       residual[8, 1024], (* <---- ok *)
       conv[8, 8, 1024],
       batchnorm[8, 1024],
       relu[8, 1024]
      },
      "Input" -> {1, 1024},
      "Output" -> {8, 1024}
     ],
    "pooling1" -> PoolingLayer[
      {513},
      "Input" -> {8, 1024},
      "Output" -> {8, 512}
     ],
    "internalBlock11" -> NetChain[
      {
       conv[8, 16, 512],
       batchnorm[16, 512],
       relu[16, 512],
       (* residual[16, 512], *) (* <---- fails if it is included *)
       conv[16, 16, 512],
       batchnorm[16, 512],
       relu[16, 512]
      },
      "Input" -> {8, 512},
      "Output" -> {16, 512}
     ],
    "internalBlock12" -> NetChain[
      {
       conv[16, 16, 512],
       batchnorm[16, 512],
       relu[16, 512],
       (* residual[16, 512], *) (* <---- fails if it is included *)
       conv[16, 8, 512],
       batchnorm[8, 512],
       relu[8, 512]
      },
      "Input" -> {16, 512},
      "Output" -> {8, 512}
     ],
    "unconv1" -> DeconvolutionLayer[
      8,
      {513},
      "Input" -> {8, 512},
      PaddingSize -> 0,
      "Output" -> {8, 1024}
     ],
    "concat" -> ThreadingLayer[
      Plus,
      InputPorts -> 2,
      "Output" -> {8, 1024}
     ],
    "outputBlock" -> NetChain[
      {
       conv[8, 8, 1024],
       batchnorm[8, 1024],
       relu[8, 1024],
       residual[8, 1024], (* <---- ok *)
       conv[8, 1, 1024],
       batchnorm[1, 1024],
       relu[1, 1024]
      },
      "Input" -> {8, 1024},
      "Output" -> {1, 1024}
     ]
   |>,
   {
    NetPort["Input"] -> "inputBlock",
    "inputBlock" -> "pooling1",
    "pooling1" -> "internalBlock11",
    "internalBlock11" -> "internalBlock12",
    "internalBlock12" -> "unconv1",
    {"unconv1", "inputBlock"} -> "concat",
    "concat" -> "outputBlock",
    "outputBlock" -> NetPort["Output"]
   }
  ];
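As stated above, initialization and evaluation of the assembled net go through without complaint; this is just a sketch demonstrating that claim (the name initialized is mine):

(* NetInitialize and a forward pass succeed; only NetTrain fails *)
initialized = NetInitialize[net];
Dimensions[initialized[RandomReal[1, {1, 1024}]]]  (* {1, 1024} *)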
Test training:
NetTrain[net,
 {{Table[RandomReal[], 1024]} -> {Table[RandomReal[], 1024]}}]
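Following the hint in the NetTrain::interr2 message itself, the internal failure record can be inspected right after the failed call; this just reads the variable the message names (its content is undocumented):

(* inspect the failure record mentioned by NetTrain::interr2 *)
Internal`$LastInternalFailure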
I could not reproduce this error with smaller networks. I hope someone has seen something similar.

Thanks