I have tried to train your internal block and I'm getting a different error (on Linux):
conv[channelsIn_, channelsOut_, length_ : 1024] :=
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length},
   PaddingSize -> 3];
batchnorm[channelsIn_, length_ : 1024] :=
  BatchNormalizationLayer["Input" -> {channelsIn, length}];
relu[channelsIn_, length_ : 1024] :=
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];
residual[channelsIn_, length_ : 1024] :=
  NetGraph[
   {LinearLayer[{channelsIn, length}],
    ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}],
    ThreadingLayer[Plus, "Output" -> {channelsIn, length}, InputPorts -> 2]},
   {1 -> 2, {NetPort["Input"], 2} -> 3},
   "Input" -> {channelsIn, length}];
net = NetFlatten@NetChain[
{
conv[8, 16, 512],
batchnorm[16, 512],
relu[16, 512],
residual[16, 512],
conv[16, 16, 512],
batchnorm[16, 512],
relu[16, 512]
},
"Input" -> {8, 512},
"Output" -> {16, 512}
]
NetTrain[net,
RandomReal[1, {100, 8, 512}] -> RandomReal[1, {100, 16, 512}]]
The error is:
MXNetError: Check failed: !is_view:
Regardless of the specific error, these are MXNet bugs, and unfortunately we have to live with them until we migrate the framework to a different backend; the timeframe for that migration is long and hard to predict (maybe around 1-2 years).
As a workaround, I could train the net normally either by removing the batchnorms or by setting WorkingPrecision -> "Real64", although speed is going to suffer greatly with the latter. You are getting a different error, so I'm not sure which of these works for you; I think WorkingPrecision -> "Real64" will work, but I'm not sure about removing the batchnorms.
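To make the two workarounds concrete, here is a minimal sketch that reuses the helper definitions above (netNoBN is just an illustrative name I'm introducing here); I can't verify it against your setup, so treat it as a starting point rather than a confirmed fix:

(* Workaround 1: the same chain with the BatchNormalizationLayers removed *)
netNoBN = NetFlatten@NetChain[
    {conv[8, 16, 512], relu[16, 512], residual[16, 512],
     conv[16, 16, 512], relu[16, 512]},
    "Input" -> {8, 512}, "Output" -> {16, 512}];
NetTrain[netNoBN,
 RandomReal[1, {100, 8, 512}] -> RandomReal[1, {100, 16, 512}]]

(* Workaround 2: keep the original net but train in 64-bit precision,
   which is noticeably slower *)
NetTrain[net,
 RandomReal[1, {100, 8, 512}] -> RandomReal[1, {100, 16, 512}],
 WorkingPrecision -> "Real64"]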