NetTrain fails with MXNetError: Check failed: assign(&dattr, vec.at(i)) on CNN with pooling

I am designing a relatively large CNN using many convolution layers with residual blocks. My attempt to train this network always fails with

NetTrain::interr2: An unknown internal error occurred. Consult Internal`$LastInternalFailure for potential information.

where

MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr 
in node _copyto at 0-th output: expected [8], got [8,1,7]

However, NetInitialize and evaluation always work well.

I am relatively new to neural networks, so it could be that I missed something in the documentation.

There is a correlation: if I use more than 2 residual blocks, it starts to fail. Here is my example.

Building blocks (tested):

conv[channelsIn_, channelsOut_, length_: 1024] := 
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length}, PaddingSize -> 3];

batchnorm[channelsIn_, length_: 1024] := 
  BatchNormalizationLayer["Input" -> {channelsIn, length}];

relu[channelsIn_, length_: 1024] := 
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];

residual[channelsIn_, length_: 1024] := 
  NetGraph[
    {
      LinearLayer[{channelsIn, length}], 
      ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}], 
      ThreadingLayer[Plus, "Output" -> {channelsIn, length}, InputPorts -> 2]
    },
    {
      1 -> 2, 
      {NetPort["Input"], 2} -> 3
    },
    "Input" -> {channelsIn, length}
  ];
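As a sanity check (consistent with the note above that NetInitialize and evaluation work), the building blocks can be initialized and evaluated in isolation; a minimal sketch, assuming the definitions above:

    NetInitialize[residual[8, 1024]][RandomReal[1, {8, 1024}]] // Dimensions
    (* expected: {8, 1024} *)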

Now assemble the full NetGraph, which causes the problem:

net = NetGraph[
  <|
    "inputBlock" -> NetChain[
      {
        conv[1, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024], 
        residual[8, 1024],  (* <---- ok *)
        conv[8, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024]
      }, 
      "Input" -> {1, 1024}, 
      "Output" -> {8, 1024}
    ],
    
    "pooling1" -> PoolingLayer[
      {513}, 
      "Input" -> {8, 1024}, 
      "Output" -> {8, 512}
    ],
    
    "internalBlock11" -> NetChain[
      {
        conv[8, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        (* residual[16, 512], *)  (* <---- fails if it is included *)
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512]
      }, 
      "Input" -> {8, 512}, 
      "Output" -> {16, 512}
    ],
    
    "internalBlock12" -> NetChain[
      {
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        (* residual[16, 512], *)  (* <---- fails if it is included *)
        conv[16, 8, 512], 
        batchnorm[8, 512], 
        relu[8, 512]
      }, 
      "Input" -> {16, 512}, 
      "Output" -> {8, 512}
    ],
    
    "unconv1" -> DeconvolutionLayer[
      8, 
      {513}, 
      "Input" -> {8, 512}, 
      PaddingSize -> 0, 
      "Output" -> {8, 1024}
    ],
    
    "concat" -> ThreadingLayer[
      Plus, 
      InputPorts -> 2, 
      "Output" -> {8, 1024}
    ],
    
    "outputBlock" -> NetChain[
      {
        conv[8, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024], 
        residual[8, 1024],  (* <---- ok *)
        conv[8, 1, 1024], 
        batchnorm[1, 1024], 
        relu[1, 1024]
      }, 
      "Input" -> {8, 1024}, 
      "Output" -> {1, 1024}
    ]
  |>, 
  
  {
    NetPort["Input"] -> "inputBlock", 
    "inputBlock" -> "pooling1", 
    "pooling1" -> "internalBlock11", 
    "internalBlock11" -> "internalBlock12", 
    "internalBlock12" -> "unconv1", 
    {"unconv1", "inputBlock"} -> "concat", 
    "concat" -> "outputBlock", 
    "outputBlock" -> NetPort["Output"]
  }
];
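As stated above, initialization and forward evaluation of the assembled net succeed; only training fails. A minimal check, assuming the definition of net above:

    initialized = NetInitialize[net];
    Dimensions[initialized[RandomReal[1, {1, 1024}]]]
    (* {1, 1024} *)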

Test training:

NetTrain[net, {{Table[RandomReal[], {i,1024}]} -> {Table[RandomReal[], {i,1024}]}}]

I could not reproduce this error on smaller-scale networks. I hope someone has seen something similar.

Thanks

POSTED BY: Kirill Vasin
6 Replies

I have tried to train your internal block and I'm getting a different error (on Linux):

conv[channelsIn_, channelsOut_, length_ : 1024] := 
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length}, 
   PaddingSize -> 3];

batchnorm[channelsIn_, length_ : 1024] := 
  BatchNormalizationLayer["Input" -> {channelsIn, length}];

relu[channelsIn_, length_ : 1024] := 
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];

residual[channelsIn_, length_ : 1024] := 
  NetGraph[{LinearLayer[{channelsIn, length}], 
    ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}], 
    ThreadingLayer[Plus, "Output" -> {channelsIn, length}, 
     InputPorts -> 2]}, {1 -> 2, {NetPort["Input"], 2} -> 3}, 
   "Input" -> {channelsIn, length}];

net = NetFlatten@NetChain[
   {
    conv[8, 16, 512],
    batchnorm[16, 512],
    relu[16, 512],
    residual[16, 512],
    conv[16, 16, 512],
    batchnorm[16, 512],
    relu[16, 512]
    },
   "Input" -> {8, 512},
   "Output" -> {16, 512}
   ]
NetTrain[net, 
 RandomReal[1, {100, 8, 512}] -> RandomReal[1, {100, 16, 512}]]

The error is

MXNetError: Check failed: !is_view:

Regardless of the error, these are MXNet bugs, and unfortunately we have to live with them until we migrate the framework to a different backend; the timeframe for that is long (hard to predict, maybe around 1-2 years).

As a workaround, I could train the net normally by removing the batchnorms or by setting WorkingPrecision -> "Real64", although speed is going to suffer greatly with the latter. You are getting a different error, so I'm not sure which of these works for you. I think WorkingPrecision -> "Real64" is going to work; I'm not sure about removing the batchnorms.
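For example, the precision workaround can be passed directly to NetTrain; a sketch assuming the net from the original post and an illustrative set of 16 random training examples:

    NetTrain[
      net,
      Table[RandomReal[1, {1, 1024}] -> RandomReal[1, {1, 1024}], 16],
      WorkingPrecision -> "Real64"
    ]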

No Python; we hook up the C++ libraries directly. We already have a working prototype internally; I presented it at this year's Wolfram Tech Conference two weeks ago.

Most likely PyTorch, TensorFlow, or both.

After poking at this problem more, I've found that it might be related to how I encode the Nx1024 data down to Nx512 blocks. It is always connected with these blocks:

    "internalBlock11" -> NetChain[
      {
        conv[8, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        residual[16, 512], (* <--- HERE *)
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512]
      }, 
      "Input" -> {8, 512}, 
      "Output" -> {16, 512}
    ],
POSTED BY: Kirill Vasin

Thank you, @Matteo Salvarezza!

Your workaround worked well. :) Just a question: are there any plans for which backend is going to be used?

POSTED BY: Kirill Vasin

Thank you for the reply. Great!

PS: But then probably Torch (C++), or could it be that a Python interpreter is going to be bundled?

POSTED BY: Kirill Vasin