NetTrain fails with MXNetError: Check failed: assign(&dattr, vec.at(i)) on CNN with pooling

I am designing a relatively large CNN using many convolution layers with residual blocks. My attempt to train this network always fails with

NetTrain::interr2: An unknown internal error occurred. Consult Internal`$LastInternalFailure for potential information.

where

MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr 
in node _copyto at 0-th output: expected [8], got [8,1,7]

However, NetInitialize and evaluation always work fine.

I am relatively new to neural networks, so it could be that I missed something in the documentation.

There seems to be a correlation: if I use more than two residual blocks, training starts to fail. Here is my example.

Building blocks (tested):

conv[channelsIn_, channelsOut_, length_: 1024] := 
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length}, PaddingSize -> 3];

batchnorm[channelsIn_, length_: 1024] := 
  BatchNormalizationLayer["Input" -> {channelsIn, length}];

relu[channelsIn_, length_: 1024] := 
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];

residual[channelsIn_, length_: 1024] := 
  NetGraph[
    {
      LinearLayer[{channelsIn, length}], 
      ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}], 
      ThreadingLayer[Plus, "Output" -> {channelsIn, length}, InputPorts -> 2]
    },
    {
      1 -> 2, 
      {NetPort["Input"], 2} -> 3
    },
    "Input" -> {channelsIn, length}
  ];
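
For reference, each building block can be checked in isolation before assembly, e.g. for the residual block (the 8 x 1024 dimensions are just illustrative):

NetInitialize[residual[8, 1024]][RandomReal[1, {8, 1024}]] // Dimensions
(* {8, 1024} *)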

Now assemble the full NetGraph, which causes the problem:

net = NetGraph[
  <|
    "inputBlock" -> NetChain[
      {
        conv[1, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024], 
        residual[8, 1024],  (* <---- ok *)
        conv[8, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024]
      }, 
      "Input" -> {1, 1024}, 
      "Output" -> {8, 1024}
    ],
    
    "pooling1" -> PoolingLayer[
      {513}, 
      "Input" -> {8, 1024}, 
      "Output" -> {8, 512}
    ],
    
    "internalBlock11" -> NetChain[
      {
        conv[8, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        (* residual[16, 512], *)  (* <---- fails if it is included *)
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512]
      }, 
      "Input" -> {8, 512}, 
      "Output" -> {16, 512}
    ],
    
    "internalBlock12" -> NetChain[
      {
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        (* residual[16, 512], *)  (* <---- fails if it is included *)
        conv[16, 8, 512], 
        batchnorm[8, 512], 
        relu[8, 512]
      }, 
      "Input" -> {16, 512}, 
      "Output" -> {8, 512}
    ],
    
    "unconv1" -> DeconvolutionLayer[
      8, 
      {513}, 
      "Input" -> {8, 512}, 
      PaddingSize -> 0, 
      "Output" -> {8, 1024}
    ],
    
    "concat" -> ThreadingLayer[
      Plus, 
      InputPorts -> 2, 
      "Output" -> {8, 1024}
    ],
    
    "outputBlock" -> NetChain[
      {
        conv[8, 8, 1024], 
        batchnorm[8, 1024], 
        relu[8, 1024], 
        residual[8, 1024],  (* <---- ok *)
        conv[8, 1, 1024], 
        batchnorm[1, 1024], 
        relu[1, 1024]
      }, 
      "Input" -> {8, 1024}, 
      "Output" -> {1, 1024}
    ]
  |>, 
  
  {
    NetPort["Input"] -> "inputBlock", 
    "inputBlock" -> "pooling1", 
    "pooling1" -> "internalBlock11", 
    "internalBlock11" -> "internalBlock12", 
    "internalBlock12" -> "unconv1", 
    {"unconv1", "inputBlock"} -> "concat", 
    "concat" -> "outputBlock", 
    "outputBlock" -> NetPort["Output"]
  }
];

Test training:

NetTrain[net, {{Table[RandomReal[], {i,1024}]} -> {Table[RandomReal[], {i,1024}]}}]
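
A batch of random examples of matching shape can be used in the same way, e.g.:

NetTrain[net, RandomReal[1, {100, 1, 1024}] -> RandomReal[1, {100, 1, 1024}]]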

I could not reproduce this error on smaller-scale networks. I hope someone has seen something similar.

Thanks

POSTED BY: Kirill Vasin
6 Replies

I have tried to train your internal block and I'm getting a different error (on Linux):

conv[channelsIn_, channelsOut_, length_ : 1024] := 
  ConvolutionLayer[channelsOut, {7}, "Input" -> {channelsIn, length}, 
   PaddingSize -> 3];

batchnorm[channelsIn_, length_ : 1024] := 
  BatchNormalizationLayer["Input" -> {channelsIn, length}];

relu[channelsIn_, length_ : 1024] := 
  ElementwiseLayer["ReLU", "Input" -> {channelsIn, length}];

residual[channelsIn_, length_ : 1024] := 
  NetGraph[{LinearLayer[{channelsIn, length}], 
    ElementwiseLayer[Ramp, "Input" -> {channelsIn, length}], 
    ThreadingLayer[Plus, "Output" -> {channelsIn, length}, 
     InputPorts -> 2]}, {1 -> 2, {NetPort["Input"], 2} -> 3}, 
   "Input" -> {channelsIn, length}];

net = NetFlatten@NetChain[
   {
    conv[8, 16, 512],
    batchnorm[16, 512],
    relu[16, 512],
    residual[16, 512],
    conv[16, 16, 512],
    batchnorm[16, 512],
    relu[16, 512]
    },
   "Input" -> {8, 512},
   "Output" -> {16, 512}
   ]
NetTrain[net, 
 RandomReal[1, {100, 8, 512}] -> RandomReal[1, {100, 16, 512}]]

The error is:

MXNetError: Check failed: !is_view:

Regardless of the specific error, these are MXNet bugs, and unfortunately we have to live with them until we migrate the framework to a different backend. The timeframe for that is long and hard to predict, maybe around 1-2 years.

As a workaround, I could train the net normally either by removing the batchnorms or by setting WorkingPrecision -> "Real64", although speed is going to suffer greatly with the latter. You are getting a different error, so I'm not sure which of these works for you. I think WorkingPrecision -> "Real64" is going to work; I'm not sure about removing the batchnorms.
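
For instance, applying the workaround to your original net could look like this (the random data shapes are just illustrative):

NetTrain[net, RandomReal[1, {100, 1, 1024}] -> RandomReal[1, {100, 1, 1024}],
 WorkingPrecision -> "Real64"]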

Thank you @Matteo Salvarezza

Your workaround worked well. :) Just a question: are there any plans for which backend is going to be used?

POSTED BY: Kirill Vasin

Most likely PyTorch, TensorFlow, or both.

Thank you for the reply. Great!

PS: But then would it probably be Torch (C++), or could it be that a Python interpreter is going to be bundled?

POSTED BY: Kirill Vasin

No Python; we hook up the C++ libraries directly. We already have a working prototype internally; I presented it at this year's Wolfram Tech Conference two weeks ago.

After poking at this problem more, I've found that it might be related to how I go from the Nx1024 blocks to the Nx512 blocks. It is always connected with those blocks:

    "internalBlock11" -> NetChain[
      {
        conv[8, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512], 
        residual[16, 512], (* <--- HERE *)
        conv[16, 16, 512], 
        batchnorm[16, 512], 
        relu[16, 512]
      }, 
      "Input" -> {8, 512}, 
      "Output" -> {16, 512}
    ],
POSTED BY: Kirill Vasin