NetTrain with AdaMax

Posted 3 years ago

Hello,

I am trying to train a convolutional neural network on a segmentation task, but I keep getting divergence of the parameters in one of the streams of the network. I use the Adam optimizer. When I use the Flux framework on the same task with AdaMax, it doesn't diverge; in general, AdaMax is quite a stable optimizer. Is it possible to use it with NetTrain somehow? For example, is it possible to use a custom weight update algorithm?

Currently I use something like this:

NetTrain[net, {trainGen, "RoundLength" -> Length@trinsIds},
  Method -> "ADAM",
  LearningRate -> 0.0001]
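
In the meantime, the only knobs I can find are the documented suboptions of the built-in methods. Here is a sketch of what I mean (the suboption names are taken from the NetTrain documentation as I understand it, and the values are purely illustrative; gradient clipping is of course not a substitute for AdaMax):

(* sketch: tuning the built-in ADAM suboptions; values are illustrative only *)
NetTrain[net, {trainGen, "RoundLength" -> Length@trinsIds},
  Method -> {"ADAM",
    "Beta1" -> 0.9, "Beta2" -> 0.999,  (* standard Adam moment decay rates *)
    "Epsilon" -> 1*^-5,                (* larger epsilon for extra stability *)
    "GradientClipping" -> 1},          (* clip gradients to limit the update size *)
  LearningRate -> 0.0001]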

Thank you.

4 Replies

I confirm that there is currently no way to plug a custom optimizer into NetTrain. We will look into how to support more optimizers, like AdaMax.

However, your "divergence" problem is not due to the optimization algorithm, as you discovered in the meantime. It's due to a layer of the network producing a NaN on some input, so your implementation of Intersection over Union is probably not numerically safe. For example, what can happen is that the union in the denominator is zero, so you should clip it with something like this:

In[1]:= ThreadingLayer[#1/#2&][{0,0}]
During evaluation of In[1]:= ThreadingLayer::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.
Out[1]= $Failed
In[2]:= ThreadingLayer[#1/Max[1*^-15,#2]&][{0,0}]
Out[2]= 0.

In general, NetTrain reports divergence when a numerical problem like this occurs (it can also happen with Log or Sqrt on negative numbers). It's true that the error reporting has to be improved: at least the message has to be changed, and the training should continue if possible.

Concerning Intersection over Union, note that if you use a MeanSquaredLossLayer or a MeanAbsoluteLossLayer on the 4 coordinates, you have it available as a built-in measure (see the "Details" section of https://reference.wolfram.com/language/ref/TrainingProgressMeasurements.html, "For nets that contain a MeanSquaredLossLayer or MeanAbsoluteLossLayer,...").

Thank you very much for the answer! I removed the monitoring of Intersection over Union (IoU) from my network. Before, I had something like TrainingProgressMeasurements -> NetPort["SemanticIOU"], and I just removed it. Since then, all of the experiments have been very stable and there are no errors related to "Divergence".

However, I would still like to add my IoU to the TrainingProgressMeasurements. The thing is that the default IoU is defined for bounding box regression. I am using semantic masks, so I had to implement it myself. However, I didn't take into account that I may have numerical issues due to division by zero. I will add the "fix" you suggested (a sketch of what I have in mind is below) and let you know if it worked.
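
For reference, here is a sketch of the direction I'm taking for the mask IoU with the clipped denominator (the "Prediction"/"Target" port names, the 128x128 mask size, and the epsilon are placeholders for my actual setup):

(* sketch of a numerically safe IoU for binary semantic masks *)
iouNet = NetGraph[
  <|
   "product" -> ThreadingLayer[#1*#2 &],  (* elementwise mask overlap *)
   "intersection" -> SummationLayer[],    (* total overlapping area *)
   "predArea" -> SummationLayer[],        (* total predicted area *)
   "targetArea" -> SummationLayer[],      (* total target area *)
   "iou" -> ThreadingLayer[               (* clip the union away from zero *)
     #1/Max[1*^-15, #2 + #3 - #1] &]
   |>,
  {
   {NetPort["Prediction"], NetPort["Target"]} -> "product",
   "product" -> "intersection",
   NetPort["Prediction"] -> "predArea",
   NetPort["Target"] -> "targetArea",
   {"intersection", "predArea", "targetArea"} -> "iou",
   "iou" -> NetPort["SemanticIOU"]
   },
  "Prediction" -> {128, 128}, "Target" -> {128, 128}]

The idea is to join something like this onto the segmentation net and point TrainingProgressMeasurements -> NetPort["SemanticIOU"] at its output, as before.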

I think there is a "hack" for controlling the LearningRateMultipliers through TrainingProgressFunction. However, I am unsure how fast it would be and whether you can multiply elementwise by an array or only by a scalar. https://mathematica.stackexchange.com/questions/203515/custom-sgd-optimizer-in-mathematica-neural-network-framework

The other option is to use some interface to MXNet directly. To me, it is a bit frustrating, because they use MXNet as a backend, and MXNet actually supports AdaMax.

I will try to look up some options for a low-level MXNet API in Mathematica and will update.

Edit: When I was training the network I was calculating Intersection over Union, but I didn't use it as an error, just as a measurement during training and testing. When I removed the nodes from the network which calculate it, NetTrain stopped complaining. The thing is that before, when I was getting the error, I was also monitoring the magnitude of the gradients, and it didn't increase during the "divergence". Also, the last trained network worked as expected, doing a fair job at segmentation, consistent with the measurements. Later, when I continued training it, the training process was fine and the error kept decreasing for the same number of iterations before throwing the error again.

The result was still usable, so I don't believe it is really diverging at all. Should I report a bug? Alternatively, how can I switch off the monitoring for divergence?

Posted 3 years ago

Obviously it would be nice to have a developer confirm it, but I'm pretty sure that we can't customize the optimizer. You're stuck with Adam, RMSProp, SGD, or SignSGD.

POSTED BY: Dan Farmer