
NetTrain with AdaMax

Posted 3 years ago

Hello,

I am trying to train a convolutional neural network on a segmentation task, but I keep getting divergence of the parameters in one of the streams of the network. I use the Adam optimizer. When I use the Flux framework on the same task with AdaMax, it doesn't diverge; in general AdaMax is quite a stable optimizer. Is it possible to use it with NetTrain somehow? For example, is it possible to use a custom weight update algorithm?

Currently I use something like this:

NetTrain[net, {trainGen, "RoundLength" -> Length@trainIds},
  Method -> "ADAM",
  LearningRate -> 0.0001]

Thank you.

4 Replies

I can confirm that there is currently no way to plug a custom optimizer into NetTrain. We will look into how to support more optimizers, such as AdaMax.

However, your "divergence" problem is not due to the optimization algorithm, as you discovered in the meantime. It's due to a layer of the network producing a NaN on some input, so your implementation of Intersection over Union is probably not numerically safe. For example, the union in the denominator can be zero, so you should clip it with something similar to:

In[1]:= ThreadingLayer[#1/#2&][{0,0}]
During evaluation of In[1]:= ThreadingLayer::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.
Out[1]= $Failed
In[2]:= ThreadingLayer[#1/Max[1*^-15,#2]&][{0,0}]
Out[2]= 0.

In general, NetTrain reports divergence when a numerical problem like this occurs (it can also happen with Log or Sqrt applied to negative numbers). It's true that the error reporting has to be improved: at the very least the message should be changed, and the training should continue if possible.

Concerning Intersection over Union: note that if you use a MeanSquaredLossLayer or a MeanAbsoluteLossLayer on the 4 coordinates, it is available as a built-in measure (see the "Details" section of https://reference.wolfram.com/language/ref/TrainingProgressMeasurements.html, "For nets that contain a MeanSquaredLossLayer or MeanAbsoluteLossLayer, ...").
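For example, it should be possible to request it by name, along these lines (a sketch; I'm assuming the measurement key is "IntersectionOverUnion" as listed in the Details section linked above):

(* Sketch: requesting the built-in IoU measure by name, assuming the
   net's loss is a MeanSquaredLossLayer over the 4 box coordinates *)
NetTrain[net, trainingData,
  Method -> "ADAM",
  LearningRate -> 0.0001,
  TrainingProgressMeasurements -> {"IntersectionOverUnion"}]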

Thank you very much for the answer! I removed the monitoring of Intersection over Union (IoU) for my network. Before, I had something like TrainingProgressMeasurements -> NetPort["SemanticIOU"], and I just removed it. Since then all of the experiments are very stable and there are no errors related to "Divergence".

However, I would still like to add my IoU to the TrainingProgressMeasurements. The thing is that the default IoU is defined for bounding-box regression; I am using semantic masks, so I had to implement it myself. However, I didn't take into account that I might have numerical issues due to division by zero. I will add the "fix" you suggested and let you know if it worked.
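Concretely, I'm planning something like the sketch below: a minimal sketch assuming binary 0/1 masks, with the clipped division you suggested; the layer names and the "Prediction"/"Target"/"IOU" port names are just illustrative.

(* Sketch of a numerically safe IoU for binary {0,1} semantic masks *)
safeIOU = NetGraph[
  <|
    "intersect" -> ThreadingLayer[#1 #2 &],              (* elementwise AND *)
    "union"     -> ThreadingLayer[#1 + #2 - #1 #2 &],    (* elementwise OR *)
    "sumI"      -> SummationLayer[],                     (* |A ∩ B| *)
    "sumU"      -> SummationLayer[],                     (* |A ∪ B| *)
    "ratio"     -> ThreadingLayer[#1/Max[1*^-15, #2] &]  (* clipped division *)
  |>,
  {
    {NetPort["Prediction"], NetPort["Target"]} -> "intersect" -> "sumI",
    {NetPort["Prediction"], NetPort["Target"]} -> "union" -> "sumU",
    {"sumI", "sumU"} -> "ratio" -> NetPort["IOU"]
  }]

Once attached to the training net, its output could be monitored as before, e.g. TrainingProgressMeasurements -> NetPort["IOU"].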

Posted 3 years ago

Obviously it would be nice to have a developer confirm it, but I'm pretty sure that we can't customize the optimizer. You're stuck with Adam, RMSProp, SGD, or SignSGD.
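The closest thing to customization is tuning the suboptions of the built-in methods. A sketch, assuming the suboption names documented for Method -> "ADAM" in NetTrain:

(* Sketch: tweaking Adam's suboptions rather than swapping the optimizer *)
NetTrain[net, trainingData,
  Method -> {"ADAM",
    "Beta1" -> 0.9,            (* first-moment decay rate *)
    "Beta2" -> 0.999,          (* second-moment decay rate *)
    "Epsilon" -> 1*^-5,        (* stabilizes the update denominator *)
    "GradientClipping" -> 1},  (* can also help against divergence *)
  LearningRate -> 0.0001]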

POSTED BY: Dan Farmer