I confirm that there is currently no way to plug a custom optimizer into NetTrain. We will look into supporting more optimizers, such as Adamax.
However, your "divergence" problem is not due to the optimization algorithm, as you discovered in the meantime. It is due to a layer of the network producing a NaN on some input, so your implementation of Intersection Over Union is probably not numerically safe. For example, the Union in the denominator can be zero, in which case you should clip it with something like:
In[1]:= ThreadingLayer[#1/#2&][{0,0}]
During evaluation of In[1]:= ThreadingLayer::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.
Out[1]= $Failed
In[2]:= ThreadingLayer[#1/Max[1*^-15,#2]&][{0,0}]
Out[2]= 0.
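The same guard works when the full union expression sits in the denominator. For instance, if your loss net computes the intersection area and the two box areas before dividing, you could clip at that final division (a sketch in the same spirit, where #1 stands for the intersection area and #2, #3 for the two box areas; here all three are zero, as for two degenerate boxes):

In[3]:= ThreadingLayer[#1/Max[1*^-15, #2 + #3 - #1] &][{0, 0, 0}]
Out[3]= 0.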
In general, NetTrain reports divergence when a numerical problem like this occurs (it can also happen with Log or Sqrt on negative numbers). It's true that the error reporting has to be improved: at the least, the message should be changed, and the training should continue if possible.
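The same clipping idea applies there too. For example, an ElementwiseLayer taking a Log can be given a small floor on its argument (a sketch; the constant 1*^-15 is arbitrary):

In[4]:= ElementwiseLayer[Log[Max[1*^-15, #]] &][{-1., 0., 1.}]
Out[4]= {-34.5388, -34.5388, 0.}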
Concerning Intersection Over Union, note that if you use a MeanSquaredLossLayer or a MeanAbsoluteLossLayer on the 4 coordinates, it is available as a built-in measure (see the "Details" section of https://reference.wolfram.com/language/ref/TrainingProgressMeasurements.html, under "For nets that contain a MeanSquaredLossLayer or MeanAbsoluteLossLayer, ...").
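For instance, a training call could look roughly like this (a sketch: boxNet and boxData are placeholders for your coordinate-regression net and training data, and the exact measurement name to request should be checked against the Details section of the page above):

(* NetTrain attaches the regression loss; the built-in IoU measurement is then requested by name *)
results = NetTrain[boxNet, boxData, All,
   LossFunction -> MeanSquaredLossLayer[],
   TrainingProgressMeasurements -> {"IntersectionOverUnion"}];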