I confirm that there is currently no way to plug a custom optimizer into NetTrain. We will look into supporting more optimizers, such as Adamax.
However, your "divergence" problem is not due to the optimization algorithm, as you discovered in the meantime. It is due to a layer of the network producing a NaN on some input, so your implementation of Intersection Over Union is probably not numerically safe. For example, the Union in the denominator can be zero, in which case you should clip it with something like:
In[1]:= ThreadingLayer[#1/#2&][{0,0}]
During evaluation of In[1]:= ThreadingLayer::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.
Out[1]= $Failed
In[2]:= ThreadingLayer[#1/Max[1*^-15,#2]&][{0,0}]
Out[2]= 0.
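The same guard works when the full union expression sits in the denominator. For instance, if your loss net computes the intersection area and the two box areas before dividing, you could clip at that final division (a sketch in the same spirit, where #1 stands for the intersection area and #2, #3 for the two box areas; here all three are zero, as for two degenerate boxes):

In[3]:= ThreadingLayer[#1/Max[1*^-15, #2 + #3 - #1] &][{0, 0, 0}]
Out[3]= 0.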
In general, NetTrain reports divergence when a numerical problem like this occurs (it can also happen with Log or Sqrt on negative numbers). It's true that the error reporting has to be improved: at the least, the message should be changed, and the training should continue if possible.
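The same clipping idea applies there too. For example, an ElementwiseLayer taking a Log can be given a small floor on its argument (a sketch; the constant 1*^-15 is arbitrary):

In[4]:= ElementwiseLayer[Log[Max[1*^-15, #]] &][{-1., 0., 1.}]
Out[4]= {-34.5388, -34.5388, 0.}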
Concerning Intersection Over Union, note that if you use a MeanSquaredLossLayer or a MeanAbsoluteLossLayer on the 4 coordinates, it is available as a built-in measure (see the "Details" section of https://reference.wolfram.com/language/ref/TrainingProgressMeasurements.html, under "For nets that contain a MeanSquaredLossLayer or MeanAbsoluteLossLayer, ...").
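For instance, a training call could look roughly like this (a sketch: boxNet and boxData are placeholders for your coordinate-regression net and training data, and the exact measurement name to request should be checked against the Details section of the page above):

(* NetTrain attaches the regression loss; the built-in IoU measurement is then requested by name *)
results = NetTrain[boxNet, boxData, All,
   LossFunction -> MeanSquaredLossLayer[],
   TrainingProgressMeasurements -> {"IntersectionOverUnion"}];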