
Why does NetPortGradient return `floating-point overflow` for a simple network?

Posted 6 years ago

Consider this simple network that computes the following function:

$$ \left\{ \begin{matrix} xy & \mbox{if} \;x,y>0 \\ 0 & \mbox{otherwise} \end{matrix} \right.$$

The net looks like:

net = NetChain[{ElementwiseLayer[Ramp], AggregationLayer[Times, 1]}]

Plain evaluation works as expected:

In[2]:= net[{4, 2}]

Out[2]= 8.    

In[3]:= net[{4, -2}]    

Out[3]= 0.    

In[4]:= net[{-4, -2}]    

Out[4]= 0.

However, if I try to evaluate the gradient at a given point, it returns a floating-point error when either $x$ or $y$ is smaller than $0$.

In[5]:= net[{4, -2}, NetPortGradient["Input"]]

During evaluation of In[5]:= NetGraph::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.

Out[5]= $Failed

Why does this happen? Is there a way to avoid it?

The error also occurs if we replace Ramp with

# - # &

but not with

# - # + 1&

PS: This is a minimal network that reproduces the error, extracted from a much more complicated loss function.

Link to question on Stack Exchange [link]

POSTED BY: Luca Amerio
4 Replies
Posted 6 years ago

I found a possible workaround.

Since the error does occur with the network

net = NetChain[{ElementwiseLayer[#-#&], AggregationLayer[Times, 1]}]

but not with the network

net = NetChain[{ElementwiseLayer[#-#+1&], AggregationLayer[Times, 1]}]

I hypothesized that the error occurs when the product is computed over values coming from a region where the preceding function has both zero value and zero gradient.

I therefore reshaped the initial problem as

$$ \left\{ \begin{matrix} (x+1)(y+1) - (x +y) -1 & \mbox{if} \;x,y>0 \\ 0 & \mbox{otherwise} \end{matrix} \right.$$

The product is therefore taken between two values that are never zero: in the network below, Ramp is applied first, so each factor is at least $1$.
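As a quick check that the reshaped expression agrees with the original one, expanding the product gives

$$ (x+1)(y+1) - (x+y) - 1 = xy + x + y + 1 - x - y - 1 = xy $$

so the two functions coincide for $x,y>0$; only the operands of the product change.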

This can be written in Wolfram's "network language" as

net = NetGraph[
  <|
   (* layers *)
   "ramp" -> ElementwiseLayer[Ramp],
   "x+1" -> ElementwiseLayer[# + 1 &],
   "times" -> AggregationLayer[Times, 1],
   "sum" -> AggregationLayer[Total, 1],
   "x-y-1" -> ThreadingLayer[#1 - #2 - 1 &]
   |>,
  {
   (* connections *)
   "ramp" -> "x+1",
   "x+1" -> "times",
   "ramp" -> "sum",
   {"times", "sum"} -> "x-y-1"
   }]

and indeed

In[116]:= net[{4, 2}]

Out[116]= 8.

In[117]:= net[{-4, -2}]

Out[117]= 0.

In[118]:= net[{4, -2}, NetPortGradient["Input"]]

Out[118]= {0., 0.}

In[119]:= net[{-4, -2}, NetPortGradient["Input"]]

Out[119]= {0., 0.}

It works!

POSTED BY: Luca Amerio
Posted 6 years ago

Is this a bug? Can you figure out any workaround?

The reason I need this is that I’m trying to repurpose a CNN for object detection (YOLOv2). To do so, I need to evaluate the Intersection Over Union (IOU) of the prediction and the ground truth. Using a few tricks with Max and Min, I can evaluate the width and the height of the intersection. I then need to compute its area as width*height, but only if both width and height are positive. If one or both are negative, there is no overlap and the IOU should be 0.
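For concreteness, here is a sketch of the computation described above, using hypothetical corner notation that is not in the original post: box $A$ spans $[x_1^A, x_2^A] \times [y_1^A, y_2^A]$ and box $B$ likewise.

$$ w = \min(x_2^A, x_2^B) - \max(x_1^A, x_1^B), \qquad h = \min(y_2^A, y_2^B) - \max(y_1^A, y_1^B) $$

$$ I = \mathrm{Ramp}(w)\,\mathrm{Ramp}(h), \qquad \mathrm{IOU} = \frac{I}{\mathrm{area}(A) + \mathrm{area}(B) - I} $$

The product $\mathrm{Ramp}(w)\,\mathrm{Ramp}(h)$ is the same function as in the original question, which is why non-overlapping boxes (where $w$ or $h$ is negative) run into the gradient failure.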

I managed to implement a network for the evaluation of the IOU correctly, but cannot train the final net because NetTrain fails to compute the gradient.

POSTED BY: Luca Amerio

Presumably from division by zero in an attempt to form a numeric approximation.
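One way to make this concrete, as an assumption that is not confirmed anywhere in this thread: suppose the backward pass of the product $P = \prod_j x_j$ is computed by dividing the product by each input rather than by multiplying the remaining factors. Then

$$ \frac{\partial P}{\partial x_i} = \prod_{j \neq i} x_j \quad \text{would be evaluated as} \quad \frac{P}{x_i} $$

which is $0/0$ whenever $x_i = 0$. Under that assumption, the input $\{4, -2\}$ becomes $\{4, 0\}$ after Ramp, so $P = 0$ and the second gradient component is $0/0$. It would also be consistent with `# - # &` failing (every input to the product is $0$) and `# - # + 1 &` succeeding (every input is $1$).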

POSTED BY: Daniel Lichtblau

Thank you for bringing this to our attention. I've notified the developers of this issue.
