
Why does NetPortGradient return a `floating-point overflow` error for a simple network?

Posted 6 years ago

Consider this simple network that computes the following function:

$$ \begin{cases} xy & \text{if } x,y>0 \\ 0 & \text{otherwise} \end{cases} $$

The net looks like:

net = NetChain[{ElementwiseLayer[Ramp], AggregationLayer[Times, 1]}]

This works correctly as:

In[2]:= net[{4, 2}]

Out[2]= 8.    

In[3]:= net[{4, -2}]    

Out[3]= 0.    

In[4]:= net[{-4, -2}]    

Out[4]= 0.

However, if I try to evaluate the gradient at a given point, this returns a floating-point error when either $x$ or $y$ is smaller than $0$.

In[5]:= net[{4, -2}, NetPortGradient["Input"]]

During evaluation of In[5]:= NetGraph::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.

Out[5]= $Failed

Why does this happen? Is there a way to avoid it?

The error also occurs if we replace Ramp with

# - # &

but not with

# - # + 1&

PS: this is a minimal example that reproduces the error, extracted from a much more complicated loss function.

Link to question on Stack Exchange [link]

POSTED BY: Luca Amerio

Thank you for bringing this to our attention. I've notified the developers of this issue.

Posted 6 years ago

I found a possible workaround.

Since the error does occur with the network

net = NetChain[{ElementwiseLayer[#-#&], AggregationLayer[Times, 1]}]

but not with the network

net = NetChain[{ElementwiseLayer[#-#+1&], AggregationLayer[Times, 1]}]

My hypothesis is that the error occurs when the function has a region where both the value and the gradient are zero, and the product of the variables is computed there.
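
A plausible mechanism (my guess, in line with the division-by-zero remark below): if the gradient of the Times aggregation is computed as

$$ \frac{\partial}{\partial x_i}\prod_j x_j = \frac{\prod_j x_j}{x_i}, $$

then any component that Ramp (or # - # &) has mapped to exactly $0$ produces a $0/0$, which would explain the overflow/NaN message and why # - # + 1 &, whose output is never zero, is safe.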

I therefore reshaped the initial problem as

$$ \begin{cases} (x+1)(y+1) - (x+y) - 1 & \text{if } x,y>0 \\ 0 & \text{otherwise} \end{cases} $$

This way the product is taken between two quantities that are always non-zero (except at the single point $(-1,-1)$).
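
For positive $x$ and $y$ this indeed reduces to the original product:

$$ (x+1)(y+1) - (x+y) - 1 = xy + x + y + 1 - x - y - 1 = xy $$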

This can be written in Wolfram's "network language" as

net = NetGraph[
  {
   "ramp" -> ElementwiseLayer[Ramp],
   "x+1" -> ElementwiseLayer[# + 1 &],
   "times" -> AggregationLayer[Times, 1],
   "sum" -> AggregationLayer[Total, 1],
   "x-y-1" -> ThreadingLayer[#1 - #2 - 1 &]
   },
  {
   "ramp" -> "x+1",
   "x+1" -> "times",
   "ramp" -> "sum",
   {"times", "sum"} -> "x-y-1"
   }
  ]

and indeed

In[116]:= net[{4, 2}]

Out[116]= 8.

In[117]:= net[{-4, -2}]

Out[117]= 0.

In[118]:= net[{4, -2}, NetPortGradient["Input"]]

Out[118]= {0., 0.}

In[119]:= net[{-4, -2}, NetPortGradient["Input"]]

Out[119]= {0., 0.}

It works!
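
As an extra sanity check (my addition, not part of the original post), the graph can be compared against the target piecewise function on random inputs; the differences should be zero up to single-precision rounding:

(* compare the NetGraph against the piecewise target x*y for x, y > 0, else 0 *)
target[{x_, y_}] := If[x > 0 && y > 0, x*y, 0.]
testInputs = RandomReal[{-5, 5}, {10, 2}];
Max[Abs[net[#] - target[#]] & /@ testInputs]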

POSTED BY: Luca Amerio

Presumably from division by zero in an attempt to form a numeric approximation.

POSTED BY: Daniel Lichtblau
Posted 6 years ago

Is this a bug? Can you figure out any workaround?

The reason I need this is that I'm trying to repurpose a CNN for object detection (YOLOv2). To do so, I need to evaluate the Intersection Over Union (IOU) of the prediction and the ground truth. Using a few tricks with Max and Min, I can evaluate the width and the height of the intersection. I then need to compute its area as width*height, but only if both width and height are positive. If one or both are negative, it means that there is no overlap and the IOU should be 0.
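
For reference, here is a minimal sketch (my own, not from the post) of the kind of Max/Min construction described, written as an ordinary function on hypothetical boxes given as {xmin, ymin, xmax, ymax} rather than as actual net layers:

iou[{ax1_, ay1_, ax2_, ay2_}, {bx1_, by1_, bx2_, by2_}] :=
 Module[{w, h, inter, union},
  (* intersection width and height via Min/Max; Ramp clamps them to 0 when the boxes do not overlap *)
  w = Ramp[Min[ax2, bx2] - Max[ax1, bx1]];
  h = Ramp[Min[ay2, by2] - Max[ay1, by1]];
  inter = w h;
  union = (ax2 - ax1) (ay2 - ay1) + (bx2 - bx1) (by2 - by1) - inter;
  inter/union]

The inter = w h product, where both factors can be exactly zero, is the same structure that triggers the gradient failure discussed above.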

I managed to implement a network that evaluates the IOU correctly, but I cannot train the final net because NetTrain fails to compute the gradient.

POSTED BY: Luca Amerio