GROUPS: Data Science, Wolfram Language, Machine Learning, Neural Networks

Posted 6 years ago

Consider this simple network, which computes the following function:

$$ \left\{ \begin{matrix} xy & \mbox{if} \;x,y>0 \\ 0 & \mbox{otherwise} \end{matrix} \right.$$

The net looks like:

    net = NetChain[{ElementwiseLayer[Ramp], AggregationLayer[Times, 1]}]

It evaluates correctly:

    In[2]:= net[{4, 2}]
    Out[2]= 8.

    In[3]:= net[{4, -2}]
    Out[3]= 0.

    In[4]:= net[{-4, -2}]
    Out[4]= 0.

However, if I try to evaluate the gradient at a given point, it returns a floating-point error whenever $x$ or $y$ is smaller than $0$:

    In[5]:= net[{4, -2}, NetPortGradient["Input"]]

    During evaluation of In[5]:= NetGraph::netnan: A floating-point overflow, underflow, or division by zero occurred while evaluating the net.

    Out[5]= $Failed

Why does this happen? Is there a way to avoid it?

The error also occurs if we replace `Ramp` with `# - # &`, but not with `# - # + 1 &`.

PS: this is a very simplified example, the minimal network required to reproduce the error, extracted from a much more complicated loss function.

Link to question on Stack Exchange [link]
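A possible explanation, offered as an assumption rather than something confirmed in this thread: write $u = \mathrm{Ramp}(x)$, $v = \mathrm{Ramp}(y)$ and $P = uv$ (names introduced here for illustration). The exact partial derivatives of a product are the leave-one-out products, which are finite everywhere. If the backend instead obtains them by dividing the output by each input, any zero factor produces $0/0$:

$$ \frac{\partial P}{\partial u} = v = \frac{P}{u}, \qquad \frac{\partial P}{\partial v} = u = \frac{P}{v} \qquad (u, v \neq 0) $$

$$ (x, y) = (4, -2): \quad u = 4,\; v = 0,\; P = 0, \qquad \frac{P}{u} = \frac{0}{4} = 0, \qquad \frac{P}{v} = \frac{0}{0} \;\; \mbox{(undefined)} $$

Under this assumption, `# - # &` fails because every factor is $0$, while `# - # + 1 &` works because every factor is $1$.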

POSTED BY: Luca Amerio

4 Replies

Posted 6 years ago

I found a possible workaround.

The error occurs with the network

    net = NetChain[{ElementwiseLayer[# - # &], AggregationLayer[Times, 1]}]

but not with the network

    net = NetChain[{ElementwiseLayer[# - # + 1 &], AggregationLayer[Times, 1]}]

so my hypothesis is that the error occurs when the function has a region where both its value and its gradient are zero, and the net tries to compute the product of the variables there.

I therefore reshaped the initial problem as

$$ \left\{ \begin{matrix} (x+1)(y+1) - (x+y) - 1 & \mbox{if} \;x,y>0 \\ 0 & \mbox{otherwise} \end{matrix} \right.$$

The product is then taken between two quantities that are always nonzero (except at the single point $(-1,-1)$).

This can be written in Wolfram's "network language" as

    net = NetGraph[
      {
        "ramp" -> ElementwiseLayer[Ramp],
        "x+1" -> ElementwiseLayer[# + 1 &],
        "times" -> AggregationLayer[Times, 1],
        "sum" -> AggregationLayer[Total, 1],
        "x-y-1" -> ThreadingLayer[#1 - #2 - 1 &]
      },
      {
        "ramp" -> "x+1",
        "x+1" -> "times",
        "ramp" -> "sum",
        {"times", "sum"} -> "x-y-1"
      }
    ]

and indeed

    In[116]:= net[{4, 2}]
    Out[116]= 8.

    In[117]:= net[{-4, -2}]
    Out[117]= 0.

    In[118]:= net[{4, -2}, NetPortGradient["Input"]]
    Out[118]= {0., 0.}

    In[119]:= net[{-4, -2}, NetPortGradient["Input"]]
    Out[119]= {0., 0.}

It works!
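As a check on the algebra, with $u = \mathrm{Ramp}(x)$ and $v = \mathrm{Ramp}(y)$ (names introduced here for illustration), the reshaped expression expands back to the original product, and both factors of the new product are at least $1$ because $u, v \ge 0$:

$$ (u+1)(v+1) - (u+v) - 1 = uv + u + v + 1 - u - v - 1 = uv, \qquad u + 1 \ge 1,\; v + 1 \ge 1 $$

At $(x, y) = (4, -2)$, that is $u = 4$, $v = 0$, the chain rule gives

$$ \frac{\partial}{\partial x} = \big[(v+1) - 1\big] \cdot \mathrm{Ramp}'(4) = 0 \cdot 1 = 0, \qquad \frac{\partial}{\partial y} = \big[(u+1) - 1\big] \cdot \mathrm{Ramp}'(-2) = 4 \cdot 0 = 0 $$

which agrees with `Out[118]` above.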

POSTED BY: Luca Amerio

Posted 6 years ago

Is this a bug? Can you figure out any workaround?

The reason I need this is that I'm trying to repurpose a CNN for object detection (YOLOv2). To do so, I need to evaluate the Intersection Over Union (IOU) of the prediction and the ground truth. Using a few tricks with `Max` and `Min`, I can evaluate the width and the height of the intersection. I then need to compute its area as width*height, but only if both width and height are positive. If one or both are negative, there is no overlap and the IOU should be 0.

I managed to implement a network that evaluates the IOU correctly, but I cannot train the final net because `NetTrain` fails to compute the gradient.
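For reference, the quantity described above can be written as follows. The corner notation is an assumption made here for illustration (boxes $A$ and $B$, each given by its min and max corners); it is not taken from the actual network:

$$ w = \min(x^A_{\max}, x^B_{\max}) - \max(x^A_{\min}, x^B_{\min}), \qquad h = \min(y^A_{\max}, y^B_{\max}) - \max(y^A_{\min}, y^B_{\min}) $$

$$ I = \mathrm{Ramp}(w)\,\mathrm{Ramp}(h), \qquad \mathrm{IOU} = \frac{I}{\mathrm{area}(A) + \mathrm{area}(B) - I} $$

The product $\mathrm{Ramp}(w)\,\mathrm{Ramp}(h)$ has exactly the form of the minimal example: whenever the boxes do not overlap, at least one factor is $0$.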