Create a custom loss function with NetTrain?

Suppose I want to classify some data but, for my own reasons, want a custom neural net architecture rather than whatever Classify develops algorithmically. And I also want a custom loss function. In my example, I want an asymmetric loss such that predicting True when the real answer is False is a worse problem than predicting False when the real answer is True. In Classify, the UtilityFunction option works splendidly in such cases. I think the following set of layers would emulate a utility function in the neural network setting if I wanted losses in one direction to count double those in the other direction. There may well be much better functions; I show the code below only to indicate that something like this may be possible.

 (* #1 = prediction, #2 = target; over-predicting (#1 > #2) counts double *)
 lossnet = 
  NetChain[{ThreadingLayer[#1 - #2 &], 
    ElementwiseLayer[2*Ramp[#] + 1*Ramp[-#] &]}]
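Written out, with p the network's prediction (the first input, #1) and t the target (the second input, #2), the per-element loss that lossnet computes is the following, since Ramp[x] is max(x, 0):

```latex
L(p, t) = 2\,\max(p - t,\, 0) + \max(t - p,\, 0)
```

For example, predicting p = 1 for a passenger who died (t = 0) costs 2, while predicting p = 0 for a passenger who survived (t = 1) costs 1.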

But I can't quite figure out how to put it all together. The particular toy problem I want to solve is to build a model that works on the Titanic dataset and predicts survival but, for my own reasons, counts a prediction of survival when the person dies as worse than a prediction of death when the person survives.

Three other notes:

1) My question is related to a question asked here, which no one ever answered.

2) The documentation for the neural net framework really needs to be improved, particularly if the framework moves out of "Experimental" status. Right now, it is missing the conceptual framework that would make it easy to use. It also seems to focus heavily on image processing rather than on data analysis in other contexts, such as social science. Moreover, some of the documentation is underinclusive. For example, NetGraph has options that are listed in the "Details" section, yet nothing at the top of the reference page indicates that any options exist. As a result, it is extremely challenging to figure out how to deal with data such as the Titanic dataset, which is a list of Associations whose various columns need special encoding.

3) One motivation for using a custom utility function is that when one output class is scarce, the neural net frequently develops a predictor that always predicts the most common class: on the Titanic data, predicting that every passenger dies. In the Classify context, there are ways of dealing with this, such as the ClassPriors and UtilityFunction options. I'd like the same capabilities when using the neural network framework.

POSTED BY: Seth Chandler

I'm still very much hoping for a response here. But I did make a lot of progress on the general issues involved in using neural networks on data with nominal values, and I've shared that progress in a post here.

POSTED BY: Seth Chandler

Using a custom loss is documented in the third example of the LossFunction documentation:

http://reference.wolfram.com/language/ref/LossFunction.html

That whole page is useful for understanding how to control losses.
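A minimal sketch of how the pieces might fit together for the question above, offered under stated assumptions rather than as the documented recipe: the question's lossnet is split into two named nodes of a NetGraph, the predictor's output and a "Target" input port feed it, and a SummationLayer reduces the result to a scalar "Loss" port that NetTrain is told to minimize. The three numeric features, the layer sizes, the node names, and the synthetic data are hypothetical placeholders; the Titanic columns would first need their own encoding.

```mathematica
(* Hypothetical predictor: 3 numeric features -> survival probability as a length-1 vector *)
net = NetChain[{LinearLayer[8], ElementwiseLayer[Ramp], LinearLayer[1],
    ElementwiseLayer[LogisticSigmoid]}, "Input" -> 3];

(* The question's asymmetric loss as two graph nodes:
   "diff" computes prediction - target, "weight" doubles positive differences,
   so over-predicting survival costs twice as much as under-predicting it.
   "sum" reduces the length-1 vector to the scalar that NetTrain minimizes. *)
lossNet = NetGraph[
   <|"net" -> net,
     "diff" -> ThreadingLayer[#1 - #2 &],
     "weight" -> ElementwiseLayer[2*Ramp[#] + Ramp[-#] &],
     "sum" -> SummationLayer[]|>,
   {{"net", NetPort["Target"]} -> "diff" -> "weight" -> "sum" -> NetPort["Loss"]}];

(* Synthetic placeholder data, NOT the Titanic set: label 1. when the features sum to > 0 *)
SeedRandom[1];
xs = RandomReal[{-1, 1}, {200, 3}];
ys = Table[{If[Total[x] > 0, 1., 0.]}, {x, xs}];

(* Supply every input port by name and name the output port to minimize *)
trained = NetTrain[lossNet, <|"Input" -> xs, "Target" -> ys|>,
   LossFunction -> "Loss", MaxTrainingRounds -> 20];

(* Pull the trained predictor back out of the loss graph by its node name *)
classifier = NetExtract[trained, "net"];
classifier[{0.5, 0.2, -0.1}]
```

Because the loss lives inside the graph, the predictor has to be extracted with NetExtract after training; at prediction time only the "net" node runs, so the asymmetry shapes training alone.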

  1. Yes, there is an example, and the documentation does provide some assistance. But it's not complete enough. Look at NetTrain: its reference page shows four potential forms of training data. The example in the documentation for LossFunction covers case 2. There's no explicitly named "Target"; rather, the kernel figures out which data is the target. But what about cases 3 and 4, where nothing is actually named "Target"? How do we hook up the network then?

  2. I find the documentation's explanation of how to specify the loss function confusing. Here's one example from the documentation:

    loss = CrossEntropyLossLayer["Probabilities"];

    trained = NetTrain[net, {{1, 2} -> {1., 0.}, {2, 3} -> {1., 0.}, {4, 2} -> {0., 1.}, {3, 1} -> {0., 1.}, {2, 2} -> {0.5, 0.5}, {3, 3} -> {0.5, 0.5}}, LossFunction -> loss]

So here we train on loss. But look at the next example: you've added an additional port, "Loss" (different from the lowercase "loss"). Why do we need that? Why can't we train on "loss" as before?

 lossNet = 
  NetGraph[<|"net" -> net, 
    "loss" -> ThreadingLayer[(#1 - #2)^2 &]|>, 
   {{"net", NetPort["Target"]} -> "loss" -> NetPort["Loss"]}]

The next line of code in the documentation further complicates matters by not telling the user exactly what is being trained on. Are you training on "loss", on "Loss", or on something else? And, as far as I can tell, there is no function in the System` context of 11.3 that reveals which loss function NetTrain is using or has used for training.

 data = Flatten@
   Table[{x, y} -> Exp[-(x^2 + y^2)], {x, -2, 2, .01}, {y, -2, 2, .01}];
 trainedLossNet = NetTrain[lossNet, data]
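As I read the NetTrain reference page, the ambiguity can be removed by naming every port explicitly: "net" and "loss" are only node names inside the graph, NetPort["Loss"] is the graph's single output port, and LossFunction -> "Loss" tells NetTrain to minimize that port's value directly. The sketch below reruns the documentation example that way; the stand-in net (with a SummationLayer to turn its length-1 output into a scalar), the coarser grid, and the training-round count are assumptions, not the documentation's own code.

```mathematica
(* Hypothetical stand-in for the documentation's net: 2 inputs -> scalar output *)
net = NetChain[{LinearLayer[32], ElementwiseLayer[Tanh], LinearLayer[1],
    SummationLayer[]}, "Input" -> 2];

(* "net" and "loss" name nodes; NetPort["Loss"] names the graph's output port *)
lossNet = NetGraph[<|"net" -> net, "loss" -> ThreadingLayer[(#1 - #2)^2 &]|>,
   {{"net", NetPort["Target"]} -> "loss" -> NetPort["Loss"]}];

(* Same target function as the documentation, on a coarser grid to keep it quick *)
data = Flatten@Table[{x, y} -> Exp[-(x^2 + y^2)], {x, -2, 2, .1}, {y, -2, 2, .1}];

(* Name both input ports in the data and the output port that serves as the loss *)
trainedLossNet = NetTrain[lossNet,
   <|"Input" -> Keys[data], "Target" -> Values[data]|>,
   LossFunction -> "Loss", MaxTrainingRounds -> 50];

(* The fitted predictor is the node called "net" *)
trainedNet = NetExtract[trainedLossNet, "net"];
trainedNet[{0., 0.}]
```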

Perhaps some assistance can be found in the tutorial here, which makes clear that you are training on the final output ("Loss" in the example above, "WeightedLoss" in the tutorial). But there's no clear explanation of why this additional layer needs to be added.

  3. The documentation for NetTrain reads: "When specifying target outputs using the specification port_i -> {data_i1, data_i2, …}, any provided custom loss layers should take the port_i as inputs in order to compute the loss." Does this mean you do it in the second argument to NetGraph? Is it some sort of optional argument to the layer that is the entry point of the custom loss function? An example or two would definitely help.

  4. The documentation for NetTrain reads: "When loss layers are automatically attached by NetTrain to output ports, their "Target" ports will be taken from the training data using the same name as the original output port." Honestly, I have read this numerous times and I still don't know what it means. This material is unavoidably complicated and difficult to translate from programming language into English. Examples can really help clarify matters.
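One reading of that sentence, sketched with a hypothetical regression net and synthetic placeholder data: when no custom loss is supplied, NetTrain attaches a loss layer to the net's output port (named "Output" by default), and that loss layer's "Target" values are taken from the data key with the same name as the output port, here "Output".

```mathematica
(* Hypothetical regression net; its output port keeps the default name "Output" *)
net = NetChain[{LinearLayer[16], ElementwiseLayer[Tanh], LinearLayer[1]}, "Input" -> 2];

(* Synthetic placeholder data: the target is the sum of the two inputs, as a length-1 vector *)
SeedRandom[1];
xs = RandomReal[{-1, 1}, {100, 2}];
ys = List /@ (Total /@ xs);

(* No LossFunction given: NetTrain attaches a default loss to the "Output" port,
   and that loss layer's "Target" is read from the data key named "Output" *)
trained = NetTrain[net, <|"Input" -> xs, "Output" -> ys|>, MaxTrainingRounds -> 10];
trained[{0.3, 0.4}]
```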

So, please, trust me. I've read the documentation. Maybe not perfectly, but quite a bit. And if a pretty experienced Wolfram Language user struggles mightily to figure it out, and cares enough to write lengthy posts on the subject, there is at least some possibility that the issue lies, in part, with documentation that needs improvement. Taliesin Beynon mentioned in another post that work is in fact being done on that front. Good! I can only encourage you to try as hard as you can to imagine matters from a user's perspective. As the NeuralNetworks infrastructure matures, please provide, as Wolfram generally does so well, examples and explanations that tackle the borderline and confusing cases in ways that bring clarity.

Note:

Here's an example of a documentation issue. In another post I suggested that one needed to use a function from the NeuralNetworks` context to extract connectivity information from a NetGraph. A developer helpfully noted that one could instead use EdgeList from the System` context. Great. But the documentation for EdgeList indicates that it works on Graph objects, not on NetGraph objects, and the documentation for NetGraph says nothing about using EdgeList.
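Following the developer's suggestion described above, a minimal check that EdgeList accepts a NetGraph might look like this; the two-node graph is a hypothetical example, and the exact form of the returned edges is not asserted here.

```mathematica
(* Hypothetical two-node NetGraph: a linear layer feeding a Ramp activation *)
g = NetGraph[<|"lin" -> LinearLayer[4], "act" -> ElementwiseLayer[Ramp]|>,
   {"lin" -> "act"}, "Input" -> 3];

(* Connectivity of the NetGraph, via the System` function EdgeList *)
EdgeList[g]
```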

POSTED BY: Seth Chandler