
Custom Neural Network Architectures for Social Science Data

The code in the notebook attached to this post sets forth my efforts to develop custom neural network architectures for datasets found in the social sciences (and other fields). It is the result of a lot of trial and even more error. It shows how to do the following things. Some of this is covered in the Wolfram Language documentation, but not as an extensive worked example.

  1. Create numerical vectors out of nominal data.
  2. Develop a loss function when the target consists of nominal data.
  3. Use ClassifierMeasurements when the classifier is the output of a trained neural network.
  4. Specify the form a neural network must be in for ClassifierMeasurements to work, and modify a non-conforming trained network into that form.
  5. Build equivalent NetChains and NetGraphs.
  6. Have the neural network itself encode nominal data contained as values in Associations, catenate that data, and then pipe it through the rest of a NetGraph.
  7. Hook a loss function up to the output of a neural network.
  8. See the innards of a neural network more clearly, including the plan that converts it to something usable by MXNet.
  9. Work with Datasets and Query.

I strongly suspect that this is not the most efficient way to create a neural network to analyze data contained in a Dataset with named columns and lots of nominal variables. However, it's the best I can do for now, and I hope it is instructive to others. More importantly perhaps, I hope that these efforts will inspire others more knowledgeable in the field to show (1) how this can all be done in a more efficient manner and (2) how other bells and whistles can be added, such as a custom loss function, weighted inputs, a desired distribution of predictions, etc. While the Wolfram documentation on neural networks is extensive, as of version 11.3, in which the functionality is still deemed "Experimental," it lacks, in my view, the conceptual perspective and the range of worked examples from diverse fields that would lower the barriers to entry for non-expert users of machine learning.

Note:

I did receive some excellent assistance in this effort from Wolfram Technical Support, but there comes a point when you kind of want to do it on your own. My request for assistance on the community.wolfram.com website didn't receive any immediate response, and so, being the persistent sort, I decided just to try to do it on my own.

Do the Encoding Before We Get to NetTrain

Download the Titanic dataset and convert it from a Dataset to a list of associations.

Short[titanic = Normal@ExampleData[{"Dataset", "Titanic"}]]

Scramble the data, delete the missing values to keep things simple, and encode survival in a way I prefer.

titanic2 = Query[RandomSample /* (DeleteMissing[#, 1, 2] &), 
   {"survived" -> (If[#, "survived", "died"] &)}][titanic];

Encode the nominal data as unit vectors.

titanic3 = 
 Query[All, 
   List["class" -> NetEncoder[{"Class", {"1st", "2nd", "3rd"}, "UnitVector"}],
     "sex" -> NetEncoder[{"Class", {"male", "female"}, "UnitVector"}]]][
  titanic2]

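Incidentally, a NetEncoder can be applied directly to data, which is a handy way to see exactly what these unit-vector encodings produce. A quick check (the particular class value here is just an illustration):

NetEncoder[{"Class", {"1st", "2nd", "3rd"}, "UnitVector"}]["2nd"]

(* {0, 1, 0} *)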

Transform each record into a Rule: the six flattened input values on the left-hand side, the single target value on the right.

Short[titanic4 = Query[All, Values /* (Flatten[Most[#]] -> Last[#] &)][titanic3]]

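To make the Rule construction concrete, here is the same operation applied to a single hypothetical record whose Values are the encoded class vector, the age, the encoded sex vector, and the survival label:

Flatten[Most[{{0, 0, 1}, 18., {1, 0}, "survived"}]] -> Last[{{0, 0, 1}, 18., {1, 0}, "survived"}]

(* {0, 0, 1, 18., 1, 0} -> "survived" *)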

Form training and testing data sets.

Short[{trainingData, testData} =  TakeDrop[titanic4, Round[0.7*Length[titanic4]]]]


Create a pretty basic net chain ending with a SoftmaxLayer[] that turns the output into probabilities.

chainlinks = {LinearLayer[12], ElementwiseLayer[LogisticSigmoid],  LinearLayer[4], LinearLayer[2], SoftmaxLayer[]};

nc = NetChain[chainlinks, "Input" -> 6,  "Output" -> NetDecoder[{"Class", {"died", "survived"}}]]


Just test the NetChain to see if it works.

NetInitialize[nc][{0, 0, 1, 18, 1, 0}]

"died"

Train the neural net. Use cross-entropy loss as the function to minimize. Remember that the target data needs to be encoded from "died" and "survived" to 1 and 2; otherwise the CrossEntropyLossLayer gets unhappy. After 2000 rounds I find it's all overfitting anyway, so I limit the training rounds.

chainTrained = NetTrain[nc, trainingData, All, ValidationSet -> Scaled[0.2], 
  LossFunction -> CrossEntropyLossLayer["Index", 
    "Target" -> NetEncoder[{"Class", {"died", "survived"}}]], 
  MaxTrainingRounds -> 2000]

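Because NetTrain was called with the All property specification, the resulting NetTrainResultsObject can be queried for diagnostics. Assuming the "LossEvolutionPlot" property is available in your version, comparing the training and validation loss curves is one way to see the overfitting that sets in after enough rounds:

chainTrained["LossEvolutionPlot"]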

Get the TrainedNet out of the NetTrainResultsObject and see how our classifier performed.

cmo = ClassifierMeasurements[chainTrained["TrainedNet"], testData]


cmo["ConfusionMatrixPlot"]


Not bad, but not great. The question is whether that's the fault of the classifier or just irreducible noise in the data.
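The ClassifierMeasurements object can put a number on "not bad". For example (the exact value depends on the random split and initialization):

cmo["Accuracy"]

(* typically somewhere around 0.8 for the Titanic data *)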

Now do it with NetGraph

Same data, but do it with a NetGraph.

ngt = NetGraph[chainlinks, {1 -> 2 -> 3 -> 4 -> 5}, "Input" -> 6, 
  "Output" -> NetDecoder[{"Class", {"died", "survived"}}]]


From here on in, it's all exactly the same.

graphTrained = 
 NetTrain[ngt, trainingData, All, ValidationSet -> Scaled[0.2], 
  LossFunction -> 
   CrossEntropyLossLayer["Index", 
    "Target" -> NetEncoder[{"Class", {"died", "survived"}}]], 
  MaxTrainingRounds -> 2000]


graphCmo = ClassifierMeasurements[graphTrained["TrainedNet"], testData]


graphCmo["ConfusionMatrixPlot"]


Not surprisingly, the results are very similar.

Now Do the Encoding Within NetTrain

Now I want to do it with the data in a different form: I want the neural network itself to do the encoding, and I want at least to think about having a custom loss function. Convert the data so that it is "column oriented", that is, an association whose keys are column names and whose values are parallel lists of column values; this is the third variant of the training-data specifications that NetTrain accepts.


{trainingData2, testData2} =  Map[Normal[Transpose[Dataset[#]]] &, TakeDrop[titanic2, Round[0.7*Length[titanic2]]]];

Here's what the training data looks like.

Keys[trainingData2]

{"class", "age", "sex", "survived"}

Map[Short, Values[trainingData2]]

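Each key thus points to a list of column values running in parallel. A runnable way to peek at the structure, showing just the first three entries of each column:

Map[Take[#, UpTo[3]] &, trainingData2]

(* something like <|"class" -> {"3rd", "1st", ...}, "age" -> {...}, "sex" -> {...}, "survived" -> {...}|> *)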

Now form a NetGraph that Catenates some of the values from the data together and then goes through the same process as our NetChain (and NetGraph) above. Add a loss function at the end. Note that the data coming in through the Target port of the "myloss" layer is encoded from the nominal values "died" and "survived" into the integers 1 and 2.

nodes = Association["catenate" -> CatenateLayer[], "l15" -> LinearLayer[15], 
   "ls1" -> ElementwiseLayer[LogisticSigmoid], "l5" -> LinearLayer[5], 
   "l2" -> LinearLayer[2], "sm" -> SoftmaxLayer[], 
   "myloss" -> 
    CrossEntropyLossLayer["Index", 
     "Target" -> NetEncoder[{"Class", {"died", "survived"}}]]];

Create the connectivity structure between the nodes. Note that I am careful to specify which connectors of the various NetPorts connect to which other NetPort connectors. Certain layers, like CrossEntropyLossLayer, have connector names that the user can't alter, so far as I can figure out; the connector name "Target", for example, needs to stay "Target". Also notice that I believe I have to generate a NetPort["Loss"] for the network to be trained.

connectivity = {{NetPort["class"], NetPort["age"], NetPort["sex"]} -> 
"catenate", "catenate" -> "l15" -> "ls1" -> "l5" -> "l2" -> "sm", 
"sm" -> NetPort["myloss", "Input"], 
NetPort["survived"] -> NetPort["myloss", "Target"], 
"myloss" -> NetPort["Loss"], "sm" -> NetPort["Output"]}
  {{NetPort["class"], NetPort["age"], NetPort["sex"]} -> "catenate",  
  "catenate" -> "l15" -> "ls1" -> "l5" -> "l2" -> "sm",   "sm" ->
  NetPort["myloss", "Input"],   NetPort["survived"] -> NetPort["myloss",
  "Target"],   "myloss" -> NetPort["Loss"], "sm" -> NetPort["Output"]}

Now let's put our NetGraph together. Here I have to tell it how the various inputs and outputs will be encoded and decoded. You will notice I do NOT tell it how to encode the "survived" values, because our CrossEntropyLossLayer handles that part of the work.

ngt2 = NetGraph[nodes, connectivity, 
"class" -> NetEncoder[{"Class", {"1st", "2nd", "3rd"}, "UnitVector"}], 
"age" -> "Scalar", 
"sex" -> NetEncoder[{"Class", {"male", "female"}, "UnitVector"}], 
"Output" -> NetDecoder[{"Class", {"died", "survived"}}]]


Here's a picture of our net.

NetInformation[ngt2, "FullSummaryGraphic"]


We can get the structure information back out of the NetGraph using some "secret" functions. I found these useful when working on this project to help me understand what was going on.

NeuralNetworks`GetNodes[ngt2]


NeuralNetworks`NetGraphEdges[ngt2] (* shouldn't this be called GetEdges for consistency? Or maybe GetNodes should be NetGraphNodes? *)
{NetPort["class"] -> NetPort[{"catenate", 1}], 
NetPort["age"] -> NetPort[{"catenate", 2}], 
NetPort["sex"] -> NetPort[{"catenate", 3}], 
NetPort["survived"] -> NetPort[{"myloss", "Target"}], 
NetPort[{"catenate", "Output"}] -> "l15", "l15" -> "ls1", "ls1" -> "l5", 
"l5" -> "l2", "l2" -> "sm", "sm" -> NetPort[{"myloss", "Input"}], 
"sm" -> NetPort["Output"], NetPort[{"myloss", "Loss"}] -> NetPort["Loss"]}

We can also get a closer look at what the neural net is going to do, although, frankly, I don't understand the diagram fully. It does look cool, though. (I believe the diagram essentially shows how the network will be translated from the Wolfram Language framework to MXNet.)

NeuralNetworks`NetPlanPlot[NeuralNetworks`ToNetPlan[ngt2]]

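Before training, it's worth a quick smoke test that the untrained graph evaluates end to end. This check is my own addition; note that because the loss layer is attached, all four input ports, including "survived", have to be supplied:

NetInitialize[ngt2][<|"class" -> "1st", "age" -> 29., "sex" -> "female", "survived" -> "died"|>]

(* an Association containing both a "Loss" value and an "Output" classification *)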

Anyway, let's train the network. Notice how I designate the loss function with a string that refers to a node (NetPort) in the network. I'm not quite sure why, but you can't designate the loss function as "myloss"; again, I wish the documentation were clearer on this issue. Again, I'll stop after 2000 rounds.

titanicNet2 = 
NetTrain[ngt2, trainingData2, All, LossFunction -> "Loss", 
ValidationSet -> Scaled[0.2], MaxTrainingRounds -> 2000]


We can extract the trained network from the NetTrainResultsObject.

titanicNet2["TrainedNet"]


Let's run it on the test data.

Short[titanicNet2["TrainedNet"][testData2]]


If I try to use ClassifierMeasurements on it, though, it fails.

ClassifierMeasurements[titanicNet2["TrainedNet"], testData2];


The error message is unhelpful, and nothing I could find in the documentation spells out the circumstances under which the output of a neural network can be used with ClassifierMeasurements. Maybe, however, it's because our NetGraph produces two outputs: a Loss value and an Output value. When we make classifiers, to the best of my knowledge, we only get an Output value. Let's check the ports and then trim the network.
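Assuming NetInformation supports the "OutputPorts" property in your version, the following should confirm the two-output diagnosis:

NetInformation[titanicNet2["TrainedNet"], "OutputPorts"]

(* should show both a "Loss" port and an "Output" port *)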

titanicNet2Trimmed = NetDelete[titanicNet2["TrainedNet"], "myloss"]


Now, when we run our trimmed network on the test data (stripped of the "survived" column), we just get output values as a List rather than as part of a multi-key Association.

titanicNet2Trimmed[Query[KeyDrop["survived"]][testData2]]


And now ClassifierMeasurements works!!

cmo2 = ClassifierMeasurements[titanicNet2Trimmed, testData2 -> "survived"]


cmo2["ConfusionMatrixPlot"]


Conclusion

I hope the code above helps others appreciate the incredible neural network functionality built into the Wolfram Language and inspires further posts on how it can be used in creative and flexible ways.

POSTED BY: Seth Chandler
2 Replies

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: Moderation Team

Note that you don't need any secret internal functions to get the layer association or the edge list for the NetGraph. This will return the association of layers:

NetExtract[ngt2, All]

And this will return the connectivity information:

EdgeList[ngt2]
