Creating NN with car dataset?

Posted 2 years ago

Hello, I want to build a neural network classifier for the Car dataset. I looked at the example that uses the Titanic dataset, but it is not exactly what I need. I want to build the network with NetChain rather than NetGraph, so that I can pass the input for prediction as a plain vector, for example trained[{"2nd", "24", "female"}], instead of an association such as trained[<|"class" -> "2nd", "age" -> 24, "sex" -> "female"|>].

I tried to do it myself, but nothing works. My code:

    data = Import["<directory>\\car.data", "Data"];
    x = Flatten[data[[All, 1 ;; 6]]];
    y = Flatten[data[[All, 7 ;; 7]]];

    buyingEncoder = NetEncoder[{"Class", {"vhigh", "high", "med", "low"}, "UnitVector"}]
    maintEncoder = NetEncoder[{"Class", {"vhigh", "high", "med", "low"}, "UnitVector"}]
    doorsEncoder = NetEncoder[{"Class", {"2", "3", "4", "5more"}, "UnitVector"}]
    personsEncoder = NetEncoder[{"Class", {"2", "4", "more"}, "UnitVector"}]
    lugBootEncoder = NetEncoder[{"Class", {"small", "med", "big"}, "UnitVector"}]
    safetyEncoder =  NetEncoder[{"Class", {"low", "med", "high"}, "UnitVector"}]

    net = NetChain[{LinearLayer[30], ElementwiseLayer[Ramp], LinearLayer[4], SoftmaxLayer[]}, "Input" -> {21}, "Output" -> NetDecoder[{"Class", {"acc", "good", "unacc", "vgood"}}]]
    NetTrain[net, x -> y, All]

Please help me figure it out. Thanks.

POSTED BY: Sofia Knyazeva
4 Replies
Posted 2 years ago

Hi Sofia,

You are not using the encoders anywhere in the NN. Is there a reason why you cannot use the built-in Classify?

    data = Import["~/Downloads/car.data", "Data"];
    dataRules = Most@# -> Last@# & /@ data;
    {train, test} = ResourceFunction["TrainTestSplit"][dataRules];

    classifier = Classify[train]
    cm = ClassifierMeasurements[classifier, test]

    classifier[{"low", "low", 4, 4, "big", "high"}]
    (* "vgood" *)
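
On the first point, here is a minimal sketch of one way the encoders could be applied by hand, so that a plain NetChain receives the 21-element vector it expects (4 + 4 + 4 + 3 + 3 + 3 unit-vector entries). The helper name encodeRow and the variable names are illustrative and not from the original post, and the file path is the one used above, so adjust it to your setup.

```wolfram
(* Class levels for the six features, in the order the OP listed them *)
levels = {
   {"vhigh", "high", "med", "low"},  (* buying *)
   {"vhigh", "high", "med", "low"},  (* maint *)
   {"2", "3", "4", "5more"},         (* doors *)
   {"2", "4", "more"},               (* persons *)
   {"small", "med", "big"},          (* lugBoot *)
   {"low", "med", "high"}            (* safety *)
   };
encoders = NetEncoder[{"Class", #, "UnitVector"}] & /@ levels;

(* encodeRow is a hypothetical helper: it one-hot encodes each feature and
   joins the pieces into one flat vector of length 21. ToString handles values
   that Import turned into integers; Normal is there in case an encoder
   returns a NumericArray rather than a plain list. *)
encodeRow[row_List] :=
  Flatten[MapThread[Normal[#1[#2]] &, {encoders, ToString /@ row}]];

data = Import["~/Downloads/car.data", "Data"];
trainingPairs = (encodeRow[Most[#]] -> Last[#]) & /@ data;

net = NetChain[
   {LinearLayer[30], ElementwiseLayer[Ramp], LinearLayer[4], SoftmaxLayer[]},
   "Input" -> 21,
   "Output" -> NetDecoder[{"Class", {"acc", "good", "unacc", "vgood"}}]];
trained = NetTrain[net, trainingPairs];

(* Predictions then take an encoded row *)
trained[encodeRow[{"low", "low", 4, 4, "big", "high"}]]
```

The trade-off is that callers have to run encodeRow themselves before calling the net, whereas a NetGraph with encoders attached to its input ports accepts the raw strings directly.
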
POSTED BY: Rohit Namjoshi

Let me build on your suggestion by comparing Classify and NetTrain on this problem. I will first import the dataset as "RawData" so that values like "2" stay strings instead of being converted to integers automatically, which avoids dealing with mixed data types.

    data = Import["/Users/giulio/Downloads/car.data", "RawData"];

Then we split the data before converting it to rules, so that the same split can be reused for the neural network training.

    SeedRandom[1234];
    {trainingData, validationData} = ResourceFunction["TrainTestSplit"][data];
    classifierTraining = Most@# -> Last@# & /@ trainingData;
    classifierTesting = Most@# -> Last@# & /@ validationData;

Now we can train the classifier as suggested by @Rohit Namjoshi:

    classifier = Classify[classifierTraining];
    cm = ClassifierMeasurements[classifier, classifierTesting]

which gives us

    cm["Accuracy"]
    (* 0.849711 *)

which is not very high. Using the same one-hot encoding that the OP proposed, we get a better result:

    classifier2 = Classify[classifierTraining, FeatureExtractor -> "IndicatorVector"];
    cm2 = ClassifierMeasurements[classifier2, classifierTesting];
    cm2["Accuracy"]
    (* 0.942197 *)

Now let's see how this can be done with a neural network model. The OP's class encoder definitions were fine:

    buyingEncoder = NetEncoder[{"Class", {"vhigh", "high", "med", "low"}, "UnitVector"}];
    maintEncoder = buyingEncoder;
    doorsEncoder = NetEncoder[{"Class", {"2", "3", "4", "5more"}, "UnitVector"}];
    personsEncoder = NetEncoder[{"Class", {"2", "4", "more"}, "UnitVector"}];
    lugBootEncoder = NetEncoder[{"Class", {"small", "med", "big"}, "UnitVector"}];
    safetyEncoder = NetEncoder[{"Class", {"low", "med", "high"}, "UnitVector"}];
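
To make the effect of the "UnitVector" setting concrete, the following small sketch compares it with the default index encoding of a "Class" encoder. The variable names are made up for this illustration.

```wolfram
lugBootLevels = {"small", "med", "big"};

(* Default "Class" encoding: each label maps to its position in the list *)
indexEncoder = NetEncoder[{"Class", lugBootLevels}];
indexEncoder["med"]

(* "UnitVector" encoding: each label maps to a length-3 vector with a single
   1 at that position. Normal converts the result to a plain list in case
   the encoder returns a NumericArray. *)
unitEncoder = NetEncoder[{"Class", lugBootLevels, "UnitVector"}];
Normal[unitEncoder["med"]]
```

With unit vectors, the six features together contribute 4 + 4 + 4 + 3 + 3 + 3 = 21 numbers per row, which is where the input size of 21 in the OP's NetChain comes from.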

Unfortunately, NetChain does not support multiple inputs, so we need to create a NetGraph:

    net = NetInitialize@NetGraph[
       {CatenateLayer[], LinearLayer[30], ElementwiseLayer[Ramp], LinearLayer[4], SoftmaxLayer[]},
       {
        (NetPort /@ {"Buying", "MainT", "Doors", "Persons", "LugBoot", "Safety"}) -> 
         1 -> 2 -> 3 -> 4 -> 5
        },
       "Buying" -> buyingEncoder,
       "MainT" -> maintEncoder,
       "Doors" -> doorsEncoder,
       "Persons" -> personsEncoder,
       "LugBoot" -> lugBootEncoder,
       "Safety" -> safetyEncoder,
       "Output" -> NetDecoder[{"Class", {"acc", "good", "unacc", "vgood"}}]
       ]

You can still use it on untagged data. The net is only randomly initialized at this point, so the predicted class is arbitrary:

    net[{"vhigh", "vhigh", "2", "2", "small", "low"}]
    (* "vgood" *)

For the training, though, we need to put the data in a format that plays well with the different network ports: an association that maps each port name to the list of values for that port.

    {netTraining, netValidation} = AssociationThread[
         {"Buying", "MainT", "Doors", "Persons", "LugBoot", "Safety", "Output"},
         Transpose[#]
         ] & /@ {trainingData, validationData};
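
To make the resulting shape concrete, here is the same transformation applied to two hand-written rows (the rows are illustrative stand-ins for trainingData):

```wolfram
(* Two example rows in the same column order as car.data *)
rows = {
   {"vhigh", "vhigh", "2", "2", "small", "low", "unacc"},
   {"low", "low", "4", "4", "big", "high", "vgood"}
   };

(* Transpose turns the list of rows into a list of columns, and
   AssociationThread pairs each column with its port name *)
AssociationThread[
 {"Buying", "MainT", "Doors", "Persons", "LugBoot", "Safety", "Output"},
 Transpose[rows]
 ]
(* <|"Buying" -> {"vhigh", "low"}, "MainT" -> {"vhigh", "low"},
     "Doors" -> {"2", "4"}, "Persons" -> {"2", "4"},
     "LugBoot" -> {"small", "big"}, "Safety" -> {"low", "high"},
     "Output" -> {"unacc", "vgood"}|> *)
```

Each key holds one column of the dataset, so NetTrain can feed every input port, and the "Output" target, from the matching list.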

Now we can train, stopping when the error rate on the validation set stops improving. (I noticed that the loss can go much lower, but you get no concrete benefit from it and you might start overfitting the small validation set.)

    trainingRes = NetTrain[net, netTraining, All, ValidationSet -> netValidation,
      TrainingStoppingCriterion -> <|"Criterion" -> "ErrorRate", "Patience" -> 100|>]

The trained model can now be extracted and used:

    trainedNet = trainingRes["TrainedNet"];
    trainedNet[{"vhigh", "vhigh", "2", "2", "small", "low"}]
    (* "unacc" *)

It also reaches a higher accuracy than the second classifier:

    cmNet = ClassifierMeasurements[trainedNet, classifierTesting];
    cmNet["Accuracy"]
    (* 0.991329 *)

Now that all is said and done, we can go back to Classify and produce something of similar accuracy without all the network construction effort:

    classifier3 = 
     Classify[classifierTraining, FeatureExtractor -> "IndicatorVector", 
      Method -> {"NeuralNetwork", "NetworkDepth" -> 2, MaxTrainingRounds -> 200}]
    cm3 = ClassifierMeasurements[classifier3, classifierTesting];
    cm3["Accuracy"]
    (* 0.99422 *)

POSTED BY: Giulio Alessandrini
Posted 2 years ago

Giulio Alessandrini, hello, thanks a lot for your help! Sorry for the late reply. I will study your code, thanks for the detailed comments.

POSTED BY: Sofia Knyazeva
Posted 2 years ago

Rohit Namjoshi, hello, thanks for your help! Sorry for the late reply. Of course, built-in Classify is good, but I'm learning how to create neural networks, so it didn't suit me.

POSTED BY: Sofia Knyazeva