Proper use of PredictorMeasurements[]?

I've been playing around with Predict[] with multi-dimensional datasets and, for small training sets anyway, things seem to work correctly. For example,

trainingset = {
   <|"age" -> 47, "sex" -> "M", "height" -> 100, "weight" -> 60|>,
   <|"age" -> 22, "sex" -> "M", "height" -> 90, "weight" -> 55|>,
   <|"age" -> 43, "sex" -> "M", "height" -> 110, "weight" -> 61|>,
   <|"age" -> 23, "sex" -> "F", "height" -> 100, "weight" -> 41|>,
   <|"age" -> 33, "sex" -> "F", "height" -> 80, "weight" -> 50|>,
   <|"age" -> 43, "sex" -> "F", "height" -> 70, "weight" -> 51|>};
testset = {
   <|"age" -> 37, "sex" -> "M", "height" -> 100|>,
   <|"age" -> 22, "sex" -> "M", "height" -> 90|>,
   <|"age" -> 43, "sex" -> "F", "height" -> 80|>,
   <|"age" -> 33, "sex" -> "F", "height" -> 70|>};
p1 = Predict[trainingset -> "weight", PerformanceGoal -> "Quality", 
   Method -> "RandomForest"];

We can get predictions from the p1 PredictorFunction with

Map[Append[#, "prediction" -> p1[#]] &, testset] (* this works *)

I can then compute residuals, etc., myself.

Since version 10, Wolfram Language has included the function PredictorMeasurements[], and the documentation suggests that I should be able to get the predictions above, plus residual reports and other information, with

PredictorMeasurements[p1, testset]

But this does not work. I get the following error: PredictorMeasurements::bdfmt: Argument {<|age->37,sex->M,height->100,weight->60|>,<|age->22,sex->M,height->90|>,<|age->43,sex->F,height->80|>,<|age->33,sex->F,height->70|>} should be a rule or a list of rules.

What am I missing?

POSTED BY: Michael Stern
3 months ago

Your testset is supposed to have "weight" information too, in the same way the trainingset has it, because with the testset you're not only predicting but also testing against a known result. So I would set up your problem in the following way.

Have some general data:

data=Dataset@{
<|"age"->47,"sex"->"M","height"->100,"weight"->60|>,
<|"age"->22,"sex"->"M","height"->90,"weight"->55|>,
<|"age"->43,"sex"->"M","height"->110,"weight"->61|>,
<|"age"->23,"sex"->"F","height"->100,"weight"->41|>,
<|"age"->33,"sex"->"F","height"->80,"weight"->50|>,
<|"age"->43,"sex"->"F","height"->70,"weight"->51|>,
<|"age"->37,"sex"->"M","height"->100,"weight"->53|>,
<|"age"->22,"sex"->"M","height"->90,"weight"->51|>,
<|"age"->43,"sex"->"F","height"->80,"weight"->51|>,
<|"age"->33,"sex"->"F","height"->70,"weight"->52|>};

Split your data into training and test sets:

leng=Length[data];
split=Round[.6 leng] (* take 60% of your data for training *)
trainingset=data[;;split]
testset=data[split+1;;] (* the remaining rows are the test set *)

Choose the variable to predict and train:

p=Predict[trainingset->"weight",PerformanceGoal->"Quality",Method->"RandomForest"]

and set up PredictorMeasurements in the same way:

pm = PredictorMeasurements[p, testset -> "weight"]
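
The error message in the question says the test argument "should be a rule or a list of rules", so the same measurement can presumably also be set up with explicit input -> output rules. This is only a minimal sketch, assuming the p and testset defined above; inputs, outputs and pmRules are names introduced here for illustration and are not part of the original code:

inputs = Normal[testset[All, {"age", "sex", "height"}]]; (* list of associations without "weight" *)
outputs = Normal[testset[All, "weight"]]; (* the known weights for the same rows *)
pmRules = PredictorMeasurements[p, Thread[inputs -> outputs]] (* one input -> output rule per test row *)

Either form hands PredictorMeasurements the known weights it needs to compare against the predictions.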

Now you can extract various measurements:

pm["ComparisonPlot"]
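
Since the original question was about residuals, a few other properties may be worth trying. This is a sketch assuming the pm object from above; the property names are the ones I believe the documentation lists, and pm["Properties"] should show what your version actually offers:

pm["Properties"] (* list every measurement available for this object *)
pm["Residuals"] (* differences between predicted and actual weights *)
pm["ResidualPlot"] (* plot of those residuals *)
pm["StandardDeviation"] (* root mean square of the residuals *)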


POSTED BY: Vitaliy Kaurov
2 months ago
