Here are two simple functions, one for n-fold cross validation and one for checking all methods (including a warning/finding).
1) Returning all the accuracies (and the mean) for random n-fold cross validation defined in my earlier answer:
(* Return the mean accuracy together with all the individual accuracies
   from repeated runs of crossValidation1 (defined in my earlier answer). *)
crossValidationAllAccuracies[data_, folds_: 10, time_: Automatic, performance_: Automatic] :=
 Module[{accuracy, cv},
  accuracy = Monitor[
    Table[cv = crossValidation1[data, folds, time, performance], {i, 1, folds}],
    {i, cv}]; (* shows the current run number and accuracy while running *)
  {Mean[accuracy], accuracy}
  ]
Example:
crossValidationAllAccuracies[data, 10, Automatic, "Quality"]
which gave this result when I ran it:
{0.723377, {0.727273, 0.753247, 0.727273, 0.623377, 0.727273, 0.727273, 0.779221, 0.675325, 0.805195, 0.688312}}
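The individual accuracies spread quite a bit (from 0.623377 to 0.805195 above), so it can be useful to also report their spread. A minimal check, assuming the result above is stored in res:
res = crossValidationAllAccuracies[data, 10, Automatic, "Quality"];
StandardDeviation[Last[res]] (* spread of the ten accuracies; for the run above this is about 0.05 *)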
2) Testing all methods.
One should first note that Classify checks many (all?) methods automatically, so it is better to let Classify run and pick the best model. The option ValidationSet can be set to Automatic (or to a specific validation dataset), in which case a validation set is used during training.
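For example (a minimal sketch; dataValidation is a hypothetical held-out set in the same input -> class format as dataTrain):
cl = Classify[dataTrain, ValidationSet -> Automatic, PerformanceGoal -> "Quality"]
(* or, with an explicit held-out set: *)
cl = Classify[dataTrain, ValidationSet -> dataValidation, PerformanceGoal -> "Quality"]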
Also, a thing I noticed when testing this: when Classify runs with Method->Automatic, it seems to set the hyperparameters much better than with an explicit method. Here is an example of this. First we let Classify run with Method->Automatic (the default):
cl = Classify[dataTrain, Method -> Automatic, PerformanceGoal -> "Quality"]
ClassifierMeasurements[cl, dataTest, "Accuracy"]
The method chosen is GradientBoostedTrees with an accuracy of 0.727273. Then we set Method->"GradientBoostedTrees"
explicitly:
cl = Classify[dataTrain, Method -> "GradientBoostedTrees", PerformanceGoal -> "Quality"]
ClassifierMeasurements[cl, dataTest, "Accuracy"]
which gives a (much) lower accuracy of 0.688312. This is a bit surprising. (I think there was an issue about this, either here at Wolfram Community or on Mathematica Stack Exchange; however, I cannot find it now.)
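If I understand it correctly, Method->Automatic tunes the hyperparameters, while a bare method name leaves more of them at their defaults. A way to inspect (and reuse) what the automatic run chose, assuming a recent Mathematica version where Information on a ClassifierFunction supports the "MethodOption" property:
opt = Information[cl, "MethodOption"] (* the full Method option, including tuned hyperparameters *)
cl2 = Classify[dataTrain, opt, PerformanceGoal -> "Quality"] (* retrain with those settings *)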
That said, for demonstration purposes, here is code for explicitly testing all methods, but be aware of the problem mentioned above. Also, to keep it simple, I have not included the cross validation, so it just tests on a single train/test split (a version that averages over several splits is sketched after the results below).
(* Test all methods: train a classifier with each method and
   measure its accuracy on the test set. *)
testAll[trainData_, testData_, time_, performance_] := Module[{methods},
  methods = {"DecisionTree", "GradientBoostedTrees", "LogisticRegression",
    "Markov", "NaiveBayes", "NearestNeighbors", "NeuralNetwork",
    "PriorBaseline", "RandomForest", "SupportVectorMachine", Automatic};
  Association[
   # -> ClassifierMeasurements[
       Classify[trainData, Method -> #, PerformanceGoal -> performance,
        TimeGoal -> time, TrainingProgressReporting -> None],
       testData, "Accuracy"] & /@ methods]
  ]
Example:
AbsoluteTiming[testAll[dataTrain, dataTest, Automatic, "Quality"]]
Result:
{173.114, <|"DecisionTree" -> 0.597403, "GradientBoostedTrees" -> 0.688312, "LogisticRegression" -> 0.688312,
"Markov" -> 0.662338, "NaiveBayes" -> 0.688312, "NearestNeighbors" -> 0.623377, "NeuralNetwork" -> 0.623377,
"PriorBaseline" -> 0.584416, "RandomForest" -> 0.597403, "SupportVectorMachine" -> 0.688312,
Automatic -> 0.688312|>}
Again, we see that "GradientBoostedTrees" gets this quite low accuracy when set explicitly.
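Finally, as promised above, here is a sketch of how testAll could be combined with the cross validation idea after all: average the per-method accuracies over several random 80/20 splits. The function name, the number of runs, and the split ratio are my own choices, and data is assumed to be a list of input -> class rules:
testAllCV[data_, runs_: 5, time_: Automatic, performance_: "Quality"] :=
 Module[{n = Ceiling[0.8 Length[data]], results},
  results = Table[
    With[{shuffled = RandomSample[data]},
     testAll[Take[shuffled, n], Drop[shuffled, n], time, performance]],
    {runs}];
  Mean /@ Merge[results, Identity] (* mean accuracy per method *)
  ]
Usage, ranking the methods from best to worst mean accuracy:
ReverseSort[testAllCV[data]]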