I am using Classify[] to build an image classifier, and I am not sure about the proper use of the ValidationSet option. My understanding is that without the ValidationSet option, cross-validation is used. I do not think this is stated explicitly in the documentation, but I have read it online somewhere.
First, I divide my data equally into training and test sets.
SeedRandom[500];
forTestSet = EvenQ@Range[Length@data];
forTrainingSet = Map[Not[#] &, forTestSet];
testSetClasses = Pick[Normal@data[All, "Type"], forTestSet];
trainingSetClasses = Pick[Normal@data[All, "Type"], forTrainingSet];
testSetImagesRGB = Pick[Normal@data[All, "Image"], forTestSet];
trainingSetImagesRGB = Pick[Normal@data[All, "Image"], forTrainingSet];
I use one of these two statements to set up the classifier:
(* Option 1: pass the test set as the ValidationSet *)
imageClassifyerRGB =
  Classify[trainingSetImagesRGB -> trainingSetClasses,
    PerformanceGoal -> "Quality",
    Method -> "NeuralNetwork",
    ValidationSet ->
      MapThread[Rule[#1, #2] &, {testSetImagesRGB, testSetClasses}]]

(* Option 2: no ValidationSet option *)
imageClassifyerRGB =
  Classify[trainingSetImagesRGB -> trainingSetClasses,
    PerformanceGoal -> "Quality",
    Method -> "NeuralNetwork"]
I am judging the accuracy with:
ClassifierMeasurements[imageClassifyerRGB,
  testSetImagesRGB -> testSetClasses, "Accuracy"]
Should I be using the ValidationSet option this way, or will this result in overfitting, given that the same test set is then used to measure the accuracy?
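To make the concern concrete, the alternative I can think of is a three-way split, where the validation set passed to Classify and the test set used for the accuracy measurement are disjoint. Below is a minimal sketch of that idea, not a recommended recipe. The toy data in `examples`, the names `trainingSet`, `validationSet`, `testSet` and `classifier`, and the 60/20/20 proportions are all assumptions made up for illustration; they stand in for my real image -> class pairs.

```wolfram
(* Hypothetical toy data standing in for the image -> class pairs *)
SeedRandom[500];
examples = Table[
   With[{x = RandomReal[{-1, 1}, 2]},
    x -> If[Total[x] > 0, "A", "B"]],
   300];

(* Shuffle, then split 60/20/20 (an assumed ratio) into disjoint subsets *)
{trainingSet, validationSet, testSet} =
  TakeList[RandomSample[examples], {180, 60, 60}];

(* The validation set is visible to Classify during training;
   the test set is not passed to Classify at all *)
classifier = Classify[trainingSet,
   Method -> "NeuralNetwork",
   ValidationSet -> validationSet];

(* Accuracy is measured only on the held-out test set *)
ClassifierMeasurements[classifier, testSet, "Accuracy"]
```

With this layout the reported accuracy comes from examples that played no part in training or validation, which is what my current setup with the test set as ValidationSet does not guarantee.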