WOLFRAM COMMUNITY

GROUPS: Data Science, Image Processing, Wolfram Language, Statistics and Probability, Machine Learning, Neural Networks

Proper ValidationSet use in Classify?

Jeff Burns

Posted 9 years ago

I am using Classify[] to build a classifier for images, and am not sure about the proper use of the ValidationSet option. My understanding is that without the ValidationSet option, cross-validation will be used. I do not think this is stated explicitly in the documentation, but I have read it somewhere online.

First, I divide my data equally into training and test sets.

SeedRandom[500];
forTestSet = EvenQ@Range[Length@data];
forTrainingSet = Map[Not[#] &, forTestSet];
testSetClasses = Pick[Normal@data[All, "Type"], forTestSet];
trainingSetClasses =  Pick[Normal@data[All, "Type"], forTrainingSet];
testSetImagesRGB = Pick[Normal@data[All, "Image"], forTestSet];
trainingSetImagesRGB =  Pick[Normal@data[All, "Image"], forTrainingSet];

I use one of these two statements to set up the classifier:

(* Variant 1: pass the test set as the ValidationSet *)
imageClassifyerRGB = 
 Classify[trainingSetImagesRGB -> trainingSetClasses,
  PerformanceGoal -> "Quality",
  Method -> "NeuralNetwork",
  ValidationSet -> 
   MapThread[Rule[#1 , #2] &, {testSetImagesRGB, testSetClasses}]]

(* Variant 2: no explicit ValidationSet *)
imageClassifyerRGB = 
 Classify[trainingSetImagesRGB -> trainingSetClasses,
  PerformanceGoal -> "Quality",
  Method -> "NeuralNetwork"]

I am judging the accuracy with:

ClassifierMeasurements[ imageClassifyerRGB, 
 testSetImagesRGB -> testSetClasses, "Accuracy"]

Should I be using the ValidationSet option this way, or will this result in overfitting my data?

2 Replies

Claude Mante, Retired

Posted 9 years ago

As far as I know, Classify (and machine learning methods in general) is well suited to supervised classification: the nature of several regions is known, and they are used as training sets. So using a random draw for that purpose is strange. For unsupervised classification, ClusteringComponents is better suited, in my opinion.

Daniel Lichtblau, Wolfram Research

Posted 9 years ago

I don't know the internals, and I would be interested to see what more knowledgeable people have to say. But the first setup you show would make me very nervous about overfitting, since the validation set is used in creating the classifier and is also used as the test set. I would expect that, in order to get trustworthy results, the three sets (training, validation, testing) would all need to be non-overlapping.
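
A minimal sketch of a three-way split with no shared examples, under stated assumptions: the 2D points and the "A"/"B" labels are synthetic stand-ins for the images and the "Type" column in the question, and the names examples, trainingSet, validationSet, and testSet are hypothetical. The 60/20/20 proportions are just one common choice, not something the documentation is known to prescribe.

```wolfram
(* Synthetic stand-in data (an assumption): 300 random 2D points,
   labeled "A" or "B" by the sign of the coordinate sum. *)
SeedRandom[500];
points = RandomReal[{-1, 1}, {300, 2}];
labels = If[Total[#] > 0, "A", "B"] & /@ points;

(* Pair inputs with classes, then shuffle so each subset is a random draw. *)
examples = RandomSample[Thread[points -> labels]];

n = Length[examples];
nTrain = Floor[3 n/5];   (* 60% for training *)
nValid = Floor[n/5];     (* 20% for validation *)

(* Consecutive slices of the shuffled list cannot share positions. *)
trainingSet = examples[[;; nTrain]];
validationSet = examples[[nTrain + 1 ;; nTrain + nValid]];
testSet = examples[[nTrain + nValid + 1 ;;]];   (* remaining 20%, held out *)

Length /@ {trainingSet, validationSet, testSet}   (* {180, 60, 60} *)

(* The validation set is passed to training; the test set is not. *)
classifier = Classify[trainingSet,
   Method -> "NeuralNetwork",
   ValidationSet -> validationSet];

(* Accuracy is measured only on examples the classifier has never seen. *)
ClassifierMeasurements[classifier, testSet, "Accuracy"]
```

Because testSet plays no part in training or validation here, the accuracy it reports is not inflated by reuse of the same examples.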
