
Use FeatureExtraction before splitting data into training and testing sets?


I have a data-set on which I want to run the Classify[] function.

To extract features I will make use of the FeatureExtraction[] function.

My question is: should I extract the features before I split the data into the training set and testing set or should I run FeatureExtraction[] for each set separately?

So, should I do this:

fe = FeatureExtraction[dataset]
classifier = Classify[trainSet -> targetTrain, FeatureExtractor -> fe]

or should I do this:

feTrain = FeatureExtraction[trainSet]
classifier = Classify[trainSet -> targetTrain, FeatureExtractor -> feTrain]

I am inclined to think I should use the first approach, but I am not quite sure how FeatureExtraction[] works. And if the first approach is the right one, should I expect the classifier to know how to extract features from the testing set?
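
For reference, both snippets assume the data have already been split. Here is a minimal sketch of the kind of split I have in mind. The toy data and the names target, trainIdx, testIdx, testSet and targetTest are made up for illustration and are not part of my real data-set:

```wolfram
(* Hypothetical toy data: 100 examples with 4 numeric features each *)
SeedRandom[1];
dataset = RandomReal[1, {100, 4}];
target = Map[If[Total[#] > 2, "high", "low"] &, dataset];

(* Shuffle the example indices, then keep 80 for training and 20 for testing *)
{trainIdx, testIdx} = TakeDrop[RandomSample[Range[Length[dataset]]], 80];

(* Index features and labels with the same positions so they stay aligned *)
trainSet = dataset[[trainIdx]]; targetTrain = target[[trainIdx]];
testSet = dataset[[testIdx]]; targetTest = target[[testIdx]];
```

With this split, the first approach fits the extractor on all 100 examples (including the 20 test examples), while the second fits it on the 80 training examples only.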

POSTED BY: Sandu Ursu
4 months ago
