I have a dataset on which I want to run the Classify[] function.
To extract features I will use the FeatureExtraction[] function.
My question is: should I extract the features before I split the data into the training set and the testing set, or should I run FeatureExtraction[] on each set separately?
So, should I do this:
fe = FeatureExtraction[dataset]
classifier = Classify[trainSet -> targetTrain, FeatureExtractor -> fe]
or should I do this:
feTrain = FeatureExtraction[trainSet]
classifier = Classify[trainSet -> targetTrain, FeatureExtractor -> feTrain]
I am inclined to think I should use the first approach, but I am not quite sure how FeatureExtraction[] works. If the first approach is the right one, should I expect the classifier to know how to extract features from the testing set?
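To make the second workflow concrete, here is a minimal sketch of what I have in mind. The toy data, the 80/20 split, and the names targets, targetTest and testFeatures are placeholders I made up for illustration; they are not my real data. It relies on my understanding that FeatureExtraction[] returns a FeatureExtractorFunction that can be applied to examples it has not seen, which is part of what I am unsure about.

```wolfram
(* Hypothetical toy data standing in for the real dataset *)
SeedRandom[1];
dataset = RandomReal[1, {100, 5}];
targets = RandomChoice[{"a", "b"}, 100];

(* Split examples and labels with the same 80/20 cut (assumed split ratio) *)
{trainSet, testSet} = TakeDrop[dataset, 80];
{targetTrain, targetTest} = TakeDrop[targets, 80];

(* Fit the feature extractor on the training examples only *)
feTrain = FeatureExtraction[trainSet];

(* Apply the same fitted extractor to the unseen test examples *)
testFeatures = feTrain[testSet];

(* Train with the extractor, then evaluate on the raw test examples *)
classifier = Classify[trainSet -> targetTrain, FeatureExtractor -> feTrain];
ClassifierMeasurements[classifier, testSet -> targetTest, "Accuracy"]
```

In this sketch the test set never influences the extractor, whereas in the first approach FeatureExtraction[dataset] would see the test examples before the classifier is evaluated on them.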