Is feature preprocessing necessary for ML? How about RecalibrationFunction?

Many machine learning practitioners talk about the need to transform the features & target variables, claiming this would boost the model accuracy.

For example, this page covers the topic in detail: https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/

I tried the techniques on several models using Wolfram V13. However, it didn't seem to result in any improvement to the model accuracy.

I noticed that RecalibrationFunction[] is built in to Classify[] & Predict[] as a "post-processing" function, which would automatically correct overconfident or underconfident classifiers. The documentation doesn't explain much about how it works in the background. Is RecalibrationFunction[] in fact doing something similar to data preprocessing, such as a log transformation or scaling?

Thanks

POSTED BY: Teck Boon Lim
2 Replies
Posted 2 years ago

For classifiers, RecalibrationFunction tries to adjust the output probabilities so that, among the samples assigned an output probability of x% of belonging to class y, close to x% actually belong to class y. Take a look here and here (these links give Python code, but the idea of calibration is the same).
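To make the idea concrete, here is a minimal Python sketch of how one might check calibration by hand. It is not how RecalibrationFunction works internally (the documentation doesn't say); the function name `reliability_table`, the bin count, and the toy data are all made up for illustration. It groups predictions into probability bins and compares the average predicted probability in each bin with the fraction of samples that really are positive.

```python
from collections import defaultdict

def reliability_table(probs, labels, n_bins=5):
    """Compare mean predicted probability with the observed positive rate
    inside equal-width probability bins (hypothetical helper)."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # Put p == 1.0 into the last bin instead of a bin of its own.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        table.append((idx, len(pairs), mean_pred, observed))
    return table

# Toy data (invented): the classifier says 0.9 or 0.1 for class "1".
probs = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

table = reliability_table(probs, labels)
for idx, count, mean_pred, observed in table:
    print(f"bin {idx}: n={count}, predicted={mean_pred:.2f}, observed={observed:.2f}")
```

In this toy data the samples given a probability of 0.9 are positive only 60% of the time, so the classifier is overconfident in that bin. A recalibration step would aim to map those outputs closer to the observed rate.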

Feature preprocessing is a completely different thing: it alters the features (i.e., inputs) that you give to your models. For example, if your data contains categorical (nominal) data, such as sex, it's best to convert it to a number before giving it to your model (e.g., 0 for female, 1 for male). If you have continuous data with different ranges and scales, for example property prices and property age, it's best to standardize them. The Classify and Predict functions of Mathematica do this automatically. Look at FeatureExtraction for more preprocessing/feature extraction options.
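As a rough illustration of the two preprocessing steps mentioned above, here is a small Python sketch using only the standard library. It shows what doing this by hand could look like, not what Classify or Predict do internally; the helper names `standardize` and `encode_binary` and the sample values are invented for the example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def encode_binary(values, mapping):
    """Replace each nominal value with its numeric code."""
    return [mapping[v] for v in values]

# Invented sample data: two features on very different scales, one nominal.
prices = [250_000, 400_000, 550_000]
ages = [5, 20, 35]
sex = ["female", "male", "female"]

z_prices = standardize(prices)
z_ages = standardize(ages)
sex_codes = encode_binary(sex, {"female": 0, "male": 1})

print(z_prices)
print(z_ages)
print(sex_codes)
```

After standardization, prices and ages end up on the same scale even though the raw numbers differ by several orders of magnitude, which is the point of the step.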

POSTED BY: Sepehr Elahi

Sepehr, thanks so much for your explanation. I have much better clarity now. Let me do some research on what you suggested and see if I can draw more insights later. Thanks again.

POSTED BY: Teck Boon Lim