GROUPS: Data Science, Wolfram Language, Machine Learning

Posted 3 years ago

Many machine learning practitioners talk about the need to transform the features and target variables, claiming that this boosts model accuracy. A typical page that covers it in detail: https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/

I tried these techniques on several models using Wolfram V13. However, they didn't seem to improve model accuracy. I noticed that there is a RecalibrationFunction option built into Classify[] and Predict[] as a "post-processing" function, which automatically corrects overconfident or underconfident classifiers. The documentation doesn't explain much about how it works in the background. Is RecalibrationFunction in fact doing something similar to data pre-processing, such as a log transformation or scaling? Thanks

POSTED BY: Teck Boon Lim


Posted 3 years ago

Sepehr, thanks so much for your explanation. I have a much clearer picture now. Let me do some research on what you suggested and see whether I can draw further conclusions later. Thanks again.

POSTED BY: Teck Boon Lim

Posted 3 years ago

For classifiers, the RecalibrationFunction tries to adjust the output probabilities so that, among the samples that are given an output probability of x% of belonging to class y, close to x% actually belong to class y. Take a look here and here (these links give Python code, but the idea of calibration is the same).

Feature preprocessing is a completely different thing: it alters the features (i.e., the inputs) that you give to your models. For example, if your data contains categorical (nominal) values such as sex, then it's best to convert them to numbers before giving them to your model (e.g., 0 for female, 1 for male). If you have continuous data with different ranges and scales, for example property prices and property age, then it's best to standardize them. The Classify and Predict functions of Mathematica do this automatically. Look at FeatureExtraction for more preprocessing/feature extraction options.
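To make the distinction concrete, here is a minimal Python sketch (plain standard library, not Wolfram's actual implementation). The helper names `reliability_table` and `standardize` and all of the numbers are made up for illustration. The first function only measures calibration, by comparing predicted probabilities with observed frequencies; the second is a feature preprocessing step applied to the inputs.

```python
from statistics import mean, pstdev


def reliability_table(probs, labels, n_bins=5):
    """Bin predictions by predicted probability and, for each non-empty bin,
    return (mean predicted probability, observed positive rate, count).
    A well-calibrated classifier has the first two values close together."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            rows.append((mean(p for p, _ in members),
                         mean(y for _, y in members),
                         len(members)))
    return rows


def standardize(values):
    """Feature preprocessing: rescale inputs to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no scale
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical classifier outputs: it says 0.9 four times but is right
    # only twice, so it is overconfident in that bin.
    probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
    labels = [1, 1, 0, 0, 0, 0, 0, 0]
    for predicted, observed, count in reliability_table(probs, labels):
        print(f"predicted={predicted:.2f} observed={observed:.2f} n={count}")

    # Hypothetical property prices, rescaled before training.
    print(standardize([100_000, 200_000, 300_000]))
```

Recalibration acts on the first kind of quantity (the model's outputs), while standardization acts on the second (the model's inputs), which is why one cannot replace the other.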

POSTED BY: Sepehr Elahi
