Observation on Feature Order Sensitivity in Machine Learning Models

Posted 2 months ago

Hello fellow Mathematica users,

I rely heavily on Mathematica's machine learning framework, which automates many aspects of ML by default and gives users an easy-to-use tool.

Lately, I realized that merely changing the order of the features during training can significantly affect the resulting PredictorFunction, as shown in my example. Importantly, this change does not alter the names, types, values, or row order of the features, only their order as columns. I even set PerformanceGoal -> "DirectTraining" to prevent any automatic model search.

Why does the order of the feature columns affect the resulting PredictorFunction when all columns have unique names and their values stay the same? The training set used to create the function does not change, apart from its column-wise ordering.

Here is a short example you can run:

(* generate some data *)
x=RandomReal[100,100];
y=RandomReal[500,100];
z=x+y/2.+RandomVariate[NormalDistribution[]];

(* learning set with x column as first feature *)
assoXAndY=MapThread[Association["x"->#1,"y"->#2]->#3&,{x,y,z}];

(* learning set with y column as first feature *)
assoYAndX=MapThread[Association["y"->#2,"x"->#1]->#3&,{x,y,z}];

(* p learns {x,y} and p2 learns {y,x} *)
p=Predict[assoXAndY, Method->"RandomForest",PerformanceGoal->"DirectTraining",FeatureTypes->{"Numerical","Numerical"}];
p2=Predict[assoYAndX,Method->"RandomForest",PerformanceGoal->"DirectTraining",FeatureTypes->{"Numerical","Numerical"}];

(* differences between the two models' predictions on the same inputs; the models differ only in the column order used for training *)
p[First/@assoXAndY]-p2[First/@assoXAndY]

Zero differences between the two sets of predicted values are not that common.
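
To rule out plain training noise as the cause, one possible control is to retrain on the identical {x,y} ordering and compare the size of that discrepancy with the one between p and p2. This is only a sketch: pControl and inputs are names introduced here, and it assumes p, p2 and assoXAndY from the code above are still defined. If training is reproducible for fixed data and options (which I have not verified for every method), the first number should be zero.

```wolfram
(* control model: same data, same column order, same options as p *)
pControl = Predict[assoXAndY, Method -> "RandomForest",
   PerformanceGoal -> "DirectTraining",
   FeatureTypes -> {"Numerical", "Numerical"}];

(* the feature associations, without the target values *)
inputs = First /@ assoXAndY;

(* largest absolute discrepancy: same order vs. swapped order *)
{Max[Abs[p[inputs] - pControl[inputs]]],
 Max[Abs[p[inputs] - p2[inputs]]]}
```

If the first value is zero and the second is not, the difference comes from the column order rather than from randomness in the training.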

Of course, trying different Method settings gives different behavior on this dataset. For instance, "NearestNeighbors" yields the most zero differences between the two models.

Even passing the features as "key" -> value pairs to both predictors will probably give different values:

(* RandomChoice returns a single number; RandomSample[x,1] would return a one-element list *)
randX=RandomChoice[x];
randY=RandomChoice[y];
{p[<|"x"->randX,"y"->randY|>],p2[<|"x"->randX,"y"->randY|>]}

To sum up, behind the ease of use, it is quite surprising, and hard to notice, that such a small change while building a predictor can affect its outputs. I may explore the impact of feature order on predictors using RandomSample or Permutations, at the cost of simplicity.
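
A minimal sketch of what such an exploration with Permutations might look like, assuming x, y, z and assoXAndY from the code above. The names features, orders, models and preds are introduced here for illustration; the sketch relies on KeyTake returning the keys in the order requested.

```wolfram
(* feature associations from the original training set *)
features = First /@ assoXAndY;

(* every column ordering of the feature keys *)
orders = Permutations[{"x", "y"}];

(* train one predictor per ordering, with the same options as before *)
models = AssociationMap[
   Function[order,
    Predict[
     MapThread[KeyTake[#1, order] -> #2 &, {features, z}],
     Method -> "RandomForest",
     PerformanceGoal -> "DirectTraining",
     FeatureTypes -> {"Numerical", "Numerical"}]],
   orders];

(* predictions of every model on the same inputs *)
preds = Map[#[features] &, models];

(* largest absolute disagreement for each pair of orderings *)
Max[Abs[#1 - #2]] & @@@ Subsets[Values[preds], {2}]
```

With only two features there is a single pair to compare, but the same code extends to more keys, at the cost of training one model per permutation.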

EDIT: Using FeatureExtractor -> "Minimal" seems to suppress the sensitivity to feature order during training.
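
For anyone who wants to reproduce this, here is a sketch of the check, assuming assoXAndY and assoYAndX from the code above; pMin and p2Min are names introduced here. It retrains both orderings with the extra option and looks at the largest remaining difference.

```wolfram
(* same two training sets, with preprocessing reduced to a minimum *)
pMin = Predict[assoXAndY, Method -> "RandomForest",
   PerformanceGoal -> "DirectTraining",
   FeatureTypes -> {"Numerical", "Numerical"},
   FeatureExtractor -> "Minimal"];
p2Min = Predict[assoYAndX, Method -> "RandomForest",
   PerformanceGoal -> "DirectTraining",
   FeatureTypes -> {"Numerical", "Numerical"},
   FeatureExtractor -> "Minimal"];

(* largest absolute difference between the two models on the same inputs *)
Max[Abs[pMin[First /@ assoXAndY] - p2Min[First /@ assoXAndY]]]
```

A result of zero would be consistent with the order sensitivity coming from the automatic feature preprocessing rather than from the random forest itself, though I have not confirmed the mechanism.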

POSTED BY: Clarisse Wagner