Hello fellow Mathematica users,
I place a lot of trust in Mathematica's machine learning framework, which automates many aspects of ML by default and gives the user an easy-to-use tool.
Lately, I realized that merely changing the order of the feature columns during training can significantly affect the resulting predictive function, as shown in the example below. Importantly, this happens without altering the names, types, values, or row order of the features; only the column order changes. I even used PerformanceGoal -> "DirectTraining" to prevent any model search.
Why does the order of the feature columns affect the creation of the PredictorFunction, when all columns have unique names and their values stay the same? The learning dataset used to create the function doesn't change, apart from its column-wise ordering.
Here is some simple code so you can test it yourself:
(* generate some data *)
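SeedRandom[1234]; (* optional: fix the seed so the runs below are reproducible; 1234 is an arbitrary choice *)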
x=RandomReal[100,100];
y=RandomReal[500,100];
z=x+y/2.+RandomVariate[NormalDistribution[],100]; (* per-point Gaussian noise; without the 100 a single noise value would be added to every point *)
(* learning set with x column as first feature *)
assoXAndY=MapThread[Association["x"->#1,"y"->#2]->#3&,{x,y,z}];
(* learning set with y column as first feature *)
assoYAndX=MapThread[Association["y"->#2,"x"->#1]->#3&,{x,y,z}];
(* p learns {x,y} and p2 learns {y,x} *)
p=Predict[assoXAndY, Method->"RandomForest",PerformanceGoal->"DirectTraining",FeatureTypes->{"Numerical","Numerical"}];
p2=Predict[assoYAndX,Method->"RandomForest",PerformanceGoal->"DirectTraining",FeatureTypes->{"Numerical","Numerical"}];
(* compute the differences between predictions for the same inputs; only the column order at training differed *)
p[First/@assoXAndY]-p2[First/@assoXAndY]
Differences of exactly zero between the two sets of predictions are not that common.
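To quantify this, one can count how many of the 100 training points receive exactly the same prediction from both models (the count varies from run to run):

diffs = p[First /@ assoXAndY] - p2[First /@ assoXAndY];
Count[diffs, 0 | 0.]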
Of course, playing with different Method settings yields different behavior on this dataset; for instance, "NearestNeighbors" gives the most zero differences between the two models. A quick way to compare several methods is sketched below.
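A minimal sketch of such a comparison, looping over a few built-in Method names and counting how many of the 100 predictions agree exactly (the counts will vary from run to run):

Table[
 Module[{q1, q2, diffs},
  q1 = Predict[assoXAndY, Method -> m, PerformanceGoal -> "DirectTraining"];
  q2 = Predict[assoYAndX, Method -> m, PerformanceGoal -> "DirectTraining"];
  diffs = q1[First /@ assoXAndY] - q2[First /@ assoXAndY];
  m -> Count[diffs, 0 | 0.]],
 {m, {"RandomForest", "NearestNeighbors", "LinearRegression"}}]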
Even calling both predictors on a single "key" -> value association will probably produce different values:
randX = RandomChoice[x]; (* pick a single random value, not a length-1 list *)
randY = RandomChoice[y];
{p[<|"x" -> randX, "y" -> randY|>], p2[<|"x" -> randX, "y" -> randY|>]}
To sum up: behind the ease of use, it is quite surprising, and easy to miss, that such a small change while building a Predictor can affect its output. I may explore the impact of feature order on Predictors more systematically using RandomSample or Permutations, at the cost of simplicity; a sketch of that idea is below.
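For instance, a minimal sketch that trains one predictor per key ordering (with two keys there are only two orderings; KeyTake is used to reorder the association keys) and counts, for each pair of orderings, how many of the 100 predictions agree exactly:

orderings = Permutations[{"x", "y"}];
preds = Table[
   Predict[
    MapThread[KeyTake[<|"x" -> #1, "y" -> #2|>, ord] -> #3 &, {x, y, z}],
    Method -> "RandomForest", PerformanceGoal -> "DirectTraining"],
   {ord, orderings}];
(* entry {i, j} counts exact agreements between ordering i and ordering j *)
Table[Count[preds[[i]][First /@ assoXAndY] - preds[[j]][First /@ assoXAndY], 0 | 0.],
 {i, Length[orderings]}, {j, Length[orderings]}]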
EDIT: Using FeatureExtractor -> "Minimal" seems to suppress the feature-order sensitivity of the learning.
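One way to check that observation on the same data (identical settings as above, only the FeatureExtractor option added to both calls; per the observation, the differences should now be all zero):

p3 = Predict[assoXAndY, Method -> "RandomForest", PerformanceGoal -> "DirectTraining", FeatureExtractor -> "Minimal"];
p4 = Predict[assoYAndX, Method -> "RandomForest", PerformanceGoal -> "DirectTraining", FeatureExtractor -> "Minimal"];
p3[First /@ assoXAndY] - p4[First /@ assoXAndY]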