Hello everybody!!
I'm not sure whether this should be a question or an idea; I think it depends on whether what I'm looking for already exists or not. Anyway, I have been trying to implement a model which is currently unavailable in the Wolfram Language: Extreme Learning Machines.
They are conceptually simple models that follow these steps (a rough from-scratch sketch is included right after the list):
Standardize the data
Randomly project the data into a high-dimensional space (and apply a non-linear function)
Find the optimal weights that predict the quantity of interest (as LinearRegression does in Predict, or LogisticRegression in Classify).
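Just to fix ideas, here is a minimal from-scratch sketch of those three steps on purely numeric data (elmFit and the default width of 100 are arbitrary illustrative choices, not anything built in):

(* minimal sketch: X is an n x d numeric matrix, y a length-n vector *)
elmFit[X_, y_, h_ : 100] := Module[{means, sds, Xs, w, hidden, beta},
  means = Mean[X]; sds = StandardDeviation[X];
  Xs = Map[(# - means)/sds &, X];                   (* 1. standardize *)
  w = RandomReal[{-1, 1}, {h, Dimensions[X][[2]]}];
  hidden = Tanh[Xs . Transpose[w]];                 (* 2. random nonlinear projection *)
  beta = LeastSquares[hidden, y];                   (* 3. optimal linear read-out *)
  Function[x, Tanh[((x - means)/sds) . Transpose[w]] . beta]]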
In order to implement them in a quick and integrated way (so that I could get a PredictorFunction), I have used the following code:
train = ResourceData["Sample Data: Boston Homes", "TrainingData"];
test = ResourceData["Sample Data: Boston Homes", "TestData"];
numFeats = Length@Keys@train[[1]]  (* number of input features *)
w = RandomReal[{-1, 1}, {100, numFeats}];  (* random projection matrix *)
randomProject[data_] := Tanh[w.data]  (* nonlinear random projection *)
elm = Predict[train, FeatureExtractor -> {Standardize, randomProject},
 Method -> "LinearRegression", PerformanceGoal -> "TrainingSpeed"]
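To quantify what I call good performance below, I am simply looking at the test-set residuals, roughly like this:

(* rough check on the held-out test set; "StandardDeviation" is the RMS residual *)
PredictorMeasurements[elm, test, "StandardDeviation"]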
The code works (and actually performs quite well), but I have a couple of issues:
1) Can I automate the inference of the value numFeats when I pass the extractor to FeatureExtractor?
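The only workaround I can think of is to let the projection read the dimension off the incoming vector and build the matrix lazily, but it feels like a hack (randomProjectAuto and cachedW are made-up names, not built-in functionality):

(* memoize one random matrix per input dimension, built on first use *)
cachedW[d_] := cachedW[d] = RandomReal[{-1, 1}, {100, d}];
randomProjectAuto[data_] := Tanh[cachedW[Length[data]] . data]
elmAuto = Predict[train, FeatureExtractor -> {Standardize, randomProjectAuto},
 Method -> "LinearRegression", PerformanceGoal -> "TrainingSpeed"];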
2) How can I initialize the random matrix w so that I don't have to define it as a global variable?
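One thing I can imagine is closing over a locally generated matrix with With, so that the pure function carries the numeric matrix and no global symbol is left behind, but I am not sure it is the intended pattern (makeProjector is a made-up name):

(* With injects the generated matrix into the returned pure function *)
makeProjector[hidden_, d_] := With[{wLocal = RandomReal[{-1, 1}, {hidden, d}]},
 Tanh[wLocal . #] &]
elmLocal = Predict[train,
 FeatureExtractor -> {Standardize, makeProjector[100, numFeats]},
 Method -> "LinearRegression", PerformanceGoal -> "TrainingSpeed"];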
3) Does Standardize actually compute the mean and variance on the training set and then apply them to the test set? Or does it just standardize each single list of data? If so, how can we explicitly pass an extractor that learns its parameters from the training data (in this case just the mean and variance)? I suspect it is not doing what I think, because performance is better without the Standardize step.
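The kind of explicit extractor I have in mind would compute the parameters from the training inputs once and freeze them, roughly like this (assuming train is a list of input -> output rules whose inputs become numeric vectors; means, sds and standardizeFixed are made-up names):

(* learn mean/sd on the training inputs only, then freeze them *)
trainInputs = Replace[Keys[train], a_Association :> Values[a], {1}];
means = Mean[trainInputs];
sds = StandardDeviation[trainInputs];
standardizeFixed[x_] := (Replace[x, a_Association :> Values[a]] - means)/sds
elmFixed = Predict[train, FeatureExtractor -> {standardizeFixed, randomProject},
 Method -> "LinearRegression", PerformanceGoal -> "TrainingSpeed"];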
4) I know that Classify performs an internal random search in order to find the best set of hyperparameters. My provocative question is: is it possible to include a parameter of the FeatureExtractor in that search space (in this case, for example, the dimension of the final projection space, which has been arbitrarily set to 100)?
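The only thing I can come up with is a manual outer search, completely outside Predict's own search, roughly like the sketch below (the candidate widths, the split ratio and all names are arbitrary choices for illustration):

(* manual search over the projection width on a validation split held out from train *)
{subTrain, validation} =
 TakeDrop[RandomSample[train], Floor[0.8 Length[train]]];
trial[h_] := Module[{wh, pf},
 wh = RandomReal[{-1, 1}, {h, numFeats}];
 pf = Predict[subTrain, FeatureExtractor -> {Standardize, Tanh[wh . #] &},
  Method -> "LinearRegression", PerformanceGoal -> "TrainingSpeed"];
 {h, PredictorMeasurements[pf, validation, "StandardDeviation"]}]
TableForm[trial /@ {25, 50, 100, 200, 400}]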
-- (PS: It has been really exciting to implement all of this in such a succinct and clean way.)