WOLFRAM COMMUNITY

Best way to identify the best combination of variables in Predict?

M.A. Ghorbani, University of Tabriz

Posted 4 years ago

Dear All,

My goal is to find the combination of input variables that gives the highest prediction accuracy. For example, below I evaluated only three combinations of the inputs, but there are many others, such as y=f(x1,x4), y=f(x3,x4,x5), y=f(x3), and so on.

As you know, evaluating all possible combinations by hand is time-consuming (with five inputs there are already $2^5 - 1 = 31$ non-empty combinations). Is there a way to find the best combination of inputs automatically?

Any help would be greatly appreciated.

```mathematica
y = {7.56, 3.79, 2.85, 8.47, 1.37, 5.16, 3.83, 6.58, 6.14, 5.82};
x1 = {1.7, 0.67, 0.5, 7.9, 5.5, 0.81, 6.9, 4.6, 8.2, 8.1};
x2 = {9.02, 0.85, 1.09, 3.37, 8.64, 6.72, 0.62, 7.12, 7.42, 2.03};
x3 = {2.69, 2.04, 3.12, 2.09, 0.89, 7.82, 7.56, 2.24, 7.25, 3.44};
x4 = {6.01, 2.73, 5.35, 7.33, 9.38, 9.94, 1.19, 5.05, 9.39, 8.86};
x5 = {0.84, 2.31, 4.42, 4.18, 8.46, 3.02, 9.09, 6.14, 4.10, 7.15};

(* Scenario 1: y = f(x1) *)
tuples1 = Thread[Rule[Transpose[{x1}], y]];
train = Take[tuples1, 7];   (* first 7 samples for training *)
test = Take[tuples1, -3];   (* last 3 samples for testing *)
cfunc = Predict[train, Method -> "NeuralNetwork", PerformanceGoal -> "Quality"];
predictOnTrained = Map[cfunc, train[[All, 1]]];
predictOnTest = Map[cfunc, test[[All, 1]]];
actualOnTrained = train[[All, 2]];
actualOnTest = test[[All, 2]];
RootMeanSquare[actualOnTest - predictOnTest]
Correlation[actualOnTest, predictOnTest]

(* Scenario 2: y = f(x1, x2) *)
tuples2 = Thread[Rule[Transpose[{x1, x2}], y]];
train = Take[tuples2, 7];
test = Take[tuples2, -3];
cfunc = Predict[train, Method -> "NeuralNetwork", PerformanceGoal -> "Quality"];
predictOnTrained = Map[cfunc, train[[All, 1]]];
predictOnTest = Map[cfunc, test[[All, 1]]];
actualOnTrained = train[[All, 2]];
actualOnTest = test[[All, 2]];
RootMeanSquare[actualOnTest - predictOnTest]
Correlation[actualOnTest, predictOnTest]

(* Scenario 3: y = f(x1, x2, x3) *)
tuples3 = Thread[Rule[Transpose[{x1, x2, x3}], y]];
train = Take[tuples3, 7];
test = Take[tuples3, -3];
cfunc = Predict[train, Method -> "NeuralNetwork", PerformanceGoal -> "Quality"];
predictOnTrained = Map[cfunc, train[[All, 1]]];
predictOnTest = Map[cfunc, test[[All, 1]]];
actualOnTrained = train[[All, 2]];
actualOnTest = test[[All, 2]];
RootMeanSquare[actualOnTest - predictOnTest]
Correlation[actualOnTest, predictOnTest]
```
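The three scenarios differ only in which input lists are passed to `Transpose`, so one way to let the program try every combination itself is to wrap the same train/test evaluation in a function and map it over `Subsets`. This is a minimal sketch assuming the same 7/3 split and `Predict` options as the question; `cols`, `evalSubset` and `results` are hypothetical names introduced here. It still trains one network per subset (31 for five inputs), so it automates the search rather than making it cheaper.

```mathematica
(* Data from the question *)
y = {7.56, 3.79, 2.85, 8.47, 1.37, 5.16, 3.83, 6.58, 6.14, 5.82};
x1 = {1.7, 0.67, 0.5, 7.9, 5.5, 0.81, 6.9, 4.6, 8.2, 8.1};
x2 = {9.02, 0.85, 1.09, 3.37, 8.64, 6.72, 0.62, 7.12, 7.42, 2.03};
x3 = {2.69, 2.04, 3.12, 2.09, 0.89, 7.82, 7.56, 2.24, 7.25, 3.44};
x4 = {6.01, 2.73, 5.35, 7.33, 9.38, 9.94, 1.19, 5.05, 9.39, 8.86};
x5 = {0.84, 2.31, 4.42, 4.18, 8.46, 3.02, 9.09, 6.14, 4.10, 7.15};

(* Hypothetical lookup table from input name to its data column *)
cols = <|"x1" -> x1, "x2" -> x2, "x3" -> x3, "x4" -> x4, "x5" -> x5|>;

(* Hypothetical helper: same 7/3 split and Predict options as the scenarios above *)
evalSubset[s_List] := Module[{tuples, train, test, cfunc, pred, actual},
  tuples = Thread[Rule[Transpose[Lookup[cols, s]], y]];
  train = Take[tuples, 7];
  test = Take[tuples, -3];
  cfunc = Predict[train, Method -> "NeuralNetwork", PerformanceGoal -> "Quality"];
  pred = Map[cfunc, test[[All, 1]]];
  actual = test[[All, 2]];
  <|"Inputs" -> s, "RMSE" -> RootMeanSquare[actual - pred],
    "Correlation" -> Correlation[actual, pred]|>]

(* All 31 non-empty subsets of {x1, ..., x5}, lowest test RMSE first *)
results = SortBy[evalSubset /@ Subsets[Keys[cols], {1, 5}], #RMSE &];
Take[results, 5]
```

With only three test points, this ranking is likely to be noisy; the caveats about comparing many models raised in the replies apply here as well.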

POSTED BY: M.A. Ghorbani

3 Replies

Jim Baldwin

Posted 4 years ago

Selecting the model with the largest $R^2$ (or, equivalently, the smallest root mean square error) is not good practice when you are comparing many models with different numbers of predictors, especially in all-possible-subsets regression. Why? Adding any predictor, even a purely random one, never decreases the in-sample $R^2$ and never increases the in-sample root mean square error of a least-squares fit.
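A quick way to see the point about random predictors: fit y on x1 alone, then again after appending a column of pure noise, and compare the in-sample $R^2$. This sketch uses `LinearModelFit` with its `"RSquared"` and `"AdjustedRSquared"` properties; the noise column, the seed and the names `noise`, `lm1` and `lm2` are illustrative assumptions, not part of the original thread.

```mathematica
(* Data from the question *)
y = {7.56, 3.79, 2.85, 8.47, 1.37, 5.16, 3.83, 6.58, 6.14, 5.82};
x1 = {1.7, 0.67, 0.5, 7.9, 5.5, 0.81, 6.9, 4.6, 8.2, 8.1};

SeedRandom[1234];                          (* fixed seed so the run is repeatable *)
noise = RandomReal[{0, 10}, Length[y]];    (* hypothetical pure-noise predictor *)

lm1 = LinearModelFit[Transpose[{x1, y}], {u}, {u}];              (* y ~ x1 *)
lm2 = LinearModelFit[Transpose[{x1, noise, y}], {u, v}, {u, v}]; (* y ~ x1 + noise *)

(* In-sample R^2: lm2 can never be lower than lm1 *)
{lm1["RSquared"], lm2["RSquared"]}
(* Adjusted R^2 includes a penalty for the extra term *)
{lm1["AdjustedRSquared"], lm2["AdjustedRSquared"]}
```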

POSTED BY: Jim Baldwin

Jim Baldwin

Posted 4 years ago

I know the following doesn't answer your question about `Predict`, but it does address all-possible-subsets regression. Also, the term "best" is not specific enough. For example, you might want to consider choosing the model with the smallest $AIC_c$ value rather than the largest $R^2$ value. And finally, you might want to consider model averaging, where you don't choose a single model but instead take a weighted average of a set of models.

If your question was "How do I efficiently find the best model in an all-possible-subsets linear regression without evaluating every subset?", then there are several approaches (although, with even a moderate number of predictor variables, this can get out of hand easily). One article to look at is "Exact Variable-Subset Selection in Linear Regression for R".

That said, all-possible-subsets regression is generally not recommended statistical practice. You should consider following the advice of Frank Harrell, who is probably the best source on this. His book "Regression Modeling Strategies" and his class notes are excellent.
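To make the all-possible-subsets idea and the $AIC_c$ criterion concrete: with five candidate inputs there are only 31 non-empty subsets, so an exhaustive search with ordinary least squares is cheap. This sketch uses `LinearModelFit` and its `"AICc"` property; a linear model is an assumption here and is not the neural network from the question, and `cols`, `fitSubset` and `ranked` are hypothetical names.

```mathematica
(* Data from the question *)
y = {7.56, 3.79, 2.85, 8.47, 1.37, 5.16, 3.83, 6.58, 6.14, 5.82};
x1 = {1.7, 0.67, 0.5, 7.9, 5.5, 0.81, 6.9, 4.6, 8.2, 8.1};
x2 = {9.02, 0.85, 1.09, 3.37, 8.64, 6.72, 0.62, 7.12, 7.42, 2.03};
x3 = {2.69, 2.04, 3.12, 2.09, 0.89, 7.82, 7.56, 2.24, 7.25, 3.44};
x4 = {6.01, 2.73, 5.35, 7.33, 9.38, 9.94, 1.19, 5.05, 9.39, 8.86};
x5 = {0.84, 2.31, 4.42, 4.18, 8.46, 3.02, 9.09, 6.14, 4.10, 7.15};

(* Hypothetical lookup table from input name to its data column *)
cols = <|"x1" -> x1, "x2" -> x2, "x3" -> x3, "x4" -> x4, "x5" -> x5|>;

(* Hypothetical helper: AICc of a least-squares fit of y on the inputs in s *)
fitSubset[s_List] := Module[{vars = Table[Unique["u"], Length[s]], data},
  data = Transpose[Append[Lookup[cols, s], y]];  (* rows {xa, xb, ..., y} *)
  LinearModelFit[data, vars, vars]["AICc"]]

(* Exhaustive search over the 31 non-empty subsets; lowest AICc first *)
ranked = SortBy[{#, fitSubset[#]} & /@ Subsets[Keys[cols], {1, 5}], Last];
Take[ranked, 5]
```

The linear fit is only a proxy for the model in the question, so the ranking is best read as a shortlist rather than a final answer.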

POSTED BY: Jim Baldwin

M.A. Ghorbani, University of Tabriz

Posted 4 years ago

Dear Jim,

Thank you so much for the useful explanations and for pointing me to these excellent references. I will certainly study them. What I meant was an iterative method that searches for the best combination based on high correlation and low root mean square error: the program itself tries the combinations and reports the best one. This issue is very important in civil and environmental engineering and many other sciences. Again, I appreciate your time.

POSTED BY: M.A. Ghorbani
