Random selection of elements for getting a high correlation

Hi,

How do I randomly select 12 elements of data for the training list to obtain the highest correlation coefficient for the testing list (CCtest)?

data = {{0.`, 0.048}, {0.2`, 0.424}, {0.4`, 0.943}, {0.60, 1.177}, {0.8`, 
    1.475}, {1.`, 1.839}, {1.200, 2.134}, {1.400, 2.395}, {1.6`, 
    2.564}, {1.8`, 2.814}, {2.`, 2.981}, {2.2`, 2.972}, {2.400, 3.133}, {2.6`,
     2.99}, {2.80, 3.190}, {3.`, 3.184}};

train = Take[data, 12]

{{0., 0.048}, {0.2, 0.424}, {0.4, 0.943}, {0.6, 1.177}, {0.8, 1.475}, {1., 
  1.839}, {1.2, 2.134}, {1.4, 2.395}, {1.6, 2.564}, {1.8, 2.814}, {2., 
  2.981}, {2.2, 2.972}}

lm = LinearModelFit[train, x, x];

gg[x_, y_] := 0.308+1.36x

YPtrain = Map[gg[#[[1]], #[[2]]] &, train]

{0.308, 0.58, 0.852, 1.124, 1.396, 1.668, 1.94, 2.212, 2.484, 2.756, 3.028, \
3.3}

Ytrain = train[[All, 2]]

{0.048, 0.424, 0.943, 1.177, 1.475, 1.839, 2.134, 2.395, 2.564, 2.814, 2.981, \
2.972}

CCtrain = Correlation[YPtrain, Ytrain]

0.985044

Evaluating 0.308 + 1.36 x on the testing data:

test = Take[data, -4]

{{2.4, 3.133}, {2.6, 2.99}, {2.8, 3.19}, {3., 3.184}}

YPtest = Map[gg[#[[1]], #[[2]]] &, test]

{3.572, 3.844, 4.116, 4.388}

Ytest = test[[All, 2]]

{3.133, 2.99, 3.19, 3.184}

CCtest = Correlation[YPtest, Ytest]

0.489591
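
A side note on what CCtest measures here: Pearson correlation does not change when one list is shifted and rescaled by a positive factor, so the correlation between the linear predictions and Ytest equals the correlation between the test x values and Ytest whenever the fitted slope is positive. The sketch below checks this on the test list above; `xTest` and `pred` are names introduced only for this illustration.

```wolfram
test = {{2.4, 3.133}, {2.6, 2.99}, {2.8, 3.19}, {3., 3.184}};
xTest = test[[All, 1]];
Ytest = test[[All, 2]];

(* predictions of the fitted line 0.308 + 1.36 x on the test points *)
pred = 0.308 + 1.36 xTest;

(* the two correlations agree up to floating-point rounding *)
{Correlation[pred, Ytest], Correlation[xTest, Ytest]}
```

Both entries should match the 0.489591 reported above. In other words, for a straight-line model with positive slope, CCtest depends on which points end up in the testing list, not on the fitted coefficients.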
POSTED BY: M.A. Ghorbani
7 Replies

Hi M.A. Ghorbani,

You want to do something like this?

SortBy[RandomSample[data, 12], First]
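
To also get the matching testing list, one option is to shuffle the data once and split it, so that the 4 testing points are exactly the ones not used for training. This is only a sketch; the seed value is arbitrary and is there just to make the split reproducible.

```wolfram
data = {{0., 0.048}, {0.2, 0.424}, {0.4, 0.943}, {0.6, 1.177},
   {0.8, 1.475}, {1., 1.839}, {1.2, 2.134}, {1.4, 2.395},
   {1.6, 2.564}, {1.8, 2.814}, {2., 2.981}, {2.2, 2.972},
   {2.4, 3.133}, {2.6, 2.99}, {2.8, 3.19}, {3., 3.184}};

SeedRandom[1234]; (* arbitrary seed, only for reproducibility *)

(* shuffle, take the first 12 points as train and the remaining 4 as test,
   then sort each part by x *)
{train, test} = SortBy[First] /@ TakeDrop[RandomSample[data], 12];
```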
POSTED BY: Claudio Chaib

Yes, it can be done like this:

(In that case, I maximize the "CCtest")

corrN[t1_, t2_, n_] := 
 Module[{train, lm, test, cx, gg, YPtrain, Ytrain, CCtrain, YPtest, 
   Ytest, CCtest, x}, 
  MaximalBy[
   Table[train = SortBy[RandomSample[data, t1], First]; 
    lm = LinearModelFit[train, x, x]; 
    cx = CoefficientList[Normal@lm, x]; 
    gg[x_, y_] := (cx[[1]] + cx[[2]]*x); 
    YPtrain = Map[gg[#[[1]], #[[2]]] &, train]; 
    Ytrain = train[[All, 2]]; CCtrain = Correlation[YPtrain, Ytrain]; 
    (* draw the test list from the points not used for training *)
    test = SortBy[RandomSample[Complement[data, train], t2], First]; 
    YPtest = Map[gg[#[[1]], #[[2]]] &, test]; Ytest = test[[All, 2]]; 
    CCtest = 
     Correlation[YPtest, Ytest]; {Style[CCtest, Red] -> 
      Style["CCtest", Red], 
     Style[CCtrain, Blue] -> Style["CCtrain", Blue], {"YPtrain", 
      YPtrain}, {"Ytrain", Ytrain}, {"YPtest", YPtest}, {"YTest", 
      Ytest}, {Style[ToString[cx[[1]] + cx[[2]]*"x"], Purple]}, 
     ListLinePlot[{YPtrain, Ytrain}, 
      PlotLegends -> {"YPTrain", "Ytrain"}, ImageSize -> Medium], 
     ListLinePlot[{YPtest, Ytest}, PlotLegends -> {"YPtest", "Ytest"},
       ImageSize -> Medium]}, n], First]]

With n=20:

corrN[12, 4, 20]

[Image im2: output of corrN[12, 4, 20]]

And below, a way to maximize "CCtrain" and "CCtest" at the same time (by maximizing their product):

corrAll[t1_, t2_, n_] := 
 Module[{ff, train, lm, test, cx, gg, YPtrain, Ytrain, CCtrain, 
   YPtest, Ytest, CCtest, x, vc}, 
  vc = Table[train = SortBy[RandomSample[data, t1], First]; 
    lm = LinearModelFit[train, x, x]; 
    cx = CoefficientList[Normal@lm, x]; 
    gg[x_, y_] := (cx[[1]] + cx[[2]]*x); 
    YPtrain = Map[gg[#[[1]], #[[2]]] &, train]; 
    Ytrain = train[[All, 2]]; CCtrain = Correlation[YPtrain, Ytrain]; 
    (* draw the test list from the points not used for training *)
    test = SortBy[RandomSample[Complement[data, train], t2], First]; 
    YPtest = Map[gg[#[[1]], #[[2]]] &, test]; Ytest = test[[All, 2]]; 
    CCtest = Correlation[YPtest, Ytest]; 
    ff = {CCtest, 
      CCtrain, {"YPtrain", YPtrain}, {"Ytrain", Ytrain}, {"YPtest", 
       YPtest}, {"YTest", 
       Ytest}, {Style[ToString[cx[[1]] + cx[[2]]*"x"], Purple]}, 
      ListLinePlot[{YPtrain, Ytrain}, 
       PlotLegends -> {"YPTrain", "Ytrain"}, ImageSize -> Medium], 
      ListLinePlot[{YPtest, Ytest}, 
       PlotLegends -> {"YPtest", "Ytest"}, ImageSize -> Medium]}; {ff,
      ff[[1]]*ff[[2]]}, n]; 
  vc[[Position[vc, MaximalBy[vc, Last][[1]]][[1, 1]], 1]]]

With n=100:

corrAll[12, 4, 100]

[Image im3: output of corrAll[12, 4, 100]]
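
Since there are only Binomial[16, 12] = 1820 ways to pick 12 training points out of 16, random sampling could also be replaced by a full enumeration. The following is a minimal sketch of that idea, assuming the 4 points left out of the training list form the testing list; `splits`, `ccTest` and `best` are names made up for this illustration.

```wolfram
data = {{0., 0.048}, {0.2, 0.424}, {0.4, 0.943}, {0.6, 1.177},
   {0.8, 1.475}, {1., 1.839}, {1.2, 2.134}, {1.4, 2.395},
   {1.6, 2.564}, {1.8, 2.814}, {2., 2.981}, {2.2, 2.972},
   {2.4, 3.133}, {2.6, 2.99}, {2.8, 3.19}, {3., 3.184}};

(* every 12-point training list paired with its 4-point complement *)
splits = {#, Complement[data, #]} & /@ Subsets[data, {12}];

(* fit a line on train and correlate its predictions with the test y values *)
ccTest[{train_, test_}] := Module[{x, lm},
   lm = LinearModelFit[train, x, x];
   Correlation[lm /@ test[[All, 1]], test[[All, 2]]]];

(* one {train, test} pair with the highest CCtest *)
best = First[MaximalBy[splits, ccTest]];
```

Here `best[[1]]` is the training list and `best[[2]]` the testing list. Enumeration guarantees the maximum over all splits, whereas n random draws may miss it.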

POSTED BY: Claudio Chaib
Posted 5 years ago

That is a great solution, Chaib. Congratulations!

For example, if we set n=20, can the code identify the best training and testing lists?

POSTED BY: Alex Teymouri
POSTED BY: Claudio Chaib
POSTED BY: M.A. Ghorbani

OK, in that case, filter the results before the maximization so that only runs with CCtest < CCtrain remain (example):

y1 = {{CCtest, CCtrain, "..."}, {CCtest, CCtrain, "..."}, {CCtest, 
    CCtrain, "..."}, {CCtest, CCtrain, "..."}, {CCtest, CCtrain, 
    "..."}};
y2 = Table[
  If[y1[[x, 1]] < y1[[x, 2]], y1[[x]], Nothing], {x, 1, Length@y1}]

MaximalBy[y2, First]
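
The same filter can be written with Select. The numbers below are made up purely for illustration (they are not results from the data in this thread), and `results` and `kept` are hypothetical names.

```wolfram
(* hypothetical {CCtest, CCtrain, label} triples, for illustration only *)
results = {{0.95, 0.98, "run 1"}, {0.99, 0.97, "run 2"}, {0.90, 0.99, "run 3"}};

(* keep only the runs where CCtest is below CCtrain *)
kept = Select[results, #[[1]] < #[[2]] &];

(* among those, pick the run with the highest CCtest *)
MaximalBy[kept, First]
(* {{0.95, 0.98, "run 1"}} *)
```

Run 2 has the highest CCtest overall, but it is discarded by the filter because its CCtest exceeds its CCtrain.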

Yes...you can contact me on LinkedIn (my profile).

POSTED BY: Claudio Chaib

Sorry for the late response. I deeply appreciate your efforts and also Alex's suggestion!

Is it possible to avoid overfitting in this model? We should consider only the training and testing lists with CCtrain > CCtest. Am I right? May I have your email address?

POSTED BY: M.A. Ghorbani