Group Abstract Group Abstract

Message Boards Message Boards

0
|
33 Views
|
0 Replies
|
0 Total Likes
View groups...
Share
Share this post:

Monte Carlo simulation of linear regression for Binormal distributions of various correlations

Posted 1 day ago

Check my logic please,

In engineering applications, it is common to encounter analyses where engineers draw a small sample (e.g., n ≈ 10) from a process, perform a linear regression, and cite the coefficient of determination (R²) as evidence of a meaningful relationship.

To demonstrate the limitations of this approach I developed a Monte Carlo simulation generating binormal data across a range of correlations (ρ), repeatedly sampling (n = 10, 30, 50 & 200), computing the sample R² for each iteration, and aggregating the results to estimate the probability density function (PDF) and cumulative distribution function (CDF) of R² for each ρ value and sample size.

corrValues = {0, 0.25, 0.5, 0.75, 0.8, 0.85, 0.95, 0.99, 
   0.995}; (* Correlation Values to use in the Monte Carlo runs *)
sampleSizes = {10, 30, 50, 
   200};  (* Sample sizes to use in the Monte Carlo runs *)
runs = 10000; (* Number of Monte Carlo runs *)

(* Generate data for each correlation graph *)
datatest1 = BlockRandom[SeedRandom[123];
   Table[
    RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, i1], 
     5000], {i1, corrValues}]];

(* Create grid layout for each correlation value *)
Grid[
 Table[
  Prepend[
   ParallelTable[
    Column[{
      Show[
Histogram[
Table[
With[{sample = RandomVariate[
BinormalDistribution[{0, 0}, {1, 1}, 
Part[corrValues, j]], samp]}, 
          model = LinearModelFit[sample, x, x]; rSquared = Round[
            model["RSquared"] 100]], runs], 101, "PDF", 
        PlotRange -> {{0, 100}, All}, ImageSize -> 250, 
        AspectRatio -> 1, AxesStyle -> LightGray, TicksStyle -> Black,
         Ticks -> {
Range[0, 100, 10], Automatic}, 
        GridLines -> {{Part[corrValues, j]^2 100}, None}, 
        GridLinesStyle -> Directive[Magenta, Dashed], AxesLabel -> {
Style["R²", Black, Bold, 12], 
Style["%", Black, Bold]}, LabelStyle -> Black, 
        ChartLayout -> {"Column", 4}, ColorFunction -> "BrightBands", 
        PlotLabel -> Column[{
Style["Sample Size = " <> ToString[
NumberForm[samp, {2, 0}]], 12, Bold, DarkBlue], 
Style["          PDF", 12, Bold, DarkBlue]}]]],
      Show[
       Histogram[
        Table[
         With[
          {sample = 
            RandomVariate[
             BinormalDistribution[{0, 0}, {1, 1}, corrValues[[j]]], 
             samp]},
          model = LinearModelFit[sample, x, x];
          rSquared = Round[model["RSquared"]*100]
          ],
         runs
         ],
        101,(* Number of bars in the histogram *)
        "CDF",
        Sequence[
        PlotRange -> {{0, 100}, {0, 1.1}}, ImageSize -> 250, 
         AspectRatio -> 1, AxesStyle -> LightGray, 
         TicksStyle -> Black, Ticks -> {
Range[0, 100, 10], 
Range[0, 1, 0.1]}, GridLines -> {
Range[0, 100, 10], {0.5}}, AxesLabel -> {
Style["R²", Black, Bold, 12], None}, LabelStyle -> Black, 
         ChartLayout -> {"Column", 4}, ColorFunction -> "BrightBands",
          PlotLabel -> Style["CDF", 12, Bold, DarkBlue], Epilog -> {
Text[
Style["2.5%", 10, Bold, Blue], {10, 0.06}], 
Directive[StandardBlue, 
Opacity[0.5]], 
Polygon[{{0, 0}, {0, 0.025}, {100, 0.025}, {100, 0}}], 
Text[
Style["97.5%", 10, Bold, Blue], {10, 0.94}], 
Directive[StandardBlue, 
Opacity[0.5]], 
Polygon[{{0, 0.975}, {0, 1}, {100, 1}, {100, 0.975}}], Magenta, 
           Dashed, 
Thickness[0.007], 
Line[{{Part[corrValues, j]^2 100, 0}, {Part[corrValues, j]^2 100, 
              1.05}}], 
Text[
Style["R²= " <> ToString[
PercentForm[Part[corrValues, j]^2, {2, 0}]], 10, Bold, 
             Magenta], {Part[corrValues, j]^2 100, 1.1}]}]
        ]
       ]
      }]
    , {samp, sampleSizes}
    ],

   ListPlot[
    datatest1[[j]],
    Sequence[PlotStyle -> Directive[
PointSize[Small], Red], PlotInteractivity -> False, ImageSize -> 330, 
     AspectRatio -> 1, FrameTicks -> None, Frame -> False, 
     Axes -> None, PlotLabel -> Column[{
Style[ToString[
PercentForm[
Part[corrValues, j], {3, 1}]] <> " Correlation", 14, Bold, DarkBlue], 

Style["       R²= " <> ToString[
PercentForm[Part[corrValues, j]^2, {2, 0}]], 12, DarkBlue]}]]
    ]
   ],
  {j, Length[datatest1]}
  ], Dividers -> {{{True}}, {{Thickness[4]}}}
 ]

enter image description here

enter image description here

enter image description here

enter image description here

Attachments:
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard