Monte Carlo simulation of linear regression for Binormal distributions of various correlations

Posted 1 month ago

Please check my logic.

In engineering applications, it is common to encounter analyses where engineers draw a small sample (e.g., n ≈ 10) from a process, perform a linear regression, and cite the coefficient of determination (R²) as evidence of a meaningful relationship.

To demonstrate the limitations of this approach, I developed a Monte Carlo simulation that generates binormal data across a range of correlations (ρ), repeatedly draws samples of size n = 10, 30, 50, and 200, computes the sample R² for each draw, and aggregates the results to estimate the probability density function (PDF) and cumulative distribution function (CDF) of R² for each combination of ρ and sample size.

```mathematica
corrValues = {0, 0.25, 0.5, 0.75, 0.8, 0.85, 0.95, 0.99, 0.995}; (* correlations for the Monte Carlo runs *)
sampleSizes = {10, 30, 50, 200}; (* sample sizes for the Monte Carlo runs *)
runs = 10000; (* number of Monte Carlo runs per panel *)

(* One large sample per correlation, used only for the scatter plots *)
datatest1 = BlockRandom[
   SeedRandom[123];
   Table[
    RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, rho], 5000],
    {rho, corrValues}]];

(* Sample R² (in percent, rounded) from `runs` regressions on n points each.
   All working variables are local, so nothing leaks into the global context. *)
r2Sample[rho_, n_] := Table[
   Module[{sample, model, x},
    sample = RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, rho], n];
    model = LinearModelFit[sample, x, x];
    Round[100 model["RSquared"]]],
   runs];

(* One grid row per correlation: scatter plot, then PDF/CDF panels per sample size *)
Grid[
 Table[
  Prepend[
   ParallelTable[
    (* Simulate once and reuse the same R² values for the PDF and the CDF *)
    With[{r2 = r2Sample[corrValues[[j]], samp],
      r2True = 100 corrValues[[j]]^2},
     Column[{
       (* PDF of the sample R²; the dashed line marks the population R² *)
       Histogram[r2, 101, "PDF",
        PlotRange -> {{0, 100}, All}, ImageSize -> 250,
        AspectRatio -> 1, AxesStyle -> LightGray, TicksStyle -> Black,
        Ticks -> {Range[0, 100, 10], Automatic},
        GridLines -> {{r2True}, None},
        GridLinesStyle -> Directive[Magenta, Dashed],
        AxesLabel -> {Style["R²", Black, Bold, 12], Style["%", Black, Bold]},
        LabelStyle -> Black, ChartLayout -> {"Column", 4},
        ColorFunction -> "BrightBands",
        PlotLabel -> Column[{
           Style["Sample Size = " <> ToString[NumberForm[samp, {2, 0}]], 12, Bold, DarkBlue],
           Style["          PDF", 12, Bold, DarkBlue]}]],
       (* CDF of the sample R², with the lower 2.5% and upper 2.5% bands shaded *)
       Histogram[r2, 101, "CDF",
        PlotRange -> {{0, 100}, {0, 1.1}}, ImageSize -> 250,
        AspectRatio -> 1, AxesStyle -> LightGray, TicksStyle -> Black,
        Ticks -> {Range[0, 100, 10], Range[0, 1, 0.1]},
        GridLines -> {Range[0, 100, 10], {0.5}},
        AxesLabel -> {Style["R²", Black, Bold, 12], None},
        LabelStyle -> Black, ChartLayout -> {"Column", 4},
        ColorFunction -> "BrightBands",
        PlotLabel -> Style["CDF", 12, Bold, DarkBlue],
        Epilog -> {
          Text[Style["2.5%", 10, Bold, Blue], {10, 0.06}],
          Directive[StandardBlue, Opacity[0.5]],
          Polygon[{{0, 0}, {0, 0.025}, {100, 0.025}, {100, 0}}],
          Text[Style["97.5%", 10, Bold, Blue], {10, 0.94}],
          Polygon[{{0, 0.975}, {0, 1}, {100, 1}, {100, 0.975}}],
          Magenta, Dashed, Thickness[0.007],
          Line[{{r2True, 0}, {r2True, 1.05}}],
          Text[Style["R²= " <> ToString[PercentForm[corrValues[[j]]^2, {2, 0}]], 10, Bold, Magenta], {r2True, 1.1}]}]
       }]],
    {samp, sampleSizes}],
   (* Scatter plot of the 5000-point reference sample for this correlation *)
   ListPlot[datatest1[[j]],
    PlotStyle -> Directive[PointSize[Small], Red],
    PlotInteractivity -> False, ImageSize -> 330, AspectRatio -> 1,
    FrameTicks -> None, Frame -> False, Axes -> None,
    PlotLabel -> Column[{
       Style[ToString[PercentForm[corrValues[[j]], {3, 1}]] <> " Correlation", 14, Bold, DarkBlue],
       Style["       R²= " <> ToString[PercentForm[corrValues[[j]]^2, {2, 0}]], 12, DarkBlue]}]]],
  {j, Length[datatest1]}],
 Dividers -> {{{True}}, {{Thickness[4]}}}]
```
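
The same simulation can also be condensed into a small table, which may be easier to cite in a discussion than the full grid of plots. The following is only a sketch: `r2Once`, `rhos`, `sizes`, `reps`, and `r2Bands` are names made up for this illustration, the subset of correlations and the reduced run count are arbitrary choices, and it relies on the fact that with a single predictor R² equals the squared sample correlation.

```mathematica
(* Hypothetical helper: R² of one simple regression on n binormal points.
   With one predictor, R² equals the squared sample correlation. *)
r2Once[rho_, n_] :=
  With[{s = RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, rho], n]},
   Correlation[s[[All, 1]], s[[All, 2]]]^2];

rhos = {0, 0.5, 0.8, 0.95}; (* illustrative subset of the correlations *)
sizes = {10, 30, 50, 200};  (* same sample sizes as the main simulation *)
reps = 2000;                (* fewer runs than the main simulation, to keep this quick *)

(* For each (rho, n): the 2.5% and 97.5% quantiles of the simulated R² *)
SeedRandom[123];
r2Bands = Table[
   Round[Quantile[Table[r2Once[rho, n], {reps}], {0.025, 0.975}], 0.01],
   {rho, rhos}, {n, sizes}];

TableForm[r2Bands,
 TableHeadings -> {("ρ = " <> ToString[#]) & /@ rhos,
   ("n = " <> ToString[#]) & /@ sizes}]
```

Each cell holds the interval that contains the middle 95% of simulated R² values, so the width of the n = 10 column relative to the n = 200 column summarizes how little a single small-sample R² pins down.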

2 Replies
Posted 23 days ago

When $\rho=0$, the square of the sample correlation coefficient has a Beta distribution with parameters 1/2 and $(n-2)/2$ where $n$ is the sample size. When $\rho \neq 0$, the distribution is much more complicated.
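
As a rough check of the $\rho=0$ case, the simulated $R^2$ values can be overlaid on the Beta density. This is a sketch under stated assumptions: `n = 10` is just an illustrative sample size, `nullDist` and `r2Null` are names invented here, and the squared sample correlation is used in place of a fitted model because the two agree for a regression with one predictor.

```mathematica
n = 10; (* sample size; an illustrative choice *)
nullDist = BetaDistribution[1/2, (n - 2)/2];

(* Simulated R² for rho = 0, computed as the squared sample correlation *)
SeedRandom[123];
r2Null = Table[
   With[{s = RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, 0], n]},
    Correlation[s[[All, 1]], s[[All, 2]]]^2],
   {10000}];

(* Overlay the simulated histogram on the theoretical density.
   The density is unbounded at 0, so the curve starts slightly above it. *)
Show[
 Histogram[r2Null, {0, 1, 0.02}, "PDF"],
 Plot[PDF[nullDist, r], {r, 0.005, 1}, PlotStyle -> Red],
 AxesLabel -> {"R²", "density"}]

(* Theoretical mean and upper 5% point of R² when the true correlation is zero *)
{Mean[nullDist], Quantile[nullDist, 0.95]}
```

For $n=10$ the mean is $1/9$ and the upper 5% point is roughly 0.40, so an $R^2$ near 0.4 turns up about one time in twenty even when the two variables are unrelated.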

While I believe you when you say that (some and hopefully not all) engineers take a random sample from a bivariate distribution and perform a regression, that is not appropriate data for a standard linear regression such as LinearModelFit. What you have is a case of "errors-in-variables" which requires special techniques to appropriately analyze. See https://en.wikipedia.org/wiki/Errors-in-variables_model.
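
To give a feel for the difference, here is a minimal sketch that compares the ordinary least squares slope with an orthogonal (total least squares) slope on the same binormal sample. Orthogonal regression is only one simple errors-in-variables technique, and it assumes equal error variance in both variables; that assumption, the correlation of 0.8, the sample size of 200, and the names `olsSlope` and `tlsSlope` are choices made for this illustration rather than anything stated above.

```mathematica
SeedRandom[123];
data = RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, 0.8], 200];
cov = Covariance[data]; (* 2x2 sample covariance matrix *)

(* Ordinary least squares slope of y on x: treats x as measured without error *)
olsSlope = cov[[1, 2]]/cov[[1, 1]];

(* Orthogonal regression slope: direction of the leading eigenvector of the
   covariance matrix (assumes equal error variance in x and y) *)
{vals, vecs} = Eigensystem[cov];
tlsSlope = vecs[[1, 2]]/vecs[[1, 1]];

{olsSlope, tlsSlope}
```

With unit variances and a correlation of 0.8, the population values are 0.8 for the least squares slope and 1 for the orthogonal slope, so the two methods answer different questions about the same cloud of points.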

Also, it is not true that $R^2$ by itself should be used "as evidence of a meaningful relationship." For a particular application, an $R^2$ value of 0.95 might not be good enough to meet one's objectives. You might consider checking those things out with a statistician or asking at https://stats.stackexchange.com.

POSTED BY: Jim Baldwin

I'm looking at errors-in-variables now. I understand that what I posted is not the "appropriate" way to use linear regression. I'm just trying to generate a chart that shows why. I am an automation engineer in the biomed field, and you would not believe the number of times I have to argue with someone over their linear regressions of n=10. So I need to better develop my argument.

What I was looking for was someone to check my math/logic and tell me what methods are better. You did both. Thank you.
