Message Boards

5 Replies
1 Total Likes
Obtain t-statistics in linear regression directly

Posted 9 years ago

Hi! I have the following linear regression:

data = Table[{3 + i + RandomReal[{-3, 7}], i + RandomReal[{-2, 5}], 
    i + RandomReal[{-3, 10}]}, {i, 1, 20}];
model = LinearModelFit[data, {x1, x2}, {x1, x2}]

I can easily obtain the t-statistics for the regression this way. What I want to know is whether the same can be done using just matrix operations on the "data" matrix. I understand the regression coefficients (estimates) may be obtained directly in matrix form. However, is there such a formula for the t-statistics? The reason I ask is that I am running a large regression (thousands of variables) which iteratively removes one variable at a time, and this is slow with repeated calls to LinearModelFit. Thanks.
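(For reference, the matrix form of the estimates referred to above is the usual ordinary-least-squares expression, with design matrix X and response vector y:)

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
```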

POSTED BY: Priyan Fernando

Thanks Jim!

That's exactly what I needed. Yes, I have both many rows and many columns, so speed is of the essence. I have also noted your comments on stability, which will be important to keep track of when using the brute-force method.


POSTED BY: Priyan Fernando
Posted 9 years ago

I forgot to add a cautionary note: brute-force methods (i.e., applying formulas straight from a textbook) are not always the most numerically stable beasts. So you should do the usual things (e.g., standardize the predictor variables) for numeric stability, and occasionally check the results against LinearModelFit (which I'm certain has a whole lot of quality assurance and numerical stability built in).
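A minimal sketch of that standardization step, assuming a design matrix x whose first column is the constant term (as in the matrix approach elsewhere in this thread):

```mathematica
(* Standardize each predictor column (mean 0, unit variance) with the *)
(* built-in Standardize, leaving the constant column of 1s untouched. *)
cols = Transpose[x];
xStd = Transpose[Prepend[Standardize /@ Rest[cols], First[cols]]];
```

The coefficient estimates then refer to the standardized predictors, but the t-statistics of the non-constant terms are unaffected by this rescaling.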

POSTED BY: Jim Baldwin
Posted 9 years ago

Here is a brute-force matrix method to obtain the t-statistics. I had to rearrange the data generated in your example to get the desired vectors and matrices, and so that the order of the t-statistics matches that of LinearModelFit:

In[213]:= (* Generate data for LinearModelFit *)
data = Table[{3 + i + RandomReal[{-3, 7}], i + RandomReal[{-2, 5}], 
    i + RandomReal[{-3, 10}]}, {i, 1, 20000}];

(* Modify structure of that data for direct matrix approach *)
n = Dimensions[data][[1]];  (* number of observations *)
p = Dimensions[data][[2]];  (* number of parameters *)
x = data;                   (* construct design matrix *)
x[[All, 3]] = 1;
x = x[[All, {3, 1, 2}]];    (* put the constant column first, matching the output order of LinearModelFit *)
y = data[[All, 3]];         (* dependent variable *)

(* Now time the two methods *)
Timing[{model = LinearModelFit[data, {x1, x2}, {x1, x2}];
  model["ParameterTStatistics"]}]

Timing[{xpxInv = Inverse[Transpose[x].x];  (* inverse of x'x *)
  b = xpxInv.Transpose[x].y;               (* parameter estimates *)
  mse = Total[(y - x.b)^2]/(n - p);        (* mean squared error *)
  t = b/(mse Diagonal[xpxInv])^0.5  (* t-statistics *)}]

Out[218]= {0.327602, {{13.9661, 40.6508, 80.0327}}}

Out[219]= {0.031200, {{13.9661, 40.6508, 80.0327}}}

The matrix approach takes about 1/10 the time of LinearModelFit (if all you want are the t-statistics).

If you are repeating this while removing one variable at a time, there are things you can do to speed it up (such as calculating x'x just once for all variables and then removing the desired row and column at each step).
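A minimal sketch of that row-and-column trick, assuming x'x and x'y are computed once and j is the position of the variable to drop:

```mathematica
(* Reuse a precomputed x'x: deleting variable j only requires dropping *)
(* row j and column j of x'x (and entry j of x'y), not recomputing the *)
(* full matrix products from the data. *)
xpx = Transpose[x].x;
xpy = Transpose[x].y;
bWithoutJ = Inverse[Drop[xpx, {j}, {j}]].Drop[xpy, {j}];
```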

And do you really have thousands of variables or thousands of observations? (Or maybe both?)

POSTED BY: Jim Baldwin

Sure, there are different approaches for model selection.

However, my main question is whether we can back out the t-statistics using matrix operations on the "data" variable. That is my main point of interest.


POSTED BY: Priyan Fernando
Posted 9 years ago

Selecting a "best" model is not usually best done by examining t-statistics of variables one at a time. One approach is instead to perform model selection using the AIC statistic, available from a fitted model as model["AIC"].
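A minimal sketch of an AIC comparison along those lines (the candidate subsets here are illustrative; smaller AIC is better):

```mathematica
(* Fit each candidate predictor subset and collect its AIC *)
candidates = {{x1}, {x2}, {x1, x2}};
fits = LinearModelFit[data, #, {x1, x2}] & /@ candidates;
aics = #["AIC"] & /@ fits;
```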


But does your regression have thousands of observations or thousands of variables?

POSTED BY: Jim Baldwin