# Obtain t-statistics in linear regression directly

Posted 8 years ago | 5595 Views | 5 Replies | 1 Total Likes
Hi! I have the following linear regression:

```mathematica
data = Table[{3 + i + RandomReal[{-3, 7}], i + RandomReal[{-2, 5}],
    i + RandomReal[{-3, 10}]}, {i, 1, 20}];
model = LinearModelFit[data, {x1, x2}, {x1, x2}]
model["ParameterTable"]
model["ParameterTStatistics"]
```

I can easily obtain the t-statistics for the regression this way. What I want to know is whether the same can be done using just matrix operations on the "data" matrix. I understand that the regression coefficients (estimates) can be obtained directly in matrix form. Is there a comparable formula for the t-statistics?

The reason I ask is that I am running a large regression (thousands of variables) which iteratively removes one variable at a time. This is slow when using the usual LinearModelFit function. Thanks.
Posted 8 years ago
Thanks Jim! That's exactly what I needed. Yes, I have both many rows and many columns, so speed is of the essence. I have also noted your comments on stability, which will be important to keep track of when using the brute-force method. Cheers!
Posted 8 years ago
 I forgot to add a cautionary note: brute-force methods (i.e., applying formulas from a textbook) are not always the most numerically stable beasts. So you should do the usual things (standardize predictor variables) for numeric stability and occasionally check the results with LinearModelFit (which I'm certain has a whole lot of quality assurance/numerical stability built-in).
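The standardization suggested above can be sketched outside Mathematica as well. The following Python/NumPy snippet (an illustrative translation, not from the thread; the data is synthetic) z-scores each predictor column so that all columns are on a comparable scale before forming x'x:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw predictors on a large, uneven scale
X = rng.uniform(0, 1000, size=(50, 3))

# Z-score each column: subtract the column mean, divide by the column std
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transform, every column has mean 0 and standard deviation 1, which keeps x'x well scaled and its inverse more numerically stable.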
Posted 8 years ago
Here is a brute-force matrix method to obtain the t-statistics. I had to rearrange the data generated in your example to get the desired vectors and matrices, and so that the order of the t-statistics matches that of LinearModelFit:

```mathematica
(* Generate data for LinearModelFit *)
data = Table[{3 + i + RandomReal[{-3, 7}], i + RandomReal[{-2, 5}],
    i + RandomReal[{-3, 10}]}, {i, 1, 20000}];

(* Modify the structure of that data for the direct matrix approach *)
n = Dimensions[data][[1]]; (* Number of observations *)
p = Dimensions[data][[2]]; (* Number of parameters *)
x = data; (* Construct design matrix *)
x[[All, 3]] = 1;
x = x[[All, {3, 1, 2}]]; (* Rearrange predictors to match output of LinearModelFit *)
y = data[[All, 3]]; (* Get dependent variable *)

(* Now time the two methods *)
Timing[{model = LinearModelFit[data, {x1, x2}, {x1, x2}];
  model["ParameterTStatistics"]}]
Timing[{xpxInv = Inverse[Transpose[x].x]; (* x'x inverse *)
  b = xpxInv.Transpose[x].y; (* Estimates of parameters *)
  mse = Total[(y - x.b)^2]/(n - p); (* Mean square error *)
  t = b/(mse Diagonal[xpxInv])^0.5 (* t-statistics *)}]
```

Output:

```
{0.327602, {{13.9661, 40.6508, 80.0327}}}
{0.031200, {{13.9661, 40.6508, 80.0327}}}
```

The matrix approach takes about 1/10 of the time of LinearModelFit (if all you want are the t-statistics).

If you are repeating this while removing one variable at a time, there are things you can do to speed that up (such as calculating x'x just once for all variables and then just removing the desired row and column).

And do you really have thousands of variables or thousands of observations? (Or maybe both?)
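For readers working outside Mathematica, the same brute-force formulas can be sketched in Python/NumPy (an illustrative translation, not from the thread; the data here is freshly generated, so the numbers will differ from the output above). The steps are identical: b = (X'X)⁻¹X'y, MSE = RSS/(n−p), and tⱼ = bⱼ / sqrt(MSE · [(X'X)⁻¹]ⱼⱼ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
i = np.arange(1, n + 1)

# Synthetic data mimicking the thread's example
x1 = 3 + i + rng.uniform(-3, 7, n)
x2 = i + rng.uniform(-2, 5, n)
y = i + rng.uniform(-3, 10, n)

# Design matrix with the intercept column first
X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)           # (X'X)^-1
b = XtX_inv @ X.T @ y                      # coefficient estimates
mse = np.sum((y - X @ b) ** 2) / (n - p)   # mean square error
t = b / np.sqrt(mse * np.diag(XtX_inv))    # t-statistics, one per coefficient
```

In practice one would prefer `np.linalg.lstsq` (or a QR decomposition) over forming `(X'X)^-1` explicitly, for exactly the numerical-stability reasons mentioned above.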
Posted 8 years ago
Sure, there are different approaches to model selection. However, my main question is whether we can back out the t-statistics using some matrix operations on the "data" variable? That is my main point of interest. Thanks!
Posted 8 years ago
Selecting a "best" model by dropping variables one at a time based on their t-statistics is not usually recommended. One alternative is to perform model selection using the AIC statistic, found with model["AIC"]. But does your regression have thousands of observations or thousands of variables?
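As a rough illustration of AIC-based comparison outside Mathematica, here is a Python/NumPy sketch using the Gaussian least-squares form of AIC, n·ln(RSS/n) + 2k (up to an additive constant), where k is the number of fitted coefficients. The helper names and data are hypothetical, not from the thread:

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def aic(X, y):
    """Gaussian AIC for OLS, up to an additive constant: n*log(RSS/n) + 2k."""
    n, k = X.shape
    return n * np.log(ols_rss(X, y) / n) + 2 * k

# Synthetic example: intercept plus two predictors, one of them irrelevant
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

aic_full = aic(X, y)          # intercept + both predictors
aic_reduced = aic(X[:, :2], y)  # intercept + first predictor only
```

The model with the lower AIC is preferred; unlike per-variable t-tests, the criterion trades off fit (RSS) against the number of parameters for the candidate models as wholes.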