WOLFRAM COMMUNITY

17 Replies

GROUPS:

Wolfram Language Modeling Statistics and Probability

Adding weights in linear regression data

Iuval Clejan

Posted 5 years ago

I would like to use LinearModelFit with the Weights->{....} option to account for variances (errors) in the data points. However, it seems this is only possible for the dependent variable, and I also have errors in the independent variables. Is this something I can do in Mathematica version 8? If not, is it available in later versions, and how is it done there?

POSTED BY: Iuval Clejan
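
For reference, the dependent-variable-only weighting described in the question can be sketched as follows. This is a plain-Python illustration rather than Mathematica code, and the function name `weighted_line_fit` is hypothetical. It assumes a straight-line model y = a + b x and weights equal to the inverse variances of y, and it ignores errors in x entirely.

```python
def weighted_line_fit(x, y, sigma_y):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2).

    Assumption: w_i = 1 / sigma_y_i**2 (inverse-variance weights on y only).
    Errors in x are NOT accounted for here.
    """
    w = [1.0 / s ** 2 for s in sigma_y]
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the means
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# The last point is far off the line but has a huge sigma, so it barely counts.
a, b = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 1e6])
```

A point with a very large standard deviation contributes almost nothing to the fit, which is the behaviour one expects from inverse-variance weights.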

Claude Mante

Claude Mante, Retired

Posted 4 years ago

You wrote "I normally use the means of the bins as predictor (independent) variables, but I want to use the variances of the bins too". Maybe it would be interesting for you to use quantile regression to obtain such a predictor? See: https://community.wolfram.com/groups/-/m/t/1395719

POSTED BY: Claude Mante

Iuval Clejan

Posted 4 years ago

I don't think my data has the appropriate structure for this, but thanks

POSTED BY: Iuval Clejan

Claude Mante

Claude Mante, Retired

Posted 4 years ago

Hi! Notice that "TotalVariation" and "Tikhonov" are options for Fit. But I wonder if there is a way to determine the regularization parameter?

POSTED BY: Claude Mante

Jim Baldwin

Posted 4 years ago

You have to supply the regularization parameter. There is no automatic selection process. (But I certainly could be wrong about that.) And the predicted responses depend on how one scales the predictors (subtract the mean or subtract mean and divide by standard deviation, etc.) given the same regularization parameter. It's great that `FitRegularization` has been added but it currently seems both under-documented and lacking necessary regularization parameter selection features that are found in other more complete functions such as those in packages found in R.

POSTED BY: Jim Baldwin
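
The two points in the reply above (the regularization parameter must be supplied, and the result depends on how the predictor is scaled) can be illustrated with a small sketch. This is plain Python, not the behaviour of Fit or `FitRegularization`; the function names are hypothetical, and leave-one-out cross-validation is used here only as one common, generic way to pick the parameter from a user-supplied grid.

```python
def ridge_fit(x, y, lam):
    """Tikhonov (ridge) fit of y = a + b*x, penalising only the slope:
    minimises sum((y_i - a - b*x_i)**2) + lam * b**2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    return ybar - b * xbar, b


def loo_cv_error(x, y, lam):
    """Mean squared leave-one-out prediction error for a given lam."""
    total = 0.0
    for i in range(len(x)):
        a, b = ridge_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        total += (y[i] - (a + b * x[i])) ** 2
    return total / len(x)


def choose_lambda(x, y, grid):
    """Pick the grid value with the smallest leave-one-out error (an assumption:
    the grid itself still has to be chosen by the user)."""
    return min(grid, key=lambda lam: loo_cv_error(x, y, lam))


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]

# Same data, same lam, but the predictor is rescaled by 10:
a1, b1 = ridge_fit(x, y, 1.0)
a2, b2 = ridge_fit([10 * xi for xi in x], y, 1.0)
pred_original = a1 + b1 * 4.0
pred_rescaled = a2 + b2 * 40.0  # same physical point, different units

best = choose_lambda(x, y, [0.0, 0.01, 0.1, 1.0, 10.0])
```

With `lam = 0` the two predictions coincide; with any positive `lam` they differ, which is why the scaling convention has to be fixed before a regularization parameter means anything.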

Iuval Clejan

Posted 4 years ago

I couldn't find these options with an online search here for LinearModelFit

POSTED BY: Iuval Clejan

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

https://reference.wolfram.com/language/ref/ParameterMixtureDistribution.html

POSTED BY: Raspi Rascal

Iuval Clejan

Posted 4 years ago

Hmm. Not sure how this is relevant to a linear regression?

POSTED BY: Iuval Clejan

Jim Baldwin

Posted 4 years ago

Before looking for some code to do this, please study the following: https://en.wikipedia.org/wiki/Errors-in-variables_models.

POSTED BY: Jim Baldwin

Iuval Clejan

Posted 4 years ago

Also, even though I could write code to find out the estimate on the regression coefficient from the formulas given, I also want all the other statistical "goodies" such as p values, confidence intervals, etc. Do I need to write code for all of these myself?

POSTED BY: Iuval Clejan

Jim Baldwin

Posted 4 years ago

Given the situation you describe (sizeable(?) errors in the predictors), what should stand out from that link is: "In the case when some regressors have been measured with errors, estimation based on the standard assumption leads to inconsistent estimates, meaning that the parameter estimates do not tend to the true values even in very large samples." I understand that you don't want to become an expert in Statistics. But because of the above severe warning, you should consult a Statistician rather than grow your own code. Of course, it also depends on the size of the errors in the predictor variables. If very small, then there is no issue. But your description does not include such information.

POSTED BY: Jim Baldwin
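
The inconsistency quoted above can be seen in a small simulation. This is a sketch in plain Python under assumed conditions (Gaussian true predictor, independent Gaussian errors on x and y, a line through the origin); the function names and default values are hypothetical. Under these assumptions the ordinary least-squares slope on the noisy predictor tends to true_slope * sd_x^2 / (sd_x^2 + sd_x_err^2) rather than to true_slope, however large the sample.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_attenuation(n, true_slope=2.0, sd_x=1.0, sd_x_err=1.0,
                         sd_y_err=0.5, seed=0):
    """Return (slope using the exact x, slope using x observed with error)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    # Observed predictor = true predictor + measurement error
    x_obs = [xt + rng.gauss(0.0, sd_x_err) for xt in x_true]
    y = [true_slope * xt + rng.gauss(0.0, sd_y_err) for xt in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


slope_exact_x, slope_noisy_x = simulate_attenuation(20000)
```

With the defaults above, the limiting value for the noisy-predictor slope is 2.0 * 1 / (1 + 1) = 1.0, so collecting more data does not repair the estimate; the size of the predictor error relative to the spread of the predictor is what matters.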

Iuval Clejan

Posted 4 years ago

POSTED BY: Iuval Clejan

Jim Baldwin

Posted 4 years ago

Admittedly "error" is a loaded word. The standard requirement for regression is that you know the predictor variable exactly. Even though your predictors might be measured without error, the fact that your data set has "binned" the predictors essentially constitutes something very akin to "error": the predictor variable is not known exactly. It is not clear how you expect to use the variability within a bin. Is that a new predictor? How specifically is that variability to be used? To know if Mathematica or R or SAS will do what you want it to do, a specific model describing the function of the predictors and how the predictors affect the error structure is required. Then someone could tell you whether a particular piece of software can estimate the coefficients.

POSTED BY: Jim Baldwin

Iuval Clejan

Posted 4 years ago

No, the variance in each bin is not a new predictor, but would combine with the variance in the dependent variables to give a weight in the chisquare sum (in the way which I indicated above, maybe there should be a square root). This seems intuitively true, but not sure how rigorous it is as far as obtaining true parameters from the limit of infinite data sets. Is there a Mathematica function or option to LinearModelFit that can do this?

POSTED BY: Iuval Clejan

Jim Baldwin

Posted 4 years ago

If they combine, you need to specify the recipe. It just doesn't magically happen. Again, what exact model are you trying to fit? That model would be specific about the weights.

POSTED BY: Jim Baldwin

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

Just a comment in defense of Mathematica: the stepping stones in augmenting the power of the statistics functions were V8 and V10. Statistics functionality was heavily revised there, so you're missing out on the V10 augmentation. By now there are even more statistics functions; see the Repository launched in summer 2019. It may or may not be true that R or Maple have even more of those specialized statistics functions, but there is some chance that none has the exact special function that you're looking for. If you do find the exact function there which you are looking for, then please share a link and we will understand exactly what you're looking for. Your request is too vague; I don't understand it completely without an example, sorry. Wolfram Language provides you all the functions and vocab items you need to create your own specialized statistics functions (utility functions), tailored to your needs. So if you need to compose your own function (because none of the competition has it), then you're in the right programming environment. And if statistics is all you do, day in day out, then a specialized comprehensive GUI-based statistics application would be the better choice. Keep us posted on how your quest is going!

POSTED BY: Raspi Rascal

Iuval Clejan

Posted 4 years ago

OK, thanks. I read it cursorily, because I don't have the time to become an expert in statistics, and hope to use Mathematica for this purpose. It looks like what I need is total least squares. Is there a Mathematica function which can do Total Least Squares analysis? Ideally I could just use ANOVA or LinearModelFit and specify all the errors, not just those for the dependent variable. Is there a way to do this? There doesn't seem to be, as far as I can tell. Why not?

POSTED BY: Iuval Clejan
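
For a single predictor, one textbook form of the total-least-squares idea raised above is Deming regression, which reduces to orthogonal regression when the error variances in x and y are equal. The sketch below is plain Python, not a Mathematica function; the name `deming_fit` and the parameter `delta` are hypothetical labels. It assumes a single known ratio delta = (error variance of y) / (error variance of x) shared by all points, so it does not handle per-point variances.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression for y = a + b*x.

    delta: assumed ratio var(error in y) / var(error in x), the same for
    every point. delta = 1 gives orthogonal (total least squares) regression.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x) / n
    syy = sum((yi - ybar) ** 2 for yi in y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = syy - delta * sxx
    b = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    a = ybar - b * xbar
    return a, b


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
a, b = deming_fit(x, y, delta=1.0)
```

With `delta = 1` the fit treats x and y symmetrically: swapping the roles of x and y gives the reciprocal slope, which ordinary least squares does not do.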

Iuval Clejan

Posted 4 years ago

I'm going to try to revive this post. If there are known variances in the dependent variable, their inverse can be used as a weight in computing the chi-square sum in the regression analysis. But what if we also have information about the independent variables' variances? Can we use that information somehow? Naively, I would say (for one independent variable $x_i$ and one dependent variable $y_i$) to find the regression coefficient $r$ that minimizes $\chi^2 = \sum_i \frac{(r x_i - y_i)^2}{r^2 \sigma_{x_i}^2 + \sigma_{y_i}^2}$. Is this correct? And if so, how can the Mathematica stat package implement it? (I suppose I could write my own code to do it, if it is not available in the stat package.)

POSTED BY: Iuval Clejan
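
The objective proposed in the post above can be minimised numerically in a few lines. This is a sketch in plain Python, not a Mathematica stat-package feature; the function names are hypothetical, and the search bracket `[lo, hi]` is an assumption the caller must supply (the objective flattens out for very large |r|, so the bracket should enclose the expected slope). The sketch only evaluates and minimises the sum as written, with no intercept; whether this estimator is statistically appropriate is the question the post itself leaves open.

```python
import math


def chi_square(r, x, y, sigma_x, sigma_y):
    """Sum over i of (r*x_i - y_i)**2 / (r**2 * sigma_x_i**2 + sigma_y_i**2)."""
    return sum((r * xi - yi) ** 2 / (r * r * sxi ** 2 + syi ** 2)
               for xi, yi, sxi, syi in zip(x, y, sigma_x, sigma_y))


def minimise_golden(f, lo, hi, tol=1e-10):
    """Golden-section search; assumes f has a single minimum inside [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - invphi * (hi - lo)
        d = lo + invphi * (hi - lo)
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2.0


def fit_slope(x, y, sigma_x, sigma_y, lo, hi):
    """Slope r minimising the chi-square sum within the bracket [lo, hi]."""
    return minimise_golden(lambda r: chi_square(r, x, y, sigma_x, sigma_y), lo, hi)


x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]
r = fit_slope(x, y, [0.1, 0.2, 0.1], [0.3, 0.1, 0.2], 0.0, 5.0)
```

Setting every entry of `sigma_x` to zero reduces the objective to ordinary weighted least squares through the origin, which gives a quick sanity check on any implementation.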
