Message Boards Message Boards


Write a ResourceFunction to improve LogitModelFit or get Wolfram to fix it?

Posted 1 year ago
14 Replies
4 Total Likes

There's a problem with LogitModelFit as implemented in Mathematica: it sometimes fails to converge. Some internet research suggests Mathematica is not the only software with this issue. I had an intuition that the iterative algorithm apparently employed (Iteratively Reweighted Least Squares a/k/a IRLS) might converge if I permuted the data. And, some playing around, confirmed this intuition. This finding has given rise to the following function, still in an early implementation stage.

LogitModelFitWithJiggling[data_, f_, x_, n_, 
  opts : OptionsPattern[LogitModelFit]] :=
     Module[{lomf = LogitModelFit[RandomSample@data, f, x, opts]}, 
        lomf, _FittedModel], (Print[
         StringTemplate["success on attempt `1`"][i]]; 
        Throw[lomf])]], {i, 1, n}]; (Print[
        "failure to fit logistic model after `1` attempts"][n]]; 

I find this works a lot of the time. So, the question is should I attempt to fix this via a ResourceFunction, should Wolfram fix it internally so that we don't need the ResourceFunction, or (I guess) both? And if it is going to be a ResourceFunction, would anyone like to collaborate so it works well or have suggestions on features and functional forms?

Also, it kind of bothers me that I don't really have any rigor to support the method other than schemes that are iterative sometimes depend on initial conditions. Anyone have an idea why this might work (and when it will fail)?

14 Replies

First, may I humbly ask @Robert Rimmer to accept my apology.

As I read Seth's problem, there are only 3 terms, not 4. The design matrix you constructed did indeed represent 4 terms, but I don't see where the 4th one comes from.

Posted 1 year ago

@Robert Nachbar In your latest response you blamed Robert Rimmer for something that I think you meant for me. (I don't see any response from Robert Rimmer in this thread.)

I don't think you understood my comment when I stated "LogitModelFit thinks there are 4 parameters when only 3 can be fit..." The error message provided says

The rank of the design matrix 3 is less than the number of terms 4 in the model.

The syntax used makes the request to fit 4 parameters.

Also, Seth Chandler makes an incorrect inference with "Moreover, there is no automatic problem using more basis functions in a regression than there are parameters to be estimated. This code, for example, has 4 basis functions and only two parameters, yet it works just fine." That has two problems: There are four parameters (not two). He is mixing up the terms "parameters" and the number of variables in the dataset. But after correcting the terminology in his polynomial example as long as the number of polynomial terms do not exceed the number of data points, that model is not overparameterized. But when using nominal variables and including an intercept term, that is not the case.

I understand the logic and efficiency for the syntax for LogitModelFit, LinearModelFit, etc. to keep folks from trying to specify nonlinear models (as one can do with NonlinearModelFit) when these functions are designed for just linear models. However, such confusion with the number of parameters with nominal variables can arise.

But, again, the most egregious issue is the suggestion that randomly permuting the data will provide a workaround.

While not formally trained as a statistician, I do understand the issues of parameter identifiability and nonlinear optimization. I would also like to learn from @Seth Chandler's example, particularly how to diagnose the cause of the failure.

Using the extra precision approach of @Shenghui Yang, I can get a fitted model object. The design matrix has only three columns, so I'm curious why @Robert Rimmer said there are four parameters. I know that each nominal variable i with ki levels introduces ki - 1 parameters, but that is not the case here.

In[18]:= fit = LogitModelFit[SetPrecision[data, 30], {1, x, a}, {x, a}, 
  NominalVariables -> {a}]

enter image description here

The largest absolute value of the off-diagonal elements of the correlation matrix is 0.626:

In[30]:= fit["CorrelationMatrix"]
Max@Abs@LowerTriangularize[%, -1]

Out[30]= {{1.0000000000000000000000000000, 
  0.6263318050947711872702412943, -0.1063345472486892413459999850}, \
{0.6263318050947711872702412943, 1.0000000000000000000000000000, 
  0.5110815111665113340197262073}, {-0.1063345472486892413459999850, 
  0.5110815111665113340197262073, 1.0000000000000000000000000000}}

Out[31]= 0.6263318050947711872702412943

Moreover, the design matrix is of full rank:

In[32]:= X = fit["DesignMatrix"];

Out[33]= {20, 3}

In[34]:= SingularValueList[X]

Out[34]= {5.01758683340411161954703549800, \
3.51108284450300297772988275222, 1.91399760190032649685817260276}

and not ill-conditioned. All the nominal variable recoding methods with which I'm familiar do not produce an n by 4 design matrix for this problem, as suggested above, but n by 3 (cf. Coding Systems for Categorical Variables in Regression). In fact, that design matrix has an exact colinarity as the last column is a linear combination of the preceding columns!

I do agree that Fit, while quick and dirty, never gives me the information I generally want about the quality of a fit, and I generally use LinearModelFit or NonlinearModelFit.

Permuting the order of the data rows should not change the result for these deterministic statistical methods, and the fact that it does suggests a numerical instability due to machine precision of the numbers. The final message irlsfail is curious because the documentation for LogitModelFit does not mention iterative reweighting. The subject is mentioned in the tutorial Numerical Operations on Data, but now I'm getting out of my depth and would prefer someone form the stats team at WRI to shed some light on the issue.

Regarding the use of 1 as a basis function or the use of the IncludeConstantBasis -> True option, they are effectively synonymous. The list of basis functions takes precedence, however, and using False with 1 in the list of basis functions retains the intercept in the model.

I didn't work in a fresh Kernel and had a residual assignment to x! My bad.

Posted 1 year ago

Your nominal variable is a bit pathological in that just having the values 0 and 1 you get the same fit whether those variables are labeled nominal or not.

In this case the issue was because behind the scenes LogitModelFit was attempting to fit more parameters than what you thought you had.

And because you did "look forward to any constructive assistance" I offer the following: (1) Don't use Fit as it does not get you measures of precision as NonlinearModelFit does, and (2) Lack of convergence for any piece of software (Mathematica, SAS, SPSS, R, etc.) happens relatively frequently but usually because of "user error" (such as bad starting values, model overparameterization, poor model formulation, highly correlated predictor variables, and not enough of the right kind of data) rather than the software messing up. If one is not a statistician and such things happen, please consult with a statistician.

Here is a notebook with further information on the peculiar behavior (as well as what may be a simple workaround).

Maybe to help clarify the issue about the overparameterization of the model you tried. Consider fitting that same model but using the design matrix/response format:

design = Transpose[{ConstantArray[1, Length[unpleasant]], unpleasant[[All, 1]], 
  unpleasant[[All, 2]], 1 - unpleasant[[All, 2]]}];
response = unpleasant[[All, 3]];
LogitModelFit[{design, response}]

Less than full rank message

It would be good of the other formulation would print out the "less than full rank" message.

Same thing as my previous answer. You are attempting to fit an overparameterized model. You've included the intercept and two values for the nominal variable when there are only 2 of those parameters that are estimable. This issue should be covered in a regression class. However, it is true that this might not seem obvious from the documentation for LogitModelFit.

Please don't promote things that make no sense such as randomly permuting the data when the order of the data should not make any difference. If the ordering does make a difference in LogitModelFit, then that should be reported to Wolfram as a bug. Randomly ordering should not be considered a "workaround".

For the unpleasant dataset AND your model: you've attempted to fit an overparameterized model. LogitModelFit thinks there are 4 parameters when only 3 can be fit when use

LogitModelFit[inconvenient, {1, x, a}, {x, a}, NominalVariables -> {a}]

You should be using something like one of the following:

LogitModelFit[inconvenient, {x, a}, {x, a}, IncludeConstantBasis -> False, NominalVariables -> a]
LogitModelFit[inconvenient, {x, a}, {x, a}]

Here's a more complete notebook; I didn't think it was important where the data came from. You will see the same phenomenon, however. The problem originates from a large research project involving "unfairness" by classifiers and different definitions of equality.

Raising precision for the numbers, but not sure if the option should be used automatically


I cannot reproduce your results with M12.1 on my Mac, running Mojave. I always get $Failed.

I wanted to look at the correlation matrix of the parameters.

Here's a notebook exploring the issue further.

Posted 1 year ago

The issue of "lack of convergence" occurs with all iterative procedures and with all software packages (including SAS and R). However, many times it occurs when there is an overparameterized model, highly correlated predictor variables, and/or models (data) that just don't agree with the data (models).

If you could give a specific example or two, some specific advice could be given.

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract