Message Boards Message Boards

Write a ResourceFunction to improve LogitModelFit or get Wolfram to fix it?

There's a problem with LogitModelFit as implemented in Mathematica: it sometimes fails to converge. Some internet research suggests Mathematica is not the only software with this issue. I had an intuition that the iterative algorithm apparently employed (Iteratively Reweighted Least Squares a/k/a IRLS) might converge if I permuted the data. And, some playing around, confirmed this intuition. This finding has given rise to the following function, still in an early implementation stage.

LogitModelFitWithJiggling[data_, f_, x_, n_, 
  opts : OptionsPattern[LogitModelFit]] :=
      Quiet@Catch[(Table[
     Module[{lomf = LogitModelFit[RandomSample@data, f, x, opts]}, 
      If[MatchQ[
        lomf, _FittedModel], (Print[
         StringTemplate["success on attempt `1`"][i]]; 
        Throw[lomf])]], {i, 1, n}]; (Print[
      StringTemplate[
        "failure to fit logistic model after `1` attempts"][n]]; 
     Throw[$Failed]))]

I find this works a lot of the time. So, the question is should I attempt to fix this via a ResourceFunction, should Wolfram fix it internally so that we don't need the ResourceFunction, or (I guess) both? And if it is going to be a ResourceFunction, would anyone like to collaborate so it works well or have suggestions on features and functional forms?

Also, it kind of bothers me that I don't really have any rigor to support the method other than schemes that are iterative sometimes depend on initial conditions. Anyone have an idea why this might work (and when it will fail)?

POSTED BY: Seth Chandler
16 Replies
Posted 3 years ago

One more post: One needs to understand what parameterization is being used with LogitModelFit and that depends on how the Mathematica code is written. (And there does appear to be some Mathematica programming issues for which I'll file something with Wolfram Support.)

Using the unpleasant data consider 3 sets of code that give 3 different parameterizations but produce identical predictions:

unpleasant = {{-0.18765344042046503, 1, 1}, {-0.7868417124501981, 0,  0}, {0.5159877297060876, 1, 0},
{0.5192552864411443, 1, 1}, {-0.344034593110404, 1, 1}, {0.4271541695794353, 1, 1},
{-0.8105323274640571, 0, 1}, {-0.2615840458685881, 1, 1}, {0.17145086850545183, 1, 1},
{-2.212384312432726, 0, 0}, {0.5688136764148906, 1, 1}, {0.32765406179511536, 1, 1},
{1.2853090964313258, 1, 1}, {-0.4141158288462166, 1, 0}, {-0.1998832888320074, 1, 1},
 {0.22233926060893827, 1, 1}, {-1.3950479727543563, 1, 0}, {-0.19610182024695302, 0, 0},
 {0.7957424118817192, 1, 1}, {-0.7477080037535241, 1, 0}, {-0.2398443950865801, 1, 1}, 
{-0.014513991786248733, 1, 1}, {0.7403747849770742, 1, 1}, {0.0009430098138824265, 0, 1}, 
{0.8067655004254857, 1, 1}, {1.5727656192536221, 1, 1}, {1.6949798505254474, 1, 1}, 
{1.1406137792154913, 1, 1}, {-0.5779852039553085, 1, 1}, {-1.2722561308326152, 0, 1}};

model1 = LogitModelFit[unpleasant, {x, a}, {x, a}, IncludeConstantBasis -> False, NominalVariables -> a]["ParameterTable"]
model2 = LogitModelFit[unpleasant, {x, a}, {x, a}]["ParameterTable"]
model3 = LogitModelFit[unpleasant, {a, x}, {x, a}, NominalVariables -> a]["ParameterTable"]

Model parameter estimates

One sees that the coefficients for x are all identical. (And all predictions are identical.) But the parameterization of the model differs. The parameterization simply depends on what is appropriate for display for your objective.

The first model has two separate intercepts: a[0] and a[1]. The second model has an intercept with label 1 and is identical to the value for a[0] in the first model. The estimate for a in the second model is just the deviation from the intercept for when a=1. The third model has the intercept being the effect of a=1 and a[0] is the deviation from the intercept when a=0.

Now what I think is some sort of error in LogitModelFit is that if the variables are reversed in the third model, an error occurs and reversing the order of the variables should not affect the results.

model4 = LogitModelFit[unpleasant, {x, a}, {x, a}, NominalVariables -> a]["ParameterTable"];

Error messages

So it appears that the order that the predictors are entered into the code (which is not the order in the dataset) caused problems. There are other datasets where the order doesn't seem to matter which makes it difficult to pin down the real issue.

In short, one should probably construct your own nominal/dummy variables and use the design matrix and response format for LogitModelFit (and also maybe for LinearModelFit) when nominal variables are needed.

POSTED BY: Jim Baldwin
Posted 5 years ago

@Robert Nachbar In your latest response you blamed Robert Rimmer for something that I think you meant for me. (I don't see any response from Robert Rimmer in this thread.)

I don't think you understood my comment when I stated "LogitModelFit thinks there are 4 parameters when only 3 can be fit..." The error message provided says

The rank of the design matrix 3 is less than the number of terms 4 in the model.

The syntax used makes the request to fit 4 parameters.

Also, Seth Chandler makes an incorrect inference with "Moreover, there is no automatic problem using more basis functions in a regression than there are parameters to be estimated. This code, for example, has 4 basis functions and only two parameters, yet it works just fine." That has two problems: There are four parameters (not two). He is mixing up the terms "parameters" and the number of variables in the dataset. But after correcting the terminology in his polynomial example as long as the number of polynomial terms do not exceed the number of data points, that model is not overparameterized. But when using nominal variables and including an intercept term, that is not the case.

I understand the logic and efficiency for the syntax for LogitModelFit, LinearModelFit, etc. to keep folks from trying to specify nonlinear models (as one can do with NonlinearModelFit) when these functions are designed for just linear models. However, such confusion with the number of parameters with nominal variables can arise.

But, again, the most egregious issue is the suggestion that randomly permuting the data will provide a workaround.

POSTED BY: Jim Baldwin

First, may I humbly ask @Robert Rimmer to accept my apology.

As I read Seth's problem, there are only 3 terms, not 4. The design matrix you constructed did indeed represent 4 terms, but I don't see where the 4th one comes from.

POSTED BY: Robert Nachbar
Posted 3 years ago

It’s been a year since my last response and unfortunately, I need to repeat: the whole problem is that the LogitModelFit code requests the fit of a model that is overparameterized. This is not an opinion in that I think the model is too complex and needs simplification to make it more easily understood. Stated another way: The given code results in a singular design matrix and the blame resides in the request to attempt to fit an overparameterized model and not with Mathematica. (Although Mathematica could certainly put out a more direct error message.)

The model being requested to fit with LogitModelFit[data, {1, x, a}, {x, a}, NominalVariables -> {a}] is

$$\log\left({{p_i}\over{1-p_i}}\right)=\mu+a_i \alpha_1+(1-a_i)\alpha_0 +\beta x_i$$

where $p_i$ is the probability a success, $\mu$ is the intercept (requested by the 1 in {1, x, a}), $\alpha_0$ and $\alpha_1$ are the effects of the nominal variable a (requested by the a in {1, x, a} and NominalVariables -> {a}), and $\beta$ is the coefficient multiplied by $x_i$ (requested by the x in {1, x, a}).

This model is overparameterized and LogitModelFit doesn’t handle that very well.

Why is it overparameterized? Let’s rewrite the equation as

$$\log\left({{p_i}\over{1-p_i}}\right)=\mu+\alpha_0+a_i(\alpha_1-\alpha_0)\beta x_i$$

We see that there are really on 3 parameters $\mu+\alpha_0$, $\alpha_1-\alpha_0$, and $\beta$ that can be estimated. You might think that request LogitModelFit[data, {1, x, a}, {x, a}, NominalVariables -> {a}] should be only fitting 3 parameters but that construction tells LogitModelFit to attempt to fit 4 parameters and it chokes.

Not convinced? Consider the design matrix given the data.

(* Get the columns of the design matrix ready *)
n = Length[data];
ones = ConstantArray[1, n];  (* For the intercept mu *)
covariate = data[[All, 1]]; (* x *)
a = {#[[2]], 1 - #[[2]]} & /@ data;

(* Design matrix *)
m = Transpose[{ones, covariate, a[[All, 1]], a[[All, 2]]}];

MatrixRank[m]
(* 3 *)

The rank is 3 (and not 4). The rank of 3 indicates that only 3 unique parameters can be fit (or put another way "uniquely identified"). So it is not logical to expect nice results from an overparameterized model. As long as only 3 parameters are requested to be estimated everything is fine. And there are several ways to obtain identical model predictions.

So how to get the model that one thinks they should be getting? There are several ways. Here are 4 equivalent ways to obtain the identical predictions.

(* Equivalent models *)
m1 = Transpose[{ones, data[[All, 1]], data[[All, 2]]}];
lm1 = LogitModelFit[{m1, data[[All, 3]]}]

m2 = Transpose[{data[[All, 1]], 1 - data[[All, 2]], data[[All, 2]]}];
lm2 = LogitModelFit[{m2, data[[All, 3]]}]

lm3 = LogitModelFit[data, {x, a}, {x, a}, NominalVariables -> a, 
  IncludeConstantBasis -> False]

lm4 = LogitModelFit[data, {x, a}, {x, a}]

(* Equivalent predictions *)
lm1["PredictedResponse"]
(* {0.971958, 0.779499, 0.297112, 0.514681, 0.95155, 0.951966, 0.527294, 
     0.753359, 0.737949, 0.55053, 0.955198, 0.825052, 0.875761, 0.535885,
     0.187753, 0.998126, 0.98149, 0.696078, 0.997495, 0.911263} *)

lm2["PredictedResponse"]
(* {0.971958, 0.779499, 0.297112, 0.514681, 0.95155, 0.951966, 0.527294, 
     0.753359, 0.737949, 0.55053, 0.955198, 0.825052, 0.875761, 0.535885,
     0.187753, 0.998126, 0.98149, 0.696078, 0.997495, 0.911263} *)

lm3["PredictedResponse"]
(* {0.971958, 0.779499, 0.297112, 0.514681, 0.95155, 0.951966, 0.527294, 
     0.753359, 0.737949, 0.55053, 0.955198, 0.825052, 0.875761, 0.535885,
     0.187753, 0.998126, 0.98149, 0.696078, 0.997495, 0.911263} *)

lm4["PredictedResponse"]
(* {0.971958, 0.779499, 0.297112, 0.514681, 0.95155, 0.951966, 0.527294, 
     0.753359, 0.737949, 0.55053, 0.955198, 0.825052, 0.875761, 0.535885,
     0.187753, 0.998126, 0.98149, 0.696078, 0.997495, 0.911263} *)

Until better error messages are added and the documentation deals with more detail about NominalVariables one should construct their own nominal/dummy variables.

POSTED BY: Jim Baldwin
Posted 5 years ago

Your nominal variable is a bit pathological in that just having the values 0 and 1 you get the same fit whether those variables are labeled nominal or not.

In this case the issue was because behind the scenes LogitModelFit was attempting to fit more parameters than what you thought you had.

And because you did "look forward to any constructive assistance" I offer the following: (1) Don't use Fit as it does not get you measures of precision as NonlinearModelFit does, and (2) Lack of convergence for any piece of software (Mathematica, SAS, SPSS, R, etc.) happens relatively frequently but usually because of "user error" (such as bad starting values, model overparameterization, poor model formulation, highly correlated predictor variables, and not enough of the right kind of data) rather than the software messing up. If one is not a statistician and such things happen, please consult with a statistician.

POSTED BY: Jim Baldwin

While not formally trained as a statistician, I do understand the issues of parameter identifiability and nonlinear optimization. I would also like to learn from @Seth Chandler's example, particularly how to diagnose the cause of the failure.

Using the extra precision approach of @Shenghui Yang, I can get a fitted model object. The design matrix has only three columns, so I'm curious why @Robert Rimmer said there are four parameters. I know that each nominal variable i with ki levels introduces ki - 1 parameters, but that is not the case here.

In[18]:= fit = LogitModelFit[SetPrecision[data, 30], {1, x, a}, {x, a}, 
  NominalVariables -> {a}]

enter image description here

The largest absolute value of the off-diagonal elements of the correlation matrix is 0.626:

In[30]:= fit["CorrelationMatrix"]
Max@Abs@LowerTriangularize[%, -1]

Out[30]= {{1.0000000000000000000000000000, 
  0.6263318050947711872702412943, -0.1063345472486892413459999850}, \
{0.6263318050947711872702412943, 1.0000000000000000000000000000, 
  0.5110815111665113340197262073}, {-0.1063345472486892413459999850, 
  0.5110815111665113340197262073, 1.0000000000000000000000000000}}

Out[31]= 0.6263318050947711872702412943

Moreover, the design matrix is of full rank:

In[32]:= X = fit["DesignMatrix"];
Dimensions@X

Out[33]= {20, 3}

In[34]:= SingularValueList[X]

Out[34]= {5.01758683340411161954703549800, \
3.51108284450300297772988275222, 1.91399760190032649685817260276}

and not ill-conditioned. All the nominal variable recoding methods with which I'm familiar do not produce an n by 4 design matrix for this problem, as suggested above, but n by 3 (cf. Coding Systems for Categorical Variables in Regression). In fact, that design matrix has an exact colinarity as the last column is a linear combination of the preceding columns!

I do agree that Fit, while quick and dirty, never gives me the information I generally want about the quality of a fit, and I generally use LinearModelFit or NonlinearModelFit.

Permuting the order of the data rows should not change the result for these deterministic statistical methods, and the fact that it does suggests a numerical instability due to machine precision of the numbers. The final message irlsfail is curious because the documentation for LogitModelFit does not mention iterative reweighting. The subject is mentioned in the tutorial Numerical Operations on Data, but now I'm getting out of my depth and would prefer someone form the stats team at WRI to shed some light on the issue.

Regarding the use of 1 as a basis function or the use of the IncludeConstantBasis -> True option, they are effectively synonymous. The list of basis functions takes precedence, however, and using False with 1 in the list of basis functions retains the intercept in the model.

POSTED BY: Robert Nachbar

Here is a notebook with further information on the peculiar behavior (as well as what may be a simple workaround).

POSTED BY: Seth Chandler
Posted 5 years ago

Maybe to help clarify the issue about the overparameterization of the model you tried. Consider fitting that same model but using the design matrix/response format:

design = Transpose[{ConstantArray[1, Length[unpleasant]], unpleasant[[All, 1]], 
  unpleasant[[All, 2]], 1 - unpleasant[[All, 2]]}];
response = unpleasant[[All, 3]];
LogitModelFit[{design, response}]

Less than full rank message

It would be good of the other formulation would print out the "less than full rank" message.

POSTED BY: Jim Baldwin
Posted 5 years ago

Please don't promote things that make no sense such as randomly permuting the data when the order of the data should not make any difference. If the ordering does make a difference in LogitModelFit, then that should be reported to Wolfram as a bug. Randomly ordering should not be considered a "workaround".

For the unpleasant dataset AND your model: you've attempted to fit an overparameterized model. LogitModelFit thinks there are 4 parameters when only 3 can be fit when use

LogitModelFit[inconvenient, {1, x, a}, {x, a}, NominalVariables -> {a}]

You should be using something like one of the following:

LogitModelFit[inconvenient, {x, a}, {x, a}, IncludeConstantBasis -> False, NominalVariables -> a]
LogitModelFit[inconvenient, {x, a}, {x, a}]
POSTED BY: Jim Baldwin

Here's a more complete notebook; I didn't think it was important where the data came from. You will see the same phenomenon, however. The problem originates from a large research project involving "unfairness" by classifiers and different definitions of equality.

POSTED BY: Seth Chandler
Posted 5 years ago

Same thing as my previous answer. You are attempting to fit an overparameterized model. You've included the intercept and two values for the nominal variable when there are only 2 of those parameters that are estimable. This issue should be covered in a regression class. However, it is true that this might not seem obvious from the documentation for LogitModelFit.

POSTED BY: Jim Baldwin

Raising precision for the numbers, but not sure if the option should be used automatically

hp

POSTED BY: Shenghui Yang

Here's a notebook exploring the issue further.

POSTED BY: Seth Chandler

I cannot reproduce your results with M12.1 on my Mac, running Mojave. I always get $Failed.

I wanted to look at the correlation matrix of the parameters.

POSTED BY: Robert Nachbar

I didn't work in a fresh Kernel and had a residual assignment to x! My bad.

POSTED BY: Robert Nachbar
Posted 5 years ago

The issue of "lack of convergence" occurs with all iterative procedures and with all software packages (including SAS and R). However, many times it occurs when there is an overparameterized model, highly correlated predictor variables, and/or models (data) that just don't agree with the data (models).

If you could give a specific example or two, some specific advice could be given.

POSTED BY: Jim Baldwin
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract