While I am not formally trained as a statistician, I do understand the issues of parameter identifiability and nonlinear optimization. I would also like to learn from @Seth Chandler's example, particularly how to diagnose the cause of the failure.
Using the extra-precision approach of @Shenghui Yang, I can get a fitted model object. The design matrix has only three columns, so I'm curious why @Robert Rimmer said there are four parameters. I know that each nominal variable with k levels introduces k - 1 parameters, but that does not give four parameters here.
In[18]:= fit = LogitModelFit[SetPrecision[data, 30], {1, x, a}, {x, a},
NominalVariables -> {a}]
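As a quick check that only three parameters are actually being estimated (presumably an intercept, a coefficient for x, and a single dummy coefficient for a), the fitted parameters can be counted directly; the output will of course depend on the data:

fit["ParameterTable"]
Length[fit["BestFitParameters"]]   (* 3 *)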

The largest absolute value of the off-diagonal elements of the correlation matrix is 0.626:
In[30]:= fit["CorrelationMatrix"]
Max@Abs@LowerTriangularize[%, -1]
Out[30]= {{1.0000000000000000000000000000, 0.6263318050947711872702412943, -0.1063345472486892413459999850},
 {0.6263318050947711872702412943, 1.0000000000000000000000000000, 0.5110815111665113340197262073},
 {-0.1063345472486892413459999850, 0.5110815111665113340197262073, 1.0000000000000000000000000000}}
Out[31]= 0.6263318050947711872702412943
Moreover, the design matrix is of full rank:
In[32]:= X = fit["DesignMatrix"];
Dimensions@X
Out[33]= {20, 3}
In[34]:= SingularValueList[X]
Out[34]= {5.01758683340411161954703549800, 3.51108284450300297772988275222, 1.91399760190032649685817260276}
and not ill-conditioned. None of the nominal variable recoding methods with which I'm familiar produces an n by 4 design matrix for this problem, as suggested above, but rather n by 3 (cf. Coding Systems for Categorical Variables in Regression). In fact, such an n by 4 design matrix would have an exact collinearity, because the last column would be a linear combination of the preceding columns!
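For completeness, here is a quick check of both claims; it assumes (my assumption, based on the basis functions {1, x, a}) that the third column of X is the 0/1 indicator that LogitModelFit generated for a:

First[#]/Last[#] &@SingularValueList[X]      (* condition number, roughly 2.6 *)
X4 = Join[X, List /@ (1 - X[[All, 3]]), 2];  (* append the complementary indicator column *)
MatrixRank[X4]                               (* still 3, so the 4-column coding is rank deficient *)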
I do agree that Fit, while quick and dirty, never gives me the information I want about the quality of a fit, so I generally use LinearModelFit or NonlinearModelFit.
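For instance, with some made-up toy data (not the data from this question), the difference is immediate:

toy = Table[{t, 2 t + 1 + RandomReal[{-0.1, 0.1}]}, {t, 0., 5., 0.5}];
Fit[toy, {1, t}, t]            (* returns only the best-fit expression *)
lm = LinearModelFit[toy, t, t];
lm["ParameterTable"]           (* estimates, standard errors, t statistics, p-values *)
lm["RSquared"]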
Permuting the order of the data rows should not change the result for these deterministic statistical methods, and the fact that it does suggests a numerical instability due to the machine precision of the numbers. The final message irlsfail is curious because the documentation for LogitModelFit does not mention iterative reweighting. The subject is mentioned in the tutorial Numerical Operations on Data, but now I'm getting out of my depth and would prefer someone from the stats team at WRI to shed some light on the issue.
Regarding the use of 1 as a basis function versus the IncludeConstantBasis -> True option: they are effectively synonymous. The explicit list of basis functions takes precedence, however, so setting IncludeConstantBasis -> False while keeping 1 in the list of basis functions retains the intercept in the model.
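A quick way to check this on the data above (a sketch; the precision wrapper and option settings mirror the earlier call, and the output will depend on the data):

fitNoConst = LogitModelFit[SetPrecision[data, 30], {1, x, a}, {x, a},
   NominalVariables -> {a}, IncludeConstantBasis -> False];
fitNoConst["BasisFunctions"]   (* the explicit 1 should still appear *)
fitNoConst["ParameterTable"]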