What loss function is Mathematica minimizing with LogitModelFit?

Posted 25 days ago

Is it a K-L divergence or cross entropy or some squared error metric? I hope not the latter, but it concerns me that it treats a logistic model as a special case of a generalized linear model.

POSTED BY: Iuval Clejan
5 Replies
Posted 19 days ago

LogitModelFit and GeneralizedLinearModelFit can give the exact same estimates:

data = {{1, 0}, {2, 0}, {2, 0}, {2, 1}, {2, 0}, {3, 0}, {3, 0}, {3, 0}, {3, 1}, {3, 1}, {3, 1}, 
  {4, 1}, {4, 1}, {5, 0}, {6, 1}, {7, 1}};
LogitModelFit[data, x, x]["ParameterTable"]

[Image: parameter table for LogitModelFit]

GeneralizedLinearModelFit[data, x, x, ExponentialFamily -> "Binomial"]["ParameterTable"]

[Image: parameter table for GeneralizedLinearModelFit]

POSTED BY: Jim Baldwin
Posted 19 days ago

Great. So does it actually minimize a cross entropy, or does it do something else?

POSTED BY: Iuval Clejan
Posted 19 days ago

Maximizing the likelihood = minimizing the K-L divergence = minimizing the cross entropy, with "maximize likelihood" being the preferred term (in most disciplines except maybe yours).
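To spell out why the three coincide, here is a short derivation sketch. It assumes independent Bernoulli responses y_i in {0, 1} and a linear predictor with an intercept and one slope; the symbols beta_0, beta_1 and p_i are my notation, not anything taken from the Mathematica documentation.

```latex
% Model assumption: y_i ~ Bernoulli(p_i), with the logistic mean function
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}

% Log-likelihood of n independent observations
\log L(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]

% Cross entropy between the observed label y_i and the fitted probability p_i
H(y_i, p_i) = -\bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
\quad\Longrightarrow\quad
-\log L(\beta_0, \beta_1) = \sum_{i=1}^{n} H(y_i, p_i)

% K-L divergence differs from the cross entropy only by the entropy of y_i,
% which does not depend on the parameters (and is 0 for y_i in {0, 1})
D_{\mathrm{KL}}(y_i \,\|\, p_i) = H(y_i, p_i) - H(y_i), \qquad H(y_i) = 0
```

So the same (beta_0, beta_1) maximizes the likelihood and minimizes both the summed cross entropy and the summed K-L divergence.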

POSTED BY: Jim Baldwin
Posted 19 days ago

I don't understand why it concerns you that Mathematica "treats a logistic model as a special case of a generalized linear model."

LogitModelFit and GeneralizedLinearModelFit use maximum likelihood to estimate the parameters. R's glm and most generalized linear model software also use maximum likelihood.

But minimizing K-L divergence and minimizing cross entropy are equivalent to maximum likelihood. You must be from a machine learning background rather than a statistics background. Yes, another case where a relatively new discipline thinks it needs to create a whole new language.

POSTED BY: Jim Baldwin
Posted 19 days ago

You're right, empirically! I tested LogitModelFit against a manual minimization of a cross entropy, and they give the same optimal parameters. I was concerned because I didn't know which statistical assumption LogitModelFit makes about the independent trials that generate the dependent variable: Bernoulli or Gaussian. I guess Bernoulli; otherwise maximum likelihood would give a chi-square-type (squared-error) loss function instead of the correct cross entropy.
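For anyone who wants to repeat this kind of check, here is a minimal sketch of one way to do it. It is not the exact code used for the test described above. It reuses the data from the earlier reply; the names p, crossEntropy, squaredError, b0 and b1 are purely illustrative, and the starting point {0, 0} is an arbitrary choice.

```mathematica
data = {{1, 0}, {2, 0}, {2, 0}, {2, 1}, {2, 0}, {3, 0}, {3, 0}, {3, 0}, {3, 1}, {3, 1}, {3, 1},
   {4, 1}, {4, 1}, {5, 0}, {6, 1}, {7, 1}};
xs = data[[All, 1]];  (* predictor values *)
ys = data[[All, 2]];  (* 0/1 responses *)

(* Logistic mean function; it threads over a list of predictor values *)
p[x_, b0_, b1_] := 1/(1 + Exp[-(b0 + b1 x)])

(* Summed Bernoulli cross entropy, i.e. the negative log-likelihood *)
crossEntropy[b0_, b1_] :=
  -Total[ys Log[p[xs, b0, b1]] + (1 - ys) Log[1 - p[xs, b0, b1]]]

(* Squared-error loss, which a constant-variance Gaussian assumption would lead to *)
squaredError[b0_, b1_] := Total[(ys - p[xs, b0, b1])^2]

(* Manual minimizations, starting from b0 = b1 = 0 (an arbitrary choice) *)
ceFit = FindMinimum[crossEntropy[b0, b1], {{b0, 0}, {b1, 0}}]
seFit = FindMinimum[squaredError[b0, b1], {{b0, 0}, {b1, 0}}]

(* Built-in fit for comparison *)
fit = LogitModelFit[data, x, x];
fit["BestFitParameters"]  (* compare with the b0, b1 rules in ceFit *)
-fit["LogLikelihood"]     (* compare with the minimum value in ceFit *)
```

If the Bernoulli assumption holds, the parameters in ceFit should agree with "BestFitParameters" up to numerical tolerance, while seFit will generally land on different values.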

POSTED BY: Iuval Clejan