WOLFRAM COMMUNITY

GROUPS: Data Science, Statistics and Probability

What loss function is Mathematica minimizing with LogitModelFit?

Iuval Clejan

Posted 2 months ago

Is it K-L divergence, cross entropy, or some squared-error metric? I hope not the latter, but it concerns me that it treats a logistic model as a special case of a generalized linear model.

POSTED BY: Iuval Clejan

5 Replies

Jim Baldwin

Posted 2 months ago

`LogitModelFit` and `GeneralizedLinearModelFit` can give exactly the same estimates:

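(* each pair is {x, y}: predictor value x and binary outcome y (0 or 1) *)
data = {{1, 0}, {2, 0}, {2, 0}, {2, 1}, {2, 0}, {3, 0}, {3, 0}, {3, 0}, {3, 1}, {3, 1}, {3, 1}, 
  {4, 1}, {4, 1}, {5, 0}, {6, 1}, {7, 1}};
(* logistic regression of y on x; "ParameterTable" lists the estimates with their standard errors *)
LogitModelFit[data, x, x]["ParameterTable"]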

Parameter table for LogitModelFit

(* the same model fit as a binomial GLM; the default link for the binomial family is the logit *)
GeneralizedLinearModelFit[data, x, x, ExponentialFamily -> "Binomial"]["ParameterTable"]

Parameter table for GeneralizedLinearModelFit
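Both calls fit the same model, which is why the tables agree. As a sketch in my own notation (the symbols $p_i$, $\beta_0$, $\beta_1$ are not names Mathematica uses), with $p_i$ the probability that observation $i$ has $y_i = 1$:

```latex
\operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i,
\qquad y_i \sim \operatorname{Bernoulli}(p_i)
```

The logit is the canonical link for the binomial family, so a binomial GLM with its default link is the logistic regression model.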

POSTED BY: Jim Baldwin

Iuval Clejan

Posted 2 months ago

Great. So does it actually minimize a cross entropy, or does it do something else?

POSTED BY: Iuval Clejan

Jim Baldwin

Posted 2 months ago

Maximizing the likelihood is equivalent to minimizing the K-L divergence, which is equivalent to minimizing the cross entropy, with "maximize likelihood" being the preferred term (in most disciplines except maybe yours).
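A short derivation of why the three criteria pick the same parameters for 0/1 data like the example above. The notation ($p_i$ for the fitted probability, $H$ for cross entropy and entropy) is mine, introduced for illustration:

```latex
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\bigr],
  \qquad p_i = \frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}} \\
-\ell(\beta) &= \sum_{i=1}^{n} H(y_i, p_i),
  \qquad H(q,p) = -q\log p - (1-q)\log(1-p) \\
D_{\mathrm{KL}}(q \,\|\, p) &= H(q,p) - H(q),
  \qquad H(q) = -q\log q - (1-q)\log(1-q)
\end{aligned}
```

The entropy $H(y_i)$ of the observed labels does not depend on $\beta$ (and is $0$ when $y_i$ is $0$ or $1$, using $0\log 0 = 0$), so maximizing $\ell(\beta)$, minimizing the summed cross entropy, and minimizing the summed K-L divergence all give the same $\hat\beta$.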

POSTED BY: Jim Baldwin

Jim Baldwin

Posted 2 months ago

I don't understand why it concerns you that Mathematica "treats a logistic model as a special case of a generalized linear model." `LogitModelFit` and `GeneralizedLinearModelFit` both use maximum likelihood to estimate the parameters, as do R's `glm` and most other generalized linear model software. And minimizing K-L divergence and minimizing cross entropy are equivalent to maximizing the likelihood. You must be from a machine learning background rather than a statistics background. Yes, another case where a relatively new discipline thinks it needs to create a whole new language.

POSTED BY: Jim Baldwin

Iuval Clejan

Posted 2 months ago

You're right, empirically! I tested `LogitModelFit` against a manual minimization of a cross entropy, and they give the same optimal parameters. I was concerned because I didn't know which distribution `LogitModelFit` assumes for the independent trials that generate the dependent variable: Bernoulli or Gaussian. I guess Bernoulli; otherwise maximum likelihood would give a chi-square (squared-error) type loss function instead of the correct cross entropy.
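The two assumptions lead to different losses under maximum likelihood. As a sketch, with $p_i$ the logistic probability for observation $i$ and, in the Gaussian case only, a constant variance $\sigma^2$ assumed for illustration:

```latex
\begin{aligned}
\text{Bernoulli:}\quad -\log L(\beta) &= -\sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\bigr] \\
\text{Gaussian, } y_i \sim \mathcal{N}(p_i, \sigma^2)\text{:}\quad -\log L(\beta) &= \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - p_i)^2 + \frac{n}{2}\log(2\pi\sigma^2)
\end{aligned}
```

With $\sigma$ fixed, the Gaussian version is a scaled sum of squared residuals (the chi-square form), while the Bernoulli version is the summed cross entropy; the `ExponentialFamily -> "Binomial"` fit shown earlier in the thread corresponds to the Bernoulli case.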

POSTED BY: Iuval Clejan
