Here is a bit more of the puzzle:
In[278]:=
trainingset = {1 -> "A", 2 -> "A", 3.5 -> "B", 4 -> "A", 5 -> "B", 6 -> "B"};
In[279]:= Table[Classify[trainingset, 3.9, "Probabilities"], {15}]
Out[279]= {<|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.4, "B" -> 0.6|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>}
So the question is: why are the classification probabilities returned by Classify non-deterministic?
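One check that occurs to me (the property name is my assumption; newer versions may use Information rather than ClassifierInformation): ask the trained classifier which method the automatic selection actually picked, since the selection step itself might be part of the randomness.

c = Classify[trainingset];
ClassifierInformation[c, "Method"] (* reports the chosen method, e.g. "LogisticRegression" *)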
Now, I am posting this "blind" in the sense that I do not know enough about how the classification works, and it may be that the algorithm is intrinsically stochastic. I am showing my ignorance here; the net result is that I need to read a book on classification and machine learning... ;-)
It may be that a random initialization in a gradient-descent approach to the optimization is finding different local minima on different runs.
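If that guess is right, pinning down the random state should make the result reproducible. A sketch under that assumption (I am assuming Classify draws from the global random generator; newer versions may expose a RandomSeeding option instead):

(* seed before each call; if every iteration now returns identical
   probabilities, random initialization is the likely culprit *)
Table[SeedRandom[1234]; Classify[trainingset, 3.9, "Probabilities"], {5}]

(* or sidestep the automatic method selection entirely with a method
   that has no obvious random component *)
Classify[trainingset, Method -> "NearestNeighbors"][3.9, "Probabilities"]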