WOLFRAM COMMUNITY

What is the naive Bayes Classifier doing?

Christoph S.

Posted 2 years ago

I can't reconcile the naive Bayes classification probability that Mathematica (12.3.1.0) returns for simple {Boolean} -> {Boolean} datasets. I expected Mathematica to use a Bernoulli model for this setup, but that's clearly not the case. Example:

Classify[{True, True, False, True, True, True} -> {False, True, False, True, False, False}, Method -> "NaiveBayes"][{True}, "Probability" -> True]

gives p = 0.63... (for True -> True), but even my dog sees that it should be < 0.5. What's going on here?
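For reference, the "< 0.5" expectation can be checked by hand. The sketch below assumes an unsmoothed Bernoulli naive Bayes model with maximum-likelihood estimates; the helper names (prior, likelihood, posterior) are made up for illustration, and this is not a claim about what Classify does internally.

```wolfram
(* Hand check under an assumed unsmoothed Bernoulli model; not Classify's internals *)
inputs = {True, True, False, True, True, True};
outputs = {False, True, False, True, False, False};
pairs = Transpose[{inputs, outputs}];

(* Class prior P(y = c), estimated as a relative frequency *)
prior[c_] := Count[outputs, c]/Length[outputs];

(* Likelihood P(x | y = c), again a relative frequency with no smoothing *)
likelihood[x_, c_] := Count[pairs, {x, c}]/Count[outputs, c];

(* Posterior P(y = c | x) by Bayes' rule over the two classes *)
posterior[x_, c_] := prior[c] likelihood[x, c]/
   (prior[True] likelihood[x, True] + prior[False] likelihood[x, False]);

posterior[True, True]
(* 2/5 *)
```

With a single feature this reduces to the empirical conditional frequency: of the five examples whose input is True, two have output True.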

Giulio Alessandrini, Wolfram Research Inc.

Posted 2 years ago

I believe this is due to an overzealous standardization step in the automated processing pipeline (Boolean vectors are converted to numerical vectors for processing). You can disable it by setting FeatureExtractor -> "Minimal":

data = {True, True, False, True, True, True} -> {False, True, False, True, False, False};
cf1 = Classify[data, Method -> "NaiveBayes"];
cf2 = Classify[data, Method -> "NaiveBayes", FeatureExtractor -> "Minimal"];

cf1[{True, False}, "Probabilities"]
(* {<|False -> 0.30733, True -> 0.69267|>, <|False -> 0.963594, True -> 0.0364056|>} *)

cf2[{True, False}, "Probabilities"]
(* {<|False -> 0.579921, True -> 0.420079|>, <|False -> 0.579921, True -> 0.420079|>} *)
