Here is a bit more of the puzzle:
In[278]:=
trainingset = {1 -> "A", 2 -> "A", 3.5 -> "B", 4 -> "A", 5 -> "B", 6 -> "B"};
In[279]:= Table[Classify[trainingset, 3.9, "Probabilities"], {15}]
Out[279]= {<|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.4, "B" -> 0.6|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.833333, "B" -> 0.166667|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>,
 <|"A" -> 0.423239, "B" -> 0.576761|>}
So the question is: why are the classification probabilities returned by Classify non-deterministic?
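One check that occurs to me (the property name is my assumption; newer versions may use Information rather than ClassifierInformation): ask the trained classifier which method the automatic selection actually picked, since the selection step itself might be part of the randomness.

c = Classify[trainingset];
ClassifierInformation[c, "Method"] (* reports the chosen method, e.g. "LogisticRegression" *)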
Now, I am posting this "blind" in the sense that I do not know enough about how the classification works, and it may be that the algorithm is intrinsically stochastic. I am showing my ignorance here; the net result is that I need to read a book on classification and machine learning... ;-)
It may be that a random initialization in a gradient-descent approach to the optimization is finding different local minima on different runs.
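If that guess is right, pinning down the random state should make the result reproducible. A sketch under that assumption (I am assuming Classify draws from the global random generator; newer versions may expose a RandomSeeding option instead):

(* seed before each call; if every iteration now returns identical
   probabilities, random initialization is the likely culprit *)
Table[SeedRandom[1234]; Classify[trainingset, 3.9, "Probabilities"], {5}]

(* or sidestep the automatic method selection entirely with a method
   that has no obvious random component *)
Classify[trainingset, Method -> "NearestNeighbors"][3.9, "Probabilities"]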