# Finding the Classifier Boundary Function For Method->LogisticRegression

Posted 11 months ago
A diagram we often see in machine learning shows two classes of points separated by a decision boundary. I want to construct something like it using the Wolfram Language and Classify, at least when the method is logistic regression. Here's how I go about it.

Let's generate some simple data.

    SeedRandom[121217];
    trainx = RandomReal[{0, 1}, {20, 2}];
    trainy = Map[
       If[LogisticSigmoid[{-2.7, 1.4}.# +
            RandomVariate[NormalDistribution[0.3, 0.5]]] > 0.5, 1, 0] &,
       trainx];

If our classification task does not require much complexity, we can use LogitModelFit and some algebra to find the boundary. The boundary is the set of points where the fitted probability equals 1/2.

    (* Assumption: the post does not show how lomf was created. This line is a
       plausible reconstruction: a logit fit of the label on both coordinates. *)
    lomf = LogitModelFit[MapThread[Append, {trainx, trainy}], {x, y}, {x, y}];

    (* Solve fm[x, y] == 1/2 for y to get the boundary as a function of x *)
    boundaryExpression[fm_FittedModel] :=
      y /. Quiet@First@Solve[fm[x, y] == 1/2, y] // Expand;
    boundaryExpression[lomf]

We find it is y = -3.69718 + 10.2671 x.

We can now visualize the boundary and get a picture like that diagram with the following code.

    With[{be = boundaryExpression[lomf]},
     Show[
      ListPlot[
       KeySort[GroupBy[MapThread[List, {trainx, trainy}],
         (ToString[Last[#]] &) -> First]],
       PlotMarkers -> {{"-", 18}, {"+", 18}}, Axes -> False, Frame -> True],
      Plot[be, {x, 0, 1}, PlotStyle -> {Thick, Dashed, Black}]]]

But all of that is using LogitModelFit. Sometimes we need regularization or other features of Classify. So, here's how we generate a similar picture using Classify.

    cl = Classify[trainx -> trainy,
       Method -> {"LogisticRegression", "L1Regularization" -> 0,
         "L2Regularization" -> 0},
       TrainingProgressReporting -> None];

We get the probabilities for each class.

    ci = ClassifierInformation[cl, "ProbabilitiesFunction"]

This generates a function that produces an Association:

    Association[{
      0 -> (0.0307046 E^(8.58421 #1))/(0.0307046 E^(0. + 8.58421 #1) +
           0.675585 E^(0.836087 #2)),
      1 -> 1./(1. + 0.0454489 E^(8.58421 #1 - 0.836087 #2))}] &

We can again use a little algebra to find the boundary: it is where the two class probabilities are equal.

    (* Set the two class probabilities equal and solve for y *)
    boundaryExpression[a_Function] :=
      y /. Quiet@First@
         Simplify[Solve[Equal @@ a[x, y], y, Reals], x \[Element] Reals];
    boundaryExpression[cl_ClassifierFunction] :=
      boundaryExpression[ClassifierInformation[cl, "ProbabilitiesFunction"]]

We can now use this function to make a plot quite similar to the earlier one.

    With[{be = boundaryExpression[cl]},
     Show[
      ListPlot[
       KeySort[GroupBy[MapThread[List, {trainx, trainy}],
         (ToString[Last[#]] &) -> First]],
       PlotMarkers -> {{"-", 18}, {"+", 18}}, Axes -> False, Frame -> True],
      Plot[be, {x, 0, 1}, PlotRange -> {{0, 1}, {0, 1}},
       PlotStyle -> {Dashed, Black}]]]

Here's another way that does not require any algebra. Instead we rely on RegionPlot and Ordering: Ordering[ci[x, y], -1][[1]] gives the position of the most probable class at each point, and RegionPlot draws the edge of the region where that position is 1.

    Show[
     ListPlot[
      KeySort[GroupBy[MapThread[List, {trainx, trainy}],
        (ToString[Last[#]] &) -> First]],
      PlotMarkers -> {{"-", 18}, {"+", 18}}, Axes -> False, Frame -> True],
     RegionPlot[Ordering[ci[x, y], -1][[1]] == 1, {x, -1, 2}, {y, -1, 2},
      PlotRange -> {{0, 1}, {0, 1}}, BoundaryStyle -> {Dashed, Black},
      PlotStyle -> {Opacity[0], White}]]

By using RegionPlot we can readily extend the production of boundary diagrams to situations involving more than two classes. Here trainy2 holds three-class labels for the same points; its construction is not shown in this post.

    cl3 = Classify[trainx -> trainy2,
       Method -> {"LogisticRegression", "L1Regularization" -> 0,
         "L2Regularization" -> 0},
       TrainingProgressReporting -> None];
    ci3 = ClassifierInformation[cl3, "ProbabilitiesFunction"];
    c3plot = Show[
      ListPlot[
       KeySort[GroupBy[MapThread[List, {trainx, trainy2}],
         (ToString[Last[#]] &) -> First]],
       PlotMarkers -> {{"-", 18}, {"+", 18}, {"2", 18}}, Axes -> False,
       Frame -> True],
      RegionPlot[{Ordering[ci3[x, y], -1][[1]] == 1,
        Ordering[ci3[x, y], -1][[1]] == 2,
        Ordering[ci3[x, y], -1][[1]] == 3}, {x, -1, 2}, {y, -1, 2},
       PlotRange -> {{0, 1}, {0, 1}}, BoundaryStyle -> {{Dotted, Black}},
       PlotStyle -> {Opacity[0.05]}]]

I attach a notebook that recapitulates this post and adds an animation.
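
As a cross-check on the "little algebra" used for the Classify case, the boundary can also be worked out by hand from the probabilities function printed in this post. This is only a sketch: the symbols A, B, a and b are labels I am introducing here for the printed constants (A = 0.0307046, B = 0.675585, a = 8.58421, b = 0.836087), and the final numbers are rounded.

```latex
% A, B, a, b are illustrative labels for the constants in the printed Association;
% note that A/B = 0.0454489, the constant appearing in the class-1 expression.
\begin{aligned}
p_0(x,y) &= \frac{A e^{a x}}{A e^{a x} + B e^{b y}}, \qquad
p_1(x,y) = \frac{1}{1 + \tfrac{A}{B}\, e^{a x - b y}}
         = \frac{B e^{b y}}{A e^{a x} + B e^{b y}}, \\
p_0 = p_1 &\iff A e^{a x} = B e^{b y}
          \iff \ln A + a x = \ln B + b y, \\
y &= \frac{a}{b}\, x + \frac{1}{b} \ln\frac{A}{B}
   \approx 10.267\, x - 3.697 .
\end{aligned}
```

Up to rounding, this is the same line as the y = -3.69718 + 10.2671 x found with LogitModelFit, which is what we would expect when both regularization terms in Classify are set to 0.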