Easy way to visualize decision tree classifier?

I know this question has been asked before. I'm asking again now because I know there have been a lot of improvements in the machine learning features in Mathematica.

Is there an easy way to view a decision tree classifier?

Python has libraries such as Graphviz to view the tree. These are extremely helpful because, ultimately, that is the whole point of a decision tree: to make the interpretation of the data simple.

Despite searching several forums, I can't seem to find something simple that would allow us to map the tree's branches and nodes.

POSTED BY: Gaurav Khanna
3 Replies

There is a private function that turns DecisionTree into Tree:

data = Table[x -> Sin[x] + RandomVariate[NormalDistribution[0, .2]], {x, RandomReal[{-10, 10}, 400]}];
p = Predict[data, Method -> "DecisionTree"];
MachineLearning`file23DecisionTree`PackagePrivate`toTree@p[[1]]["Model"]["Tree"]

Here is the one that does the reverse:

fromTree[tree_Tree, nominalDimension_Integer : Automatic] := 
 Module[{leafPositions, nominalPositions, numericalPositions, 
   nodePositions, numericalFeatureIndices, nominalFeatureIndices, 
   nominalSplits, numericalThresholds, numericalOrdering, 
   nominallOrdering, ordering}, 
  leafPositions = TreePosition[tree, _, {-1}];
  nominalPositions = TreePosition[tree, _Equal, {0, -2}];
  numericalPositions = TreePosition[tree, _GreaterEqual, {0, -2}];
  nodePositions = Join[nominalPositions, numericalPositions];
  nominalFeatureIndices = 
   TreeExtract[tree, nominalPositions, 
    TreeData/*Replace[Indexed[_, i_] == _ :> i]];
  numericalFeatureIndices = 
   TreeExtract[tree, numericalPositions, 
    TreeData/*Replace[_ >= Indexed[_, i_] :> i]];
  nominalSplits = 
   TreeExtract[tree, nominalPositions, TreeData/*Last/*(2^# &)];
  numericalThresholds = 
   TreeExtract[tree, numericalPositions, TreeData/*First];
  nominallOrdering = 
   Ordering@
    Thread[{nominalFeatureIndices, nominalSplits, nominalPositions}];
  numericalOrdering = 
   Ordering@
    Thread[{numericalFeatureIndices, numericalThresholds,(*-Reverse/@*)
      numericalPositions}];
  ordering = 
   Join[nominallOrdering, 
    Length[nominallOrdering] + numericalOrdering];
  MachineLearning`DecisionTree[<|
    "FeatureIndices" -> 
     NumericArray[
      Replace[Join[nominalFeatureIndices[[nominallOrdering]], 
        numericalFeatureIndices[[numericalOrdering]]], {} -> {-1}], 
      "Integer16"], 
    "NumericalThresholds" -> numericalThresholds[[numericalOrdering]],
     "NominalSplits" -> nominalSplits[[nominallOrdering]], 
    "Children" -> 
     NumericArray[
      Replace[{} -> {{-1}}]@
       With[{positions = nodePositions[[ordering]]}, 
        First@FirstPosition[
              positions, #, -FirstPosition[leafPositions, #], 
              1] & /@ {Append[#, 1], Append[#, 2]} & /@ positions], 
      "Integer16"], 
    "LeafValues" -> 
     NumericArray[TreeExtract[tree, leafPositions, TreeData]], 
    "RootIndex" -> 
     First@FirstPosition[nodePositions[[ordering]], {}, {1}, 1], 
    "NominalDimension" -> 
     Replace[nominalDimension, 
      Automatic -> 
       Replace[Max[Max[nominalFeatureIndices], 
         Min[numericalFeatureIndices] - 1], Infinity -> 0]]|>]]
POSTED BY: Nikolay Murzin

The easiest way right now is by using

Information[p, "DecisionTree"] // Head
(* Tree *)

However, keep in mind that this tree is not acting on the input data but on the processed data. You can force the processing pipeline to be minimal using

Predict[data, Method -> "DecisionTree", FeatureExtractor -> "Minimal"]

which in this case removes the standardization. With non-numerical data, however, you will still get a tree that operates on a processed representation rather than on the raw input.
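As a rough sketch of how these two steps fit together, one could train with the minimal pipeline, extract the tree, and then query its structure. This assumes a Wolfram Language version in which the "DecisionTree" property returns a Tree, as shown above, and in which the tree functions TreeDepth, TreeSize, and TreeLeaves are available. The synthetic data (borrowed from the earlier reply) and the names data, p, and tree are only illustrative.

```wolfram
(* Illustrative synthetic data: a noisy sine curve, as in the earlier reply *)
SeedRandom[1];
data = Table[
   x -> Sin[x] + RandomVariate[NormalDistribution[0, .2]],
   {x, RandomReal[{-10, 10}, 400]}];

(* Train with the minimal feature pipeline; per the note above,
   this removes the standardization step for this numerical data *)
p = Predict[data, Method -> "DecisionTree", FeatureExtractor -> "Minimal"];

(* Extract the tree; assumes this property exists in your version *)
tree = Information[p, "DecisionTree"];

(* Basic structural summaries of the extracted tree *)
TreeDepth[tree]           (* depth of the tree *)
TreeSize[tree]            (* total number of subtrees *)
Length[TreeLeaves[tree]]  (* number of leaves *)

(* Evaluating the Tree object in a notebook displays it graphically *)
tree
```

Because the result is an ordinary Tree, the general tree functions apply to it, which may be a starting point for adding labels or per-node annotations by hand.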

Nikolay -- thanks so much for your response. The private function you mentioned:

MachineLearning`file23DecisionTree`PackagePrivate`toTree@p[[1]]["Model"]["Tree"]

did indeed produce a visual of a tree. Unfortunately, the information in it is far too sparse to be of much help. I find that Python's scikit-learn and the Graphviz package at least tell you what is going on at each of the splits in terms of the Gini index, the number of samples, etc.

I'm gonna give this feedback to the Mathematica team. I find that not providing information about how the tree is split at each individual node makes it very difficult to work with Mathematica for this particular method.

POSTED BY: Gaurav Khanna