Deciphering a DecisionTree Classifier

Suppose I build a decision tree classifier called c. I can do so either by calling Classify directly with Method -> "DecisionTree", or by calling ActiveClassification with an option like opt below and then passing "ClassifierFunction" as an argument to the resulting ActiveClassificationObject.

 opt=Association["EvaluationStrategy" -> "MaxEntropy", 
  "ClassificationMethod" -> {"DecisionTree", 
    "DistributionSmoothing" -> 3, "FeatureFraction" -> 0.2}]
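For concreteness, here is a minimal sketch of the direct route. The data is hypothetical (random numbers with a made-up labeling rule) and only stands in for my actual dataset; the Method suboptions are the same ones used in opt.

```wolfram
(* Synthetic stand-in data: 200 examples with 100 numerical features each.
   Both the data and the labeling rule are made up for illustration only. *)
SeedRandom[1];
features = RandomVariate[NormalDistribution[], {200, 100}];
labels = If[#[[1]] + #[[4]] > 0, "A", "B"] & /@ features;

(* Train a decision tree directly with Classify, passing the same
   suboptions that appear in opt. *)
c = Classify[features -> labels,
   Method -> {"DecisionTree", "DistributionSmoothing" -> 3,
     "FeatureFraction" -> 0.2}];

Head[c]  (* ClassifierFunction *)
```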

Either way, one ends up with a ClassifierFunction object. Evaluating c[[1]]["Model"] then lets one dig inside the ClassifierFunction. Here is what I see.

 <|"Tree" -> <|"FeatureIndices" -> RawArray["Integer16", {1, 2, 3, 4, 4, 5, 10, 11, 15, 45, 45, 45, 48}], 
    "NumericalThresholds" -> RawArray["Real32", {-0.14106103777885437, 0.29004108905792236, 1.389285922050476, 
      1.0026018619537354, 1.0108298063278198, -0.4484309256076813, 0.9995112419128418, -0.09746681898832321, 
      -1.4365870952606201, -1.0207768678665161, -1.0174717903137207, 1.4783092737197876, 
      -1.3999470472335815}], "NominalSplits" -> {}, "Children" -> RawArray["Integer16", {{7, -12}, {-1, -2}, 
      {-5, 5}, {-10, -11}, {-8, -9}, {4, -6}, {-13, -14}, {2, 10}, {6, 13}, {-3, -4}, {8, 12}, {9, 3}, {1, 
      -7}}], "LeafValues" -> RawArray["UnsignedInteger16", {{7, 9}, {23, 3}, {3, 400}, {4, 3}, {23, 3}, {34, 
      3}, {1032, 3}, {10, 6}, {3, 150}, {6, 5}, {3, 21}, {26, 3}, {3, 8}, {5, 3}}], "RootIndex" -> 11, 
    "NominalDimension" -> 0|>, "Processor" -> MachineLearning`MLProcessor["Values", 
    <|"Info" -> <|"f1" -> <|"Type" -> "NumericalVector", "Weight" -> 1|>|>, "Invertibility" -> "Perfect", 
     "Missing" -> "Allowed"|>], "Method" -> "DecisionTree", 
  "Options" -> <|"DistributionSmoothing" -> <|"Value" -> 3, "Options" -> <||>|>, 
    "FeatureFraction" -> <|"Value" -> 0.2, "Options" -> <||>|>|>|>

Is there any way to interpret this output to determine more precisely what the decision tree is doing? It appears, for example, that the tree has 14 leaves: the "LeafValues" key has 14 entries. It also appears that the tree makes 13 splits: "FeatureIndices", "NumericalThresholds", and "Children" each have 13 entries, and "FeatureIndices" names 10 distinct features out of the 100 in the dataset. But I'm not sure of any of this, and I can't figure out the sequence of decisions the classifier makes. Is there any way to reconstruct this information so that one could explain to a person, in plain language, what the decision tree is doing?
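My best guess at how to read these arrays, which I have not been able to confirm against any documentation, is sketched below. It rests on several assumptions: that entry k of "FeatureIndices", "NumericalThresholds", and "Children" describes internal node k; that a positive child refers to another internal node and a negative child -j refers to entry j of "LeafValues"; that the first child is taken when the feature value is below the threshold; and that "LeafValues" holds some kind of per-class tally. The helper names describe and reachable are my own, and the thresholds are rounded to four decimals for readability.

```wolfram
(* Arrays copied from the "Tree" association above, written as plain lists. *)
featureIndices = {1, 2, 3, 4, 4, 5, 10, 11, 15, 45, 45, 45, 48};
thresholds = {-0.1411, 0.29, 1.3893, 1.0026, 1.0108, -0.4484, 0.9995,
   -0.0975, -1.4366, -1.0208, -1.0175, 1.4783, -1.3999};
children = {{7, -12}, {-1, -2}, {-5, 5}, {-10, -11}, {-8, -9}, {4, -6},
   {-13, -14}, {2, 10}, {6, 13}, {-3, -4}, {8, 12}, {9, 3}, {1, -7}};
leafValues = {{7, 9}, {23, 3}, {3, 400}, {4, 3}, {23, 3}, {34, 3},
   {1032, 3}, {10, 6}, {3, 150}, {6, 5}, {3, 21}, {26, 3}, {3, 8}, {5, 3}};
rootIndex = 11;

indent[depth_] := StringJoin[ConstantArray["  ", depth]];

(* ASSUMPTION: a negative index -j points at leaf j. *)
describe[k_Integer?Negative, depth_] :=
  {indent[depth] <> "leaf " <> ToString[-k] <> ": values " <>
    ToString[leafValues[[-k]]]};

(* ASSUMPTION: a positive index k points at internal node k, and the
   first child is the branch for feature value < threshold. *)
describe[k_Integer?Positive, depth_] :=
  Join[
   {indent[depth] <> "if feature " <> ToString[featureIndices[[k]]] <>
     " < " <> ToString[thresholds[[k]]] <> ":"},
   describe[children[[k, 1]], depth + 1],
   {indent[depth] <> "else:"},
   describe[children[[k, 2]], depth + 1]];

Print[StringRiffle[describe[rootIndex, 0], "\n"]]

(* Consistency check: list every node reached from the root. *)
reachable[k_Integer?Negative] := {k};
reachable[k_Integer?Positive] :=
  Join[{k}, Flatten[reachable /@ children[[k]]]];

Count[reachable[rootIndex], _?Positive]  (* 13 *)
Count[reachable[rootIndex], _?Negative]  (* 14 *)
```

Under this reading, a walk from "RootIndex" reaches each of the 13 internal nodes and each of the 14 leaves exactly once, which is at least consistent with the structure being a binary tree. It does not tell me whether the split direction or the meaning of the leaf values is right.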

Interpretability is, after all, the best attribute of decision trees as classifiers, and it is unfortunate that the current version of Mathematica does not have any simple way of visualizing the tree. I am hopeful that the next release will include some functionality in this area, but as we wait with the usual impatience for that event, does anyone have suggestions that would render the DecisionTree less opaque?
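As a stopgap, one could at least draw the "Children" array as a graph. The sketch below assumes, without confirmation, that a positive child k refers to internal node k and a negative child -j refers to leaf j; the name helper is my own invention.

```wolfram
(* "Children" and "RootIndex" copied from the model output above. *)
children = {{7, -12}, {-1, -2}, {-5, 5}, {-10, -11}, {-8, -9}, {4, -6},
   {-13, -14}, {2, 10}, {6, 13}, {-3, -4}, {8, 12}, {9, 3}, {1, -7}};
rootIndex = 11;

(* ASSUMPTION: positive index = internal node, negative index = leaf. *)
name[k_Integer] :=
  If[k > 0, "node " <> ToString[k], "leaf " <> ToString[-k]];

(* One directed edge from each internal node to each of its two children. *)
edges = Flatten[
   Table[DirectedEdge[name[k], name[child]],
    {k, Length[children]}, {child, children[[k]]}]];

Graph[edges, VertexLabels -> "Name",
 GraphLayout -> {"LayeredDigraphEmbedding", "RootVertex" -> name[rootIndex]}]
```

This shows only the shape of the tree. Labeling each internal node with its feature and threshold would require knowing how those arrays line up with the nodes.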

POSTED BY: Seth Chandler
5 months ago
