Understanding how classify is fitting my data

Posted 9 years ago

Hi all,

I'm pretty new to the Machine Learning package in Mathematica, and so far I like how easy it is to use. However, I wish it gave more information about how Mathematica treats high-dimensional datasets. In particular, I would like to understand which features are relatively more or less important in the classification model.

For example, I have a 34-dimensional dataset of clinical variables for patients who either did or did not respond to cancer treatment. The classification label is 'CR' for complete response and 'RESISTANT' for resistant to treatment.

trainSet = Import["TrainSetCR.mx"];

validationSet = Import["ValidationSetCR.mx"];

I am using Classify to train both Logistic Regression and Random Forest classifiers for these data. I can get some high-level information about the classifiers produced using these methods with the ClassifierInformation function, but I would like to understand how the classifier is treating each feature.

For Logistic Regression, I can use the Function property to get the function the classifier is using, but it is hard to understand.

CRClassifier = Classify[trainSet, Method -> "LogisticRegression"];
ClassifierInformation[CRClassifier]
CRClassifierProperties = ClassifierInformation[CRClassifier, "Properties"]
ClassifierInformation[CRClassifier, "Function"]
ClassifierMeasurements[CRClassifier, validationSet] /@ {"Accuracy", "ConfusionMatrixPlot"}

For Random Forest, I cannot find any property that would allow me to understand how each feature is being used to classify the data.

CRClassifier = Classify[trainSet, Method -> "RandomForest"];
ClassifierInformation[CRClassifier]
CRClassifierProperties = ClassifierInformation[CRClassifier, "Properties"]
ClassifierMeasurements[CRClassifier, validationSet] /@ {"Accuracy", "ConfusionMatrixPlot"}
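
One model-agnostic check that needs no built-in property is permutation importance: shuffle one feature column in the validation data and see how much the accuracy drops. The following is only a rough sketch, not something Classify provides directly. It assumes validationSet is a list of rules of the form {x1, ..., x34} -> label (the actual layout of the .mx file may differ), and the helper names accuracyOn and permutationImportance are made up for illustration.

```wolfram
(* Assumption: validationSet is a list of rules {x1, ..., x34} -> label. *)
features = Keys[validationSet];
labels = Values[validationSet];

(* Hypothetical helper: accuracy of classifier cl on features x with labels y. *)
accuracyOn[cl_, x_, y_] := ClassifierMeasurements[cl, Thread[x -> y], "Accuracy"];

(* Accuracy on the unshuffled validation data, computed once. *)
baseline = accuracyOn[CRClassifier, features, labels];

(* Hypothetical helper: accuracy drop after randomly shuffling feature column j. *)
permutationImportance[cl_, x_, y_, j_] := Module[{shuffled = x},
  shuffled[[All, j]] = RandomSample[x[[All, j]]];
  baseline - accuracyOn[cl, shuffled, y]
];

importances = Table[
  permutationImportance[CRClassifier, features, labels, j],
  {j, Length[First[features]]}
];

(* Feature indices ordered from largest to smallest accuracy drop. *)
Reverse[Ordering[importances]]
```

Because the shuffle is random, a single run is noisy; averaging the drop over several repeats per feature should give steadier numbers. Strongly correlated features can also mask each other, so a small drop does not necessarily mean a feature is irrelevant.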

I really do enjoy using the Machine Learning package in Mathematica because it makes it easy to configure and try various machine learning techniques. However, I think the package could do a better job of letting users understand how these models treat individual features. I've attached my dataset and Mathematica notebook for anyone who would like to look at the data. Any suggestions on how I could understand these models in greater depth would be greatly appreciated.

Attachments:
POSTED BY: Brady Hunt