Automate data file imports to use Classify?

Posted 9 years ago

Hello, I am trying to use the standard sample data sets for machine learning. These often come as CSV or plain-text data files. For example, this data file has 768 records:

data = ReadList[
   "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-\
indians-diabetes/pima-indians-diabetes.data", Record, 768];

The column names are:

colNames = {"pregnant", "glucose", "bp", "skin", "insulin", "bmi", "pedigree", "age", "label"};

I would like to be able to easily partition that into something that can directly be fed into Classify, for example:

mytrain ={{6,148,72,35,0,33.6,0.627,50}->1,{1,85,66,29,0,26.6,0.351,31}->0,{8,183,64,0,0,23.3,0.672,32}->1,{1,89,66,23,94,28.1,0.167,21}->0,{0,137,40,35,168,43.1,2.288,33}->1,{5,116,74,0,0,25.6,0.201,30}->0,{3,78,50,32,88,31.0,0.248,26}->1,{10,115,0,0,0,35.3,0.134,29}->0,{2,197,70,45,543,30.5,0.158,53}->1,{8,125,96,0,0,0.0,0.232,54}->1};

Is there a simple, generic way to tell Mathematica to partition these 768 records and put them in the form shown above, {{...} -> label}? That is, I would like to split the data into training and testing sets, choosing which columns to use, which column is the label, and how many items go into each set, all in the format Mathematica needs.
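
To make the request concrete, here is a minimal sketch of the kind of helper being asked about. The names toRules and splitData are hypothetical, not built-in functions, and the sample rows are simply the first four from the list above with the label appended as the ninth column. For the full file, something like Import[url, "CSV"] would presumably return numeric rows in this shape, assuming the URL still serves the data.

```mathematica
(* Hypothetical helper: turn each numeric row into features -> label,
   where labelCol is the position of the label column. *)
toRules[rows_List, labelCol_Integer] :=
  (Delete[#, labelCol] -> #[[labelCol]]) & /@ rows;

(* Hypothetical helper: shuffle the rules, then take the first nTrain
   for training and leave the rest for testing. *)
splitData[rules_List, nTrain_Integer] :=
  TakeDrop[RandomSample[rules], nTrain];

(* Sample rows copied from the question; column 9 is the label. *)
rows = {{6, 148, 72, 35, 0, 33.6, 0.627, 50, 1},
        {1, 85, 66, 29, 0, 26.6, 0.351, 31, 0},
        {8, 183, 64, 0, 0, 23.3, 0.672, 32, 1},
        {1, 89, 66, 23, 94, 28.1, 0.167, 21, 0}};

rules = toRules[rows, 9];
{train, test} = splitData[rules, 3];
```

With all 768 records, train could then be passed to Classify and test to ClassifierMeasurements; four rows are far too few for a meaningful classifier, so they only illustrate the data layout.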

I tried messing with all of the standard commands, but I must be missing something fundamental about how the data is represented.

For what it is worth, I am trying to duplicate this example:

http://www.ritchieng.com/machine-learning-evaluate-classification-model

including all of the classifier results, confusion matrix, metrics and ROC curves.

Thank you for any insights.

POSTED BY: Q Q
6 Replies
POSTED BY: Vitaliy Kaurov
Posted 9 years ago

@Vitaliy Kaurov, thank you for providing valuable inputs as I only started dabbling in this area very recently.

I really appreciate doing things properly and using the best commands to do so as that is my purpose for using this tool!

It would be so helpful to users if MMA took sample data sets like the one I pointed to and provided very detailed examples of the process, the commands, and the like.

Regards

POSTED BY: Q Q
Posted 9 years ago
POSTED BY: Q Q
Posted 9 years ago

@Hakan Kjellerstrand

Just excellent - thank you so much - greatly appreciated!

I have so much to learn and this makes my life easier.

POSTED BY: Q Q