WOLFRAM COMMUNITY

GROUPS: Data Science, Import and Export, Wolfram Language, Machine Learning

Automate data file imports to use Classify?

Q Q

Posted 9 years ago

Hello, I am trying to use the default data files for machine learning. Often, these come as CSV or as plain-text data files. For example, this data file has 768 records:

    data = ReadList["https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data", Record, 768];

The columns represent:

    colNames = {"pregnant", "glucose", "bp", "skin", "insulin", "bmi", "pedigree", "age", "label"};

I would like to be able to easily partition that into something that can be fed directly into Classify, for example:

    mytrain = {
      {6, 148, 72, 35, 0, 33.6, 0.627, 50} -> 1,
      {1, 85, 66, 29, 0, 26.6, 0.351, 31} -> 0,
      {8, 183, 64, 0, 0, 23.3, 0.672, 32} -> 1,
      {1, 89, 66, 23, 94, 28.1, 0.167, 21} -> 0,
      {0, 137, 40, 35, 168, 43.1, 2.288, 33} -> 1,
      {5, 116, 74, 0, 0, 25.6, 0.201, 30} -> 0,
      {3, 78, 50, 32, 88, 31.0, 0.248, 26} -> 1,
      {10, 115, 0, 0, 0, 35.3, 0.134, 29} -> 0,
      {2, 197, 70, 45, 543, 30.5, 0.158, 53} -> 1,
      {8, 125, 96, 0, 0, 0.0, 0.232, 54} -> 1
    };

Is there a simple, generic way to tell Mathematica to partition these 768 records and put them in the form given above, {{...} -> label}? That is, I would like to split the data into a training set and a testing set, choosing which columns are the features, which column is the label, and how many items go into each set, all in the format Mathematica needs.

I tried the standard commands, but I must be missing something fundamental about how the data is represented.

For what it is worth, I am trying to duplicate this example: http://www.ritchieng.com/machine-learning-evaluate-classification-model including all of the classifier results, confusion matrix, metrics, and ROC curves.

Thank you for any insights.

POSTED BY: Q Q
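A minimal sketch of the kind of transformation the question describes, offered as one possible approach rather than a definitive answer. It assumes the URL above still resolves and that the file parses as comma-separated numeric rows with nine columns and the label last. The helper names toRules and splitData are hypothetical, introduced only for illustration; Import, RandomSample, TakeDrop, Classify, and ClassifierMeasurements are built-in functions.

```wolfram
(* Assumption: the file is plain comma-separated numbers, 9 columns per row *)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data";
raw = Import[url, "CSV"];

(* Keep only complete rows, in case the file ends with a blank line *)
rows = Select[raw, Length[#] == 9 &];

(* Hypothetical helper: turn each row into {features} -> label *)
toRules[data_List, featureCols_List, labelCol_Integer] :=
  Map[#[[featureCols]] -> #[[labelCol]] &, data];

(* Hypothetical helper: shuffle, then take nTrain examples for training
   and leave the remainder for testing *)
splitData[rules_List, nTrain_Integer] :=
  TakeDrop[RandomSample[rules], nTrain];

SeedRandom[1234]; (* fixed seed so the split is reproducible *)
{train, test} = splitData[toRules[rows, Range[8], 9], 600];

(* Train on the rules and evaluate on the held-out examples *)
classifier = Classify[train];
cm = ClassifierMeasurements[classifier, test];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]
cm["ROCCurve"]
```

Changing Range[8], 9, and 600 selects different feature columns, a different label column, and a different training-set size. The 600/168 split is an arbitrary choice for this sketch, not a recommendation.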

6 Replies

5

Vitaliy Kaurov, WOLFRAM Research

Posted 9 years ago

POSTED BY: Vitaliy Kaurov

0

Q Q

Posted 9 years ago

@Vitaliy Kaurov, thank you for the valuable input; I only started dabbling in this area very recently. I really appreciate learning to do things properly with the best commands, as that is my purpose for using this tool! It would be very helpful to users if MMA took sample data sets like the one I pointed to and provided detailed examples of the process, the commands, and the like. Regards

POSTED BY: Q Q

6

Hakan Kjellerstrand, Independent Researcher

Posted 9 years ago

POSTED BY: Hakan Kjellerstrand

1

Q Q

Posted 9 years ago

POSTED BY: Q Q

3

Hakan Kjellerstrand, Independent Researcher

Posted 9 years ago

POSTED BY: Hakan Kjellerstrand

0

Q Q

Posted 9 years ago

@Hakan Kjellerstrand Just excellent, thank you so much. Greatly appreciated! I have so much to learn, and this makes my life easier.

POSTED BY: Q Q
