Message Boards Message Boards

Automate converting an imported excel spreadsheet to a dataset?

Posted 5 years ago

I would like to import a large excel spreadsheet with many columns into Mathematica. The first row of the spreadsheet contains the column headings and the remaining rows contain the data. After importing I first separate the column headings and data into separate variables and would like to convert the data to a dataset. Here is an example of what I have tried.

These steps simulate of importing the spreadsheet and separating into columns and data

cols = {"col1", "col2", "col3", "col4", "col5"};
dat = RandomReal[{0, 1}, {10, 5}];

I now need to create an association to use to map the data to the columns but here is where I am having trouble. The best I am able to come up with is to generate

assoc = Table[cols[[k]] -> #[[k]], {k, 1, 5}]

This results in some error messages, but produces

{"col1" -> 1, "col2" -> #1[[2]], "col3" -> #1[[3]], "col4" -> #1[[4]],
  "col5" -> #1[[5]]}

which is nearly the result desired. I can then copy that result, paste it into a new cell, and edit that cell by hand, changing the first element of the list, applying an association function, and applying a function to get the desired function to apply to the data

assoc1 = Association[{"col1" -> #[[1]], "col2" -> #1[[2]], 
    "col3" -> #1[[3]], "col4" -> #1[[4]], "col5" -> #1[[5]]}] &
Map[assoc1,dat]

With this association I can then generate the dataset. Is there any way of generating this function without the necessity of the hand editing step?

POSTED BY: Mike Luntz
4 Replies
Posted 5 years ago

Marcin,

If condition1 is a Dataset then Normal[condition1] should be a list of 85 lists each of length 26. You want each of the 85 to be assigned the label 1. One way to do this is Thread[Normal[condition1] -> 1]]. Do the same for condition2 with label 2. Passing both results to Classify should work.

Rohit

POSTED BY: Rohit Namjoshi

To get help have a look in Mathematica help of XLSX.

A follow up question: I imported Excel file as Rohit suggested and wanted to use it in Classify function. I can import Excel file as "Data" or "Dataset". Each row contains data for each patient, as Rohit has shown for a table of customers. I have two Excel sheets each for Condition1 and Condition2, both with same 26 columns structure for about 85 patients.

My problem is that Classify[ Condition1->1, Condition2->2] uses each table as if this was one patient, so as if I have two patients for training and not 85.

I tried to use Partition function to force reading each row as one patient bit without success. Partition[ Condition1, 26] or Partition[ Condition1, {26,1}] gives me an error of wrong dimension, which I cannot correct. Do I have to read Excel file row after row, our there is a more clever way?

POSTED BY: Marcin Balcerzyk
Posted 5 years ago

Many thanks Rohit. I was not aware that you could import directly to a dataset. That solves my problem.

Is that capability documented anywhere? By the way, when I imported the file with your suggested command I got an error message stating

"Import::hdrs: The value 1 for the option "HeaderLines" is outside of the data range."

In spite of the error message the file apparently imported correctly.

POSTED BY: Mike Luntz
Posted 5 years ago

Not sure if this is exactly what you are after, but you can Import Excel directly as a Dataset.

dataset = Import["~/Downloads/Times_Square_Hotels.xlsx", "Dataset",  HeaderLines -> 1]

enter image description here

POSTED BY: Rohit Namjoshi
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract