Help with importing a CSV file as Dataset

Posted 9 years ago

Hi,

I'm trying to import a CSV file as a Dataset object. All the entries in the CSV file are real numbers, except for the first row, which contains the variable names. According to the documentation, the function SemanticImport is supposed to do that, but I can't get it to work. This is what I'm using:

dataset = 
  SemanticImport["cleandata.csv", "Real", "Dataset", 
    HeaderLines -> 1, Delimiters -> ",", MissingDataRules -> {"." -> Missing[], "" -> Missing[]}];

I can import the data without problems using Import, but I want to try the new Dataset object in Mathematica. Any suggestions?

POSTED BY: Miguel Olivo-V
6 Replies
Posted 9 years ago

Hey Marco, I saw your profile and it seems that you also work with data. I have used Mathematica since version 4 for symbolics and for simple data analysis using matrices, but now I feel outdated with respect to the new objects that Mathematica has introduced in recent versions. Do you know of any tutorial on data analysis in Mathematica using these new objects?

POSTED BY: Miguel Olivo-V

Dear Miguel,

I don't really know of any written tutorial or book that discusses this in sufficient detail. There is, however, an online course

http://www.wolfram.com/training/courses/dat101.html

which might be useful.

Cheers,

Marco

POSTED BY: Marco Thiel
Posted 9 years ago

I see. I guess that does solve the issue. However, it seems to be pretty slow. My machine has been importing the file for the last 30 minutes, whereas Import takes less than a minute. I guess I'll have to play with that when I buy a faster machine. Thanks anyway.
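
If SemanticImport stays this slow, one possible workaround is to read the file with the plain Import and build the Dataset by hand. The following is only a sketch under assumptions: the temporary file, its column names "x" and "y", and its contents are made up for illustration, and the replacement rules simply mirror the MissingDataRules from the original command.

```wolfram
(* Hypothetical small test file; path, column names and values are illustrative only *)
file = FileNameJoin[{$TemporaryDirectory, "import_test.csv"}];
Export[file, {{"x", "y"}, {0.1, 0.2}, {".", 0.4}, {0.5, ""}}, "CSV"];

(* Plain Import returns a list of rows; the first row holds the variable names *)
raw = Import[file, "CSV"];
header = First[raw];

(* Mark "." and empty fields as missing, mirroring the MissingDataRules option *)
rows = Rest[raw] /. {"." -> Missing[], "" -> Missing[]};

(* One association per row, keyed by the header, wrapped in a Dataset *)
dataset = Dataset[AssociationThread[header, #] & /@ rows]
```

This skips the type inference that SemanticImport performs, so the values stay whatever Import produced.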

POSTED BY: Miguel Olivo-V

Oh, I see where the problem is. For the two-column data set, you might want to give one type per column and import it like so:

dataset = SemanticImport["~/Desktop/cleandata.csv", {"Real", "Real"}, "Dataset", HeaderLines -> 1, Delimiters -> ",", MissingDataRules -> {"." -> Missing[], "" -> Missing[]}];

or alternatively like so:

dataset = SemanticImport["~/Desktop/cleandata.csv", Automatic, "Dataset", HeaderLines -> 1, Delimiters -> ",", MissingDataRules -> {"." -> Missing[], "" -> Missing[]}];
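
A self-contained way to check the two-column case might look like the sketch below. The temporary file name and the column names "x" and "y" are assumptions made for the test, not part of the original data.

```wolfram
(* Hypothetical two-column test file; file name and column names are illustrative *)
file = FileNameJoin[{$TemporaryDirectory, "two_column_test.csv"}];
data = RandomReal[1, {100, 2}];
Export[file, Prepend[data, {"x", "y"}], "CSV"];

(* One type per column; Automatic in place of the list lets SemanticImport infer the types *)
dataset = SemanticImport[file, {"Real", "Real"}, "Dataset",
   HeaderLines -> 1, Delimiters -> ","];

(* Show the first three rows to confirm that both columns arrived *)
dataset[1 ;; 3]
```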

Cheers,

Marco

POSTED BY: Marco Thiel
Posted 9 years ago

I get this: [screenshot not preserved]

The first row contains the variable names in the CSV file. I think the problem arises when there is more than one column in the CSV. If I run your example with RandomReal[1, {100, 2}], it imports only the first column.

POSTED BY: Miguel Olivo-V

Hi Miguel,

this seems to work fine on my system (OS X, Mathematica 10.0.2).

I generate some test data and export that:

data = RandomReal[1, 100];
Export["~/Desktop/cleandata.csv", Join[{"Test Data"}, data]]

Running your command:

dataset = SemanticImport["~/Desktop/cleandata.csv", "Real", "Dataset", HeaderLines -> 1, Delimiters -> ",", 
   MissingDataRules -> {"." -> Missing[], "" -> Missing[]}];

gives:

[screenshot not preserved]

What happens when you execute your command?

Cheers,

Marco

POSTED BY: Marco Thiel