Fast CSV reader needed

Posted 9 years ago
POSTED BY: Szabolcs Horvát
9 Replies
POSTED BY: Hans Michel
POSTED BY: Hans Michel

ReadList seems many times faster than Import. Is there a reason you don't mention it as an alternative?

POSTED BY: George Wolfe

How would you use ReadList to read a CSV that has both numerical and string data?

Example:

data = "1.2,foo,4.5
6,bar,0.5";

str = StringToStream[data]
ReadList[str, ???]
Close[str]

What if some of the strings include commas?

data = "1.2,\"foo\",4.5
4,\"bar,baz\",0.5";

The desired result is

ImportString[data, "CSV"]
(* {{1.2, "foo", 4.5}, {4, "bar,baz", 0.5}} *)
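For comparison, here is a minimal sketch of the naive ReadList approach on the first example. It assumes that splitting every line on commas is acceptable, so it covers only the simple case: every field comes back as a string, and quoted fields such as "bar,baz" are not treated specially.

```mathematica
(* Sketch only: splits on commas, with no handling of quoted fields *)
data = "1.2,foo,4.5
6,bar,0.5";

str = StringToStream[data];

(* Word reads one comma-delimited token at a time;
   RecordLists -> True groups the tokens of each line into a sublist *)
rows = ReadList[str, Word, RecordLists -> True, WordSeparators -> {","}];
Close[str];

rows
```

The numeric columns would still need to be converted from strings afterwards, and the quoted-comma case stays unsolved, which is the gap the question is about.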

EDIT:

To explain further, the function I am looking for would be able to do exactly what SemanticImport does in the following example, but it would perform much better on large data:

data = "x,name,y
1.2,foo,4.5
6,\"bar,baz\",0.5
9,666,0";

SemanticImportString[data, {"Number", "String", "Number"}, "NamedColumns", HeaderLines -> 1]

(* <|"x" -> {1.2`, 6, 9}, "name" -> {"foo", "bar,baz", "666"}, "y" -> {4.5`, 0.5`, 0}|> *)
POSTED BY: Szabolcs Horvát
Posted 9 years ago

Hi Szabolcs,

Good news! Wolfram Research is working on an updated CSV Import/Export, planned for the near future. It will fix a number of bugs with escaped characters and provide speed and memory improvements via a LibraryLink paclet. We will consider the suggestions regarding column-wise data types for later releases.

Also, have you seen the "HeaderLines" option? It skips over header rows, which hold things like the column names you mentioned.
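As a hedged illustration of that option, the sketch below reuses the sample data from earlier in the thread and assumes the string-named "HeaderLines" option of the CSV importer as it existed at the time of this discussion. Its behavior may differ in other versions.

```mathematica
(* Sample data from earlier in the thread: one header row, then three records *)
data = "x,name,y
1.2,foo,4.5
6,\"bar,baz\",0.5
9,666,0";

(* "HeaderLines" -> 1 asks the CSV importer to skip the first line *)
ImportString[data, "CSV", "HeaderLines" -> 1]
```

This only drops the header row. It does not let you declare per-column types, so a field like 666 in the name column is still interpreted automatically.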

Thanks for the suggestions!

-S

POSTED BY: Sean Cheren
POSTED BY: Szabolcs Horvát
Posted 9 years ago

Thank you for the good news!

I think it is appropriate to mention here this long-standing bug of "TextDelimiters":

POSTED BY: Alexey Popkov

I'm with you Szabolcs!

I definitely prefer the "reliability" version. I have also been previously bitten by the "smart" interpretation of my data.

I would much rather specify the interpretation I want myself, perhaps with full semantic interpretation available as an option, though I doubt that is what I would choose for most of my work.

Cheers,

POSTED BY: Pedro Fonseca
POSTED BY: Sander Huisman