Falling back to Mathematica 11.1. Imports of CSV files are 100 times slower

Posted 8 years ago

Hi,

I originally posted a problem I was encountering using Mathematica 11.2 under: Mathematica 11.2 Import Issues

After implementing the recommended workaround, which eliminated the Import CSV row truncation issue, I had to give up and fall back to Mathematica 11.1. Working with over 100,000 rows of downloaded Wolfram StarData, I had a minimum of 11 files, with 10,000 rows per file. The import times under Mathematica 11.1 were around 2 seconds for each CSV file. Under Mathematica 11.2, imports of the same files take over 250 seconds each, which makes using Mathematica 11.2 to analyze StarData unworkable. Falling back to Mathematica 11.1 is my only solution until this problem is fixed.

See attached Import image file.

I've also included the sample CSV file.

[image]

POSTED BY: Joseph Karpinski
13 Replies

Good news!

Just heard back from Wolfram Research Technical Support Supervisor.

They are working on addressing some of the unexpected consequences from the CSV Import functionality update in Mathematica 11.2, and hope to have this resolved in a not-too-distant product update.

Mathematica is a great product.

Looking forward to exploring the new features and functionality in Mathematica 11.2.1 once my Mathematica notebooks are fully functional again.

Thank you!!!

POSTED BY: Joseph Karpinski
Posted 8 years ago

Hi Joseph,

I don't have sufficient experience communicating with the Wolfram tech support team, but from their replies cited in other questions on StackExchange, I assume that the formulation

This particular issue has been filed as a report and our development group is working on solving it for a future release.

actually means that the tech support team accepted the issue as a bug. But it doesn't necessarily mean that the development team considers it a bug: for example, in the case of this unrelated issue, support confirmed the bug at first but later stated that the development team considers the new behavior correct.


The default behavior should be, all existing Mathematica import/export CSV works as it always has, and newer Mathematica CSV functionality can be utilized by adding additional parameters.

I agree. At the very least, they should add a documented Method option allowing correct import of CSV files generated by previous versions of Mathematica.

POSTED BY: Alexey Popkov

Hi Marco,

Looking at the emails between myself and Wolfram Technical Support, I guess it was StackExchange that accepted it as a bug. Wolfram Technical Support: "This particular issue has been filed as a report and our development group is working on solving it for a future release. The introduction of this behavior comes from the reworking of CSV importing in Mathematica 11.2 and, as some of your posts mention, the workaround is to not specify {"Data", All} when all of the data needs to be imported."

I replied that this was a broader issue, asked that it be raised to a higher supervisor, and copied Stephen Wolfram on the email chain:

"If you export/import CSV data in Mathematica 11.2, no truncation of rows happens and the TextDelimiters parameter is not needed. But all CSV files exported in earlier releases of Mathematica, like release 11.1, are subject to hidden row truncation. So all user CSV files storing critical data are at risk. You can't tell 100,000s of existing Mathematica users to update every program that imports CSV files created from previous Mathematica releases. You have to fix this. The default behavior should be, all existing Mathematica import/export CSV works as it always has, and newer Mathematica CSV functionality can be utilized by adding additional parameters. Not the other way around.

The same is true with the long elapsed time issue.

Please raise this to a higher supervisor.

You have to fix this.

Just some additional comments on this issue:

  1. You should halt all further downloads of Mathematica 11.2 until this issue is fixed in Mathematica 11.3

  2. This issue may apply to other file types, not just CSV files. You are going to have to test that.

  3. If you have a multimillion dollar financial or drug company that heavily relies on and uses Mathematica for simulations and financial forecasts, they should not roll out Mathematica 11.2 in their production and test environments until this issue is fixed. You have no idea what effect hidden data truncation will have on simulations and financial data. And you can't take that chance, whether it's one file, one user, or hundreds of thousands.

  4. If I export 10,000 rows of data, I better get back 10,000 rows of data on import. Whether a data field has missing or questionable values is not the issue. Most coding will filter or throw out bad data. But an export/import should return all rows."

POSTED BY: Joseph Karpinski

Hi Joseph,

Did StackExchange accept it as a bug, or did Wolfram Research also acknowledge a bug? (I understand that you contacted them?)

Best wishes,

Marco

POSTED BY: Marco Thiel

This issue is now recognized as a bug in Mathematica 11.2.

See the below link:

Bug Introduced In Mathematica 11.2

POSTED BY: Joseph Karpinski

Hi Ivan,

On your first reply: I've been using that version of the Import statement on CSV datasets for a while, across different versions of Mathematica, with no issues until Mathematica 11.2. There were two issues with the Import statement under that newer release. The first was that it truncated rows of data on import without any notification, unless you were looking for it with a Length function. The "TextDelimiters" -> "" option fixed that problem in Mathematica 11.2, but it should never have occurred, and it will impact other users without them realizing it. The second issue is the 100-fold increase in elapsed time. I've just opened a bug report with Wolfram.
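
The silent truncation described here can be caught with a round-trip row count check. A minimal sketch, assuming synthetic data and a hypothetical scratch file name (neither comes from the original StarData files):

```wolfram
(* Synthetic stand-in for the StarData rows; not the original data *)
data = Table[{i, "star " <> ToString[i], N[i/7]}, {i, 10000}];

(* Hypothetical scratch file in the system temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "roundtrip-check.csv"}];

Export[file, data, "CSV"];
imported = Import[file, "CSV"];

(* Row counts should match; a mismatch signals dropped rows *)
If[Length[imported] =!= Length[data],
  Print["Row count mismatch: exported ", Length[data],
    ", imported ", Length[imported]]]
```

This only exercises a file written and read by the same version. For a file exported from an earlier release, the imported length would have to be compared against a row count recorded at export time.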

POSTED BY: Joseph Karpinski

Hi Joseph,

what I wrote above made me believe that this could work:

datatest = Import["~/Desktop/allStarData4.csv", {"CSV"}, "TextDelimiters" -> "\r"];

and this runs in less than 0.41 seconds:

AbsoluteTiming[datatest = Import["~/Desktop/allStarData4.csv", {"CSV"}, "TextDelimiters" -> "\r"]; Length[datatest]]

gives:

{0.407668, 10001}
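
To compare option variants like these side by side, one could wrap the timing in a small helper. This is only a sketch: `timeImport` is a hypothetical name, not a built-in function, and the path is the one used above.

```wolfram
(* Hypothetical helper: returns {seconds, row count} for one Import variant *)
timeImport[file_String, opts_List : {}] :=
  AbsoluteTiming[Length[Import[file, "CSV", Sequence @@ opts]]]

file = "~/Desktop/allStarData4.csv";  (* path from this thread *)

(* Default CSV import *)
timeImport[file]

(* Same file with the delimiter option used above *)
timeImport[file, {"TextDelimiters" -> "\r"}]
```

Reporting the row count next to the timing makes it harder to miss a variant that is fast only because it dropped rows.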

Cheers,

Marco

PS: It is probably related to this post.

POSTED BY: Marco Thiel

Dear Joseph,

I only post this because it shows how I identified the problem. The next post gives a potential solution:

I can, to some extent, reproduce what you describe. It would have been useful to have your code in a code box to avoid typing it again:

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1] /. 
Evaluate[ToExpression /@ Table["c" <> ToString[i] <> "_", {i, 1, 22}] -> ToExpression /@ Table["c" <> ToString[i], {i, 1, 22}]]]]

Note that the two examples you show in your post are slightly different: the first one contains the

"TextDelimiters" -> ""

option which gives an error in MMA 11.1.1 on my machine. Here are my results:

[image]

and

[image]

On 11.2 it gives an error message and takes about 30 times longer to load. It also returns only half of the entries. On MMA 11.2, this

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1]]]

gives the same results as the command above, but in less than 28 seconds; it also yields only 5002 rows.

Your "fixed" MMA 11.2 code:

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1, "TextDelimiters" -> ""] /. 
Evaluate[ToExpression /@ Table["c" <> ToString[i] <> "_", {i, 1, 22}] -> ToExpression /@ Table["c" <> ToString[i], {i, 1, 22}]]]]

gives an error message and runs for 336 seconds.

[image]

The Import probably takes so long because the result is gigantic:

fix4[[1]]

gives

[image]

and

fix4 // ByteCount

gives 77724720080.

By the way, does exporting the data work for you in MMA11.2 directly?

Export["~/Desktop/allStarData412.csv", fix4]

takes excessively long on my machine, probably because the data is enormous. In fact, I interrupted it after about 20 minutes without success. Can anyone check whether this works on their machines?

I also noted that SemanticImport does succeed in importing the file and is relatively fast:

AbsoluteTiming[Length[fix5 = SemanticImport["~/Desktop/allStarData4.csv", "HeaderLines" -> 1]]]

gives

[image]

The "HeaderLines" option doesn't make any difference, so you can delete it in this case. If you look at the output, you can recover your data, but SemanticImport misinterpreted the header.

[image]

Best wishes,

Marco

POSTED BY: Marco Thiel

Hi,

I'm on 11.2 with Windows 10. When I import the file (standard Import), the data seems ragged: one column was missing data here and there. The import went fast (0.5 sec). I opened the file with Excel and saw some "strange" fields like carbon‐oxygen white dwarf. I saved the file as MS-DOS CSV and reimported it. Now the file seemed to read in:

[image]

I still don't know whether this is as expected.

POSTED BY: l van Veen