Falling back to Mathematica 11.1. Imports of CSV files are 100 times slower

Posted 7 years ago

Hi,

I originally posted a problem I was encountering using Mathematica 11.2 under: Mathematica 11.2 Import Issues

After implementing the recommended workaround, which eliminated the CSV Import row truncation issue, I had to give up and fall back to Mathematica 11.1. I am working with over 100,000 rows of downloaded Wolfram StarData, split across at least 11 files of 10,000 rows each. Under Mathematica 11.1, each CSV file imported in around 2 seconds. Under Mathematica 11.2, importing the same files takes over 250 seconds per file, which makes Mathematica 11.2 unworkable for analyzing StarData. Falling back to Mathematica 11.1 is my only solution until this problem is fixed.

See attached Import image file.

I've also included the sample CSV file.
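A per-file comparison like the one described here can be reproduced with a short timing loop. This is only a minimal sketch: `timeImports` is a hypothetical helper name, and the folder passed to it is an assumption to be replaced with wherever the CSV extracts actually live.

```mathematica
(* timeImports is a hypothetical helper, not a built-in function. *)
(* For every CSV file in dir it returns {file name, elapsed seconds, imported row count}. *)
timeImports[dir_String] := Table[
  With[{result = AbsoluteTiming[Import[f, "CSV"]]},
   {FileNameTake[f], First[result], Length[Last[result]]}],
  {f, FileNames["*.csv", dir]}]

(* Example call; the folder name is an assumption. *)
(* timeImports[FileNameJoin[{$HomeDirectory, "Desktop", "StarData"}]] *)
```

Running the same call under both versions and comparing the seconds and rows columns would show the slowdown and any truncation side by side.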

enter image description here

POSTED BY: Joseph Karpinski
13 Replies

Good news!

Just heard back from Wolfram Research Technical Support Supervisor.

They are working on addressing some of the unexpected consequences from the CSV Import functionality update in Mathematica 11.2, and hope to have this resolved in a not-too-distant product update.

Mathematica is a great product.

Looking forward to exploring the new features and functionality in Mathematica 11.2.1 once my Mathematica notebooks are fully functional again.

Thank you!!!

POSTED BY: Joseph Karpinski

This issue is now recognized as a bug in Mathematica 11.2.

See the below link:

Bug Introduced In Mathematica 11.2

POSTED BY: Joseph Karpinski

Hi Joseph,

Did StackExchange accept it as a bug, or did Wolfram Research also acknowledge it as a bug? (I understand that you contacted them.)

Best wishes,

Marco

POSTED BY: Marco Thiel

Hi Marco,

Looking at the emails between myself and Wolfram Technical Support, I guess it was StackExchange that accepted it as a bug. Wolfram Technical Support: "This particular issue has been filed as a report and our development group is working on solving it for a future release. The introduction of this behavior comes from the reworking of CSV importing in Mathematica 11.2 and, as some of your posts mention, the workaround is to not specify {"Data", All} when all of the data needs to be imported."

I replied that this was a broader issue, asked that it be raised to a higher supervisor, and copied Stephen Wolfram on the email chain:

"If you export/import CSV data in Mathematica 11.2, no truncation of rows happens and the TextDelimiters parameter is not needed. But all CSV files exported in earlier releases of Mathematica, like release 11.1, are subject to hidden row truncation. So all user CSV files storing critical data are at risk. You can't tell hundreds of thousands of existing Mathematica users to update every program that imports CSV files created from previous Mathematica releases. You have to fix this. The default behavior should be that all existing Mathematica CSV import/export works as it always has, and newer Mathematica CSV functionality can be utilized by adding additional parameters. Not the other way around.

The same is true with the long elapsed time issue.

Please raise this to a higher supervisor.

You have to fix this.

Just some additional comments on this issue:

  1. You should halt all further downloads of Mathematica 11.2 until this issue is fixed in Mathematica 11.3

  2. This issue may apply to other file types, not just CSV files. You are going to have to test that.

  3. If you have a multimillion dollar financial or drug company that relies heavily on Mathematica for simulations and financial forecasts, they should not roll out Mathematica 11.2 in their production and test environments until this issue is fixed. You have no idea what effect hidden data truncation will have on simulations and financial data. And you can't take that chance, whether it's one file, one user, or hundreds of thousands.

  4. If I export 10,000 rows of data, I had better get back 10,000 rows of data on import. Whether a data field has missing or questionable values is not the issue. Most code will filter or throw out bad data. But an export/import should return all rows."
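The expectation in point 4 can be turned into a quick round-trip check. The following is a minimal sketch under stated assumptions: `roundTripRowsOK` is a hypothetical helper name, and the temporary file name is arbitrary. It only compares row counts, not the contents of the fields.

```mathematica
(* roundTripRowsOK is a hypothetical helper, not a built-in function. *)
(* It exports data to a temporary CSV file, imports it again, and reports *)
(* whether the number of rows survived the round trip. *)
roundTripRowsOK[data_List] := Module[{file, back},
  file = FileNameJoin[{$TemporaryDirectory, "roundtrip-check.csv"}];
  Export[file, data, "CSV"];
  back = Import[file, "CSV"];
  DeleteFile[file];
  ListQ[back] && Length[back] === Length[data]]

(* Example call with 10,000 made-up rows. *)
(* roundTripRowsOK[Table[{i, "star " <> ToString[i]}, {i, 10000}]] *)
```

A check like this only covers files written and read by the same version. For files exported under an earlier release, the import has to be compared against the file itself.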

POSTED BY: Joseph Karpinski
Posted 7 years ago

Hi Joseph,

I don't have sufficient experience communicating with the Wolfram tech support team, but from their replies cited in other questions on StackExchange I assume that the formulation

"This particular issue has been filed as a report and our development group is working on solving it for a future release."

actually means that the tech support team accepted the issue as a bug. But it doesn't necessarily mean that the development team considers this issue a bug: for example, in the case of this unrelated issue, support confirmed the bug at first but later stated that the development team considers the new behavior correct.


"The default behavior should be, all existing Mathematica import/export CSV works as it always has, and newer Mathematica CSV functionality can be utilized by adding additional parameters."

I agree. At the very least, they should add a documented Method option allowing correct import of CSV files generated by previous versions of Mathematica.

POSTED BY: Alexey Popkov

Hi Alexey,

I'm sort of at a loss at what more we can do.

As pointed out in a number of posts, there are things that a user can do to minimize the impact of importing/exporting data between releases of Mathematica. But all that responsibility falls on the user.

But I think Wolfram Technical Support does not fully understand how such a small issue as importing data incorrectly can lead to all sorts of mysterious user application problems. If your data is inconsistent, your results are going to be inconsistent. Data integrity is a basic requirement. This is huge. All other issues of new functionality and features are secondary.

Take a simple example of storm outages from the last hurricane in Florida. Suppose you upgrade to Mathematica 11.2 to take advantage of new features and functionality for dealing with huge amounts of storm data, and a lot of your data is stored in temporary CSV file extracts. Then, unknown to you, CSV imports of customer outage data truncate or drop rows. So in your post-processing, it looks like you've made great progress in power restoration, when customers are still reporting little or no progress.

Importing and exporting data consistently is something everyone assumes every product does correctly, 100% of the time. When this showed up within a few days of the general release of Mathematica 11.2, they should have fixed it immediately. The fact that it is still not fixed, well ... Leaving it to the Mathematica user community to encounter a problem, and then search Mathematica support blogs for help, is like a car company leaving it to the dealers to fix known problems with their new cars. It works, but it makes everyone unhappy.

Still hoping that Wolfram will act, but I haven't heard anything from them since my last two emails on the broader issue of data integrity.

POSTED BY: Joseph Karpinski

Hi Joseph,

what I wrote above made me believe that this could work:

datatest = Import["~/Desktop/allStarData4.csv", {"CSV"}, "TextDelimiters" -> "\r"];

and this runs in less than 0.41 seconds:

AbsoluteTiming[datatest = Import["~/Desktop/allStarData4.csv", {"CSV"}, "TextDelimiters" -> "\r"]; Length[datatest]]

gives:

{0.407668, 10001}
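To see how the variants discussed in this thread behave on a given file, the same measurement can be run once per option setting. This is a rough sketch: `compareImports` is a hypothetical helper name, and the three option sets are simply the ones mentioned in the thread (no option, "TextDelimiters" -> "", and "TextDelimiters" -> "\r"). Some of them may emit messages on some versions.

```mathematica
(* compareImports is a hypothetical helper, not a built-in function. *)
(* For each option set it records the elapsed seconds and the imported row count. *)
compareImports[file_String] := AssociationMap[
  Function[opts,
   With[{r = AbsoluteTiming[Import[file, "CSV", Sequence @@ opts]]},
    <|"seconds" -> First[r],
      "rows" -> If[ListQ[Last[r]], Length[Last[r]], Missing["Failed"]]|>]],
  {{}, {"TextDelimiters" -> ""}, {"TextDelimiters" -> "\r"}}]

(* Example call, using the file name from this thread. *)
(* compareImports["~/Desktop/allStarData4.csv"] *)
```

The result is an association keyed by option set, so differences in row count between the variants are visible at a glance.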

Cheers,

Marco

PS: It is probably related to this post.

POSTED BY: Marco Thiel

Hi again Marco! Now this statement looks interesting. I will have to try it!

The thing is, I still want that conversation with Wolfram technical support. Without that "TextDelimiters"->"" parameter, CSV file imports under Mathematica 11.2 are losing/truncating data, without anyone being aware of it. That should never, never happen. Even if there is a parameter fix like "TextDelimiters"->"" available, it should not be left to the hundreds of thousands of users of Mathematica to discover the problem and fix every CSV Import statement across every program they have.

No, this needs to be fixed by Wolfram ASAP, before many, many users run into the same issue. And while they are at it, Wolfram should also fix the Import elapsed time problem, again without leaving it to their user base to modify all their CSV Import code. I've done my part and opened a problem record on this with Wolfram. Waiting for their call.... Best Regards!!!

POSTED BY: Joseph Karpinski

Dear Joseph,

I think that I did not explain that well enough. The post I linked to says that there was a known issue with the way Import read CSV files. Let's assume that Export also did not fully conform to the CSV standard.

Apparently, this has been fixed in version 11.2; that comes at a little cost. The files exported before MMA 11.2 show that issue and therefore are not being read directly in MMA 11.2.

That means that you are probably just seeing the consequences of the fix and not a (new) bug. This is consistent with the observation that you can export and then read the data in MMA 11.2.

The post that I linked to above shows that there was an issue with the return "\r" in the older versions. ( I only found that post after I had the solution, because I knew what to look for.) What you see is only a consequence. But I agree that other users could be hit by the same issue.

If I am right, it is not that Wolfram will have to fix this: they just have!

Best wishes,

Marco

POSTED BY: Marco Thiel

Dear Joseph,

I only post this because it shows how I identified the problem. The next post gives a potential solution:

I can, to some extent, reproduce what you describe. It would have been useful to have your code in a code box to avoid typing it again:

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1] /. 
Evaluate[ToExpression /@ Table["c" <> ToString[i] <> "_", {i, 1, 22}] -> ToExpression /@ Table["c" <> ToString[i], {i, 1, 22}]]]]

Note that the two examples you show in your post are slightly different: the first one contains the

"TextDelimiters" -> ""

option which gives an error in MMA 11.1.1 on my machine. Here are my results:

enter image description here

and

enter image description here

On 11.2 it gives an error message and takes about 30 times longer to load. It also has only half of the entries. On MMA 11.2, this

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1]]]

gives the same results as the command above, but in less than 28 seconds; it also yields only 5002 rows.

Your "fixed" MMA 11.2 code:

AbsoluteTiming[Length[fix4 = Import["~/Desktop/allStarData4.csv", {"Data", All}, "HeaderLines" -> 1, "TextDelimiters" -> ""] /. 
Evaluate[ToExpression /@ Table["c" <> ToString[i] <> "_", {i, 1, 22}] -> ToExpression /@ Table["c" <> ToString[i], {i, 1, 22}]]]]

gives an error message and runs for 336 seconds.

enter image description here

The Import probably takes long because it is gigantic:

fix4[[1]]

gives

enter image description here

and

fix4 // ByteCount

gives 77724720080.

By the way, does exporting the data directly in MMA 11.2 work for you?

Export["~/Desktop/allStarData412.csv", fix4]

takes excessively long on my machine, probably because it is enormous. In fact, I interrupted it after about 20 minutes without success. Can anyone check whether this works on their machine?

I also noted that SemanticImport does succeed in importing the file and is relatively fast:

AbsoluteTiming[Length[fix5 = SemanticImport["~/Desktop/allStarData4.csv", "HeaderLines" -> 1]]]

gives

enter image description here

The "HeaderLines" option doesn't make any difference, so you can delete it in this case. If you look at the output, you can recover your data, but SemanticImport misinterpreted the header.

enter image description here

Best wishes,

Marco

POSTED BY: Marco Thiel

Hi Marco, Thanks for replying. Please see the link at the top of this posting for the original posting, with code available.

I've been using downloads of 100,000-row Wolfram StarData for a while now, with the same set of CSV Import and Export statements, across a number of different releases of Mathematica without issue, until Mathematica 11.2. For 11 CSV files of 10,000 rows each, at 1 MB per file, the import times were around 2 seconds per file. Without any change to the Import statements, the file imports under Mathematica 11.2 now take over 250 seconds per file. And, unless you were looking for it with a Length function, rows were truncated by many thousands on each import.

A Mathematica support blog recommended trying the "TextDelimiters"->"" parameter on imports under Mathematica 11.2 to fix the truncation issue. That worked. But as the image at the top of the page shows, the import under Mathematica 11.2 with the "TextDelimiters"->"" parameter ran in 250 seconds, while the same file imported under Mathematica 11.1 ran in 2 seconds.

I've opened a bug report with Wolfram on this. This should never have occurred. Other users may be impacted by the truncation issue without realizing it. And CSV import times of 250 seconds on a 1 MB file are unheard of. It needs to be fixed by Wolfram before it impacts other users of Mathematica. Thanks again!!!
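Since the truncation described here is silent unless the row count is checked, one option is to wrap the import in a function that does the Length comparison every time. The sketch below is an illustration under assumptions, not a verified fix: `checkedImport` is a hypothetical helper name, and it assumes one record per non-empty line of the file.

```mathematica
(* checkedImport is a hypothetical helper, not a built-in function. *)
(* It imports a CSV file and issues a message when the number of imported *)
(* rows differs from the number of non-empty lines in the raw file. *)
checkedImport::rows = "Imported `1` rows, but the file has `2` non-empty lines.";

checkedImport[file_String, opts___] := Module[{path, data, lines},
  path = ExpandFileName[file];
  data = Import[path, "CSV", opts];
  lines = Length[Select[ReadList[path, String], StringLength[#] > 0 &]];
  If[ListQ[data] && Length[data] =!= lines,
   Message[checkedImport::rows, Length[data], lines]];
  data]

(* Example call, using the file name from this thread. *)
(* data = checkedImport["~/Desktop/allStarData4.csv"]; *)
```

A mismatch is a prompt to look closer rather than proof of data loss: quoted fields that contain line breaks, unusual line endings, or options such as "HeaderLines" can legitimately make the two counts differ.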

POSTED BY: Joseph Karpinski

Hi, I'm on 11.2 with Windows 10. When I import the file (standard Import), the data seems ragged: one column was missing data here and there. The import went fast (0.5 sec). I opened the file with Excel and saw some "strange" fields like carbon‐oxygen white dwarf. I saved the file as MS-DOS CSV and reimported. Now the file seemed to read in:

enter image description here

I still don't know if it is as expected.

POSTED BY: l van Veen

Hi Ivan, On your first reply: I've been using that version of the Import statement on CSV datasets for a while, across different versions of Mathematica with no issues, until Mathematica 11.2. There were two issues with the Import statement under that newer release. The first was that it truncated rows of data on import without any notification, unless you were looking for it with a Length function. The "TextDelimiters"->"" parameter fixed that problem in Mathematica 11.2, but it should never have occurred, and it will impact other users without them realizing it. The second issue is the 100-fold increase in elapsed time. I've just opened a bug report with Wolfram.

POSTED BY: Joseph Karpinski