Is WDX a good export format for non-standard data?

Posted 10 years ago

Hi, I'm extracting data from many text files using

ImportString[…]

Some of the results will come back as strings, some as lists, some lists of lists, etc. I want to export each extraction in a format that will import back into Mathematica exactly as it was exported, so that I can process it later. Is WDX the best way to go?

Gregory

POSTED BY: Gregory Lypny
9 Replies
Posted 10 years ago

Thanks for the heads up. Will do. So many formats, so little time.

POSTED BY: Gregory Lypny

Also have a look at the DumpSave command...
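For reference, a minimal sketch of how DumpSave is typically used. The symbol name `results` and the file name are made up for illustration. DumpSave stores the definitions attached to a symbol in MX format, and Get restores them:

```wolfram
(* Hypothetical sample data standing in for one extraction result *)
results = {"a string", {1, 2, 3}, <|"k" -> {{1, 2}, {3, 4}}|>};

(* Illustrative file location; any writable path would do *)
file = FileNameJoin[{$TemporaryDirectory, "results.mx"}];

(* Save the definition of the symbol results *)
DumpSave[file, results];

(* Simulate a fresh session, then restore the definition *)
saved = results;
Clear[results];
Get[file];

results === saved
```

Note that DumpSave writes MX data, so MX portability limits apply to these files as well.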

POSTED BY: Sander Huisman
Posted 10 years ago

Thanks everyone,

Helpful comments. My needs are simple, the main one being that I want to be able to import the Mathematica expressions (lists, associations, etc.) exactly as I have exported them, so that I can continue analyzing the data. Speed is not a big deal because, while I am analyzing over 100,000 files, the stuff that I will be exporting from each is small. Compatibility with future releases of Mathematica is important, of course; so if Wolfram is thinking of dropping WDX, then maybe I should learn more about MX.

Gregory

POSTED BY: Gregory Lypny

I do not like WDX. It tends to be very slow for even moderately large data. Fortunately there are alternatives.

I am aware of four ways to save arbitrary Mathematica expressions:

  • Save as InputForm: Export[..., "Package"]. Disadvantages: it takes a lot of space and it is slow. Advantages: cross-platform, cross-version.

  • Save as WDX: Export[..., "WDX"]. It doesn't take as much space, but it is still slow.

  • Save as MX: Export[..., "MX"]. This is very fast to import/export. Space efficiency is not excellent, but can be improved by gzip-ping the result: Export["file.mx.gz", data]. Before version 10, this was neither cross-platform nor cross-version, so usually we could only re-import on the same machine where the data was exported, and only with the same Mathematica version. Since version 10, compatibility has been improved: the files are now compatible between different operating systems (OS X, Windows, Linux), but not between 32-bit and 64-bit versions of Mathematica. There is also backward compatibility: files written by 10.0 are still readable by 10.3 (but not the reverse).

  • My preferred choice: use Compress. I use

        zimport[filename_] := Uncompress@Import[filename, "String"]
        zexport[filename_, data_] := Export[filename, Compress[data], "String"]
    

    This method can be much faster than WDX, and it is both cross-platform and cross-version compatible in practice. But Wolfram did not make any guarantees about compatibility (it is not mentioned at all in the documentation), so I would not consider it safe for archival. (Interestingly, there are no comments in the documentation about WDX compatibility either!)
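A quick way to convince yourself that a format round-trips is to export a mixed sample and compare it with what comes back. The sketch below does this for the Compress route and the MX route. The sample data and file names are invented for illustration, and it only checks a same-machine, same-version round trip, not the cross-version behavior discussed above:

```wolfram
(* The two helpers from the post *)
zimport[filename_] := Uncompress@Import[filename, "String"]
zexport[filename_, data_] := Export[filename, Compress[data], "String"]

(* Hypothetical sample: a string, a list, a list of lists, an association *)
sample = {"a string", {1, 2, 3}, {{1, 2}, {3, 4}}, <|"key" -> "value"|>};

(* Compress route *)
zfile = FileNameJoin[{$TemporaryDirectory, "sample.txt"}];
zexport[zfile, sample];
zOK = zimport[zfile] === sample

(* MX route *)
mxfile = FileNameJoin[{$TemporaryDirectory, "sample.mx"}];
Export[mxfile, sample, "MX"];
mxOK = Import[mxfile, "MX"] === sample
```

Both comparisons should give True if the expressions come back unchanged. SameQ (===) is used rather than Equal (==) so that the check is structural and always returns True or False.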

To summarize:

  • If you need to archive data, then the best way is to use a format that is not Mathematica-specific. If it needs to be Mathematica expressions, the best way to archive them is as textual InputForm, as it is the most compatible.

  • If you need to save data for immediate re-use on the same machine, for example for continuing your work tomorrow, use MX.

  • If you need to move data between multiple potentially incompatible platforms (e.g. you are using 10.3 on a laptop and 10.2 on a server), and the data is used in the short term only (no archival), then use Compress.
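As an illustration of the textual InputForm option, here is a minimal sketch using Put and Get, which write and read expressions as plain text. This swaps in Put/Get in place of Export[..., "Package"]. The sample data and file name are hypothetical:

```wolfram
(* Hypothetical sample data *)
sample = {"a string", {1, 2, 3}, <|"key" -> {{1, 2}, {3, 4}}|>};

(* Illustrative file location *)
file = FileNameJoin[{$TemporaryDirectory, "archive.m"}];

(* Write the expression as text in InputForm *)
Put[sample, file];

(* Read it back; Get evaluates what it reads *)
restored = Get[file];

restored === sample
```

Because the file is plain text, it can be inspected in any editor. One caveat: Get evaluates the stored expression, so this suits inert data such as lists, strings, and associations rather than code that must stay unevaluated.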

I don't see a situation where WDX is really the best format.

Further reading:

POSTED BY: Szabolcs Horvát
POSTED BY: Sander Huisman
Posted 10 years ago

Quoting Taliesin Beynon,

WDX is not a good format. We are likely to deprecate it or entirely replace it with a different implementation that is not backward-compatible (which is obviously problematic). Dataset will never directly support it.

There are candidates for a possible native format for dataset, like Cap'n Proto, HDF5, and a couple others. Or XML. No, just kidding :-)

So probably WDX isn't the best choice. I myself always use simple M format. If you need speed, MX format is recommended.

POSTED BY: Alexey Popkov
Posted 10 years ago
POSTED BY: Alexey Popkov
Posted 10 years ago

Thanks, Sander. Much obliged. I messed around with a couple hundred files and it seems to work well.

Gregory

POSTED BY: Gregory Lypny

Yes, use WDX or MX format. MX is only compatible with the machine you run it on; WDX is cross-compatible.

POSTED BY: Sander Huisman