Is WDX a good export format for non-standard data?

Posted 8 years ago

Hi, I'm extracting data from many text files using

ImportString[…]

Some of the results will come back as strings, some as lists, some lists of lists, etc. I want to export each extraction in a format that will import back into Mathematica exactly as it was exported, so that I can process it later. Is WDX the best way to go?

Gregory

POSTED BY: Gregory Lypny
9 Replies
Posted 8 years ago

Thanks for the heads up. Will do. So many formats, so little time.

POSTED BY: Gregory Lypny
Posted 8 years ago

Thanks everyone,

Helpful comments. My needs are simple, the main one being that I want to be able to import the Mathematica expressions (lists, associations, etc.) exactly as I have exported them, so that I can continue analyzing the data. Speed is not a big deal because, while I am analyzing over 100,000 files, the stuff that I will be exporting from each is small. Compatibility with future releases of Mathematica is important, of course; so if Wolfram is thinking of dropping WDX, then maybe I should learn more about MX.

Gregory

POSTED BY: Gregory Lypny

Also have a look at the DumpSave command...
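A minimal sketch of how DumpSave might be used for this, assuming the extracted data is held in a symbol; the symbol name `results`, the sample data, and the file name are all hypothetical:

```mathematica
(* results is a hypothetical symbol holding one extraction *)
results = {"abc", {1, {2, 3}}};

(* write to the temporary directory; the file name is illustrative *)
file = FileNameJoin[{$TemporaryDirectory, "results.mx"}];

(* DumpSave stores the definition of the symbol, not just its value *)
DumpSave[file, results];

(* later: clear the symbol and restore its definition from the file *)
Clear[results];
Get[file];
results
```

Note that DumpSave writes MX files, so the compatibility caveats about MX discussed elsewhere in this thread apply here as well.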

POSTED BY: Sander Huisman

I do not like WDX. It tends to be very slow for even moderately large data. Fortunately there are alternatives.

I am aware of four ways to save arbitrary Mathematica expressions:

  • Save as InputForm: Export[..., "Package"]. Disadvantages: it takes a lot of space and it is slow. Advantages: cross-platform, cross-version.

  • Save as WDX: Export[..., "WDX"]. It doesn't take as much space, but it is still slow.

  • Save as MX: Export[..., "MX"]. This is very fast to import/export. Space efficiency is not excellent, but can be improved by gzip-ping the result: Export["file.mx.gz", data]. Before version 10, this was neither cross-platform nor cross-version, so usually we could only re-import on the same machine where the data was exported, and only with the same Mathematica version. Since version 10, compatibility has been improved: the files are now compatible between different operating systems (OS X, Windows, Linux), but not between 32-bit and 64-bit versions of Mathematica. There is also backward compatibility: files written by 10.0 are still readable by 10.3 (but not the reverse).

  • My preferred choice: use Compress. I use

        zimport[filename_] := Uncompress@Import[filename, "String"]
        zexport[filename_, data_] := Export[filename, Compress[data], "String"]
    

    This method can be much faster than WDX, and it is both cross-platform and cross-version compatible in practice. But Wolfram did not make any guarantees about compatibility (it is not mentioned at all in the documentation), so I would not consider it safe for archival. (Interestingly, there are no comments in the documentation about WDX compatibility either!)
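As a rough illustration of the methods listed above, here is a sketch that round-trips one small expression through textual InputForm, MX, and Compress, then checks that each comes back identical. The sample data and file names are hypothetical, and Put/Get is used here as one way of writing InputForm text:

```mathematica
(* Hypothetical sample: a string, a ragged nested list, and an association *)
data = {"abc", {1, {2, 3}}, <|"a" -> 1.5|>};
dir = CreateDirectory[];  (* fresh temporary directory *)

(* Textual InputForm via Put/Get *)
Put[data, FileNameJoin[{dir, "data.m"}]];
fromM = Get[FileNameJoin[{dir, "data.m"}]];

(* MX *)
Export[FileNameJoin[{dir, "data.mx"}], data, "MX"];
fromMX = Import[FileNameJoin[{dir, "data.mx"}], "MX"];

(* Compress, written out as a plain string *)
Export[FileNameJoin[{dir, "data.txt"}], Compress[data], "String"];
fromZ = Uncompress@Import[FileNameJoin[{dir, "data.txt"}], "String"];

(* SameQ comparison: True only if every round trip was lossless *)
{fromM, fromMX, fromZ} === {data, data, data}
```

A check like this on a few representative extractions is a cheap way to confirm that a chosen format preserves your particular mix of strings and nested lists before committing to it for many files.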

To summarize:

  • If you need to archive data, then the best way is to use a format that is not Mathematica-specific. If it needs to be Mathematica expressions, the best way to archive them is as textual InputForm, as it is the most compatible.

  • If you need to save data for immediate re-use on the same machine, for example for continuing your work tomorrow, use MX.

  • If you need to move data between multiple potentially incompatible platforms (e.g. you are using 10.3 on a laptop and 10.2 on a server), and the data is used in the short term only (no archival), then use Compress.

I don't see a situation where WDX is really the best format.

Further reading:

POSTED BY: Szabolcs Horvát
Posted 8 years ago

Quoting Taliesin Beynon:

WDX is not a good format. We are likely to deprecate it or entirely replace it with a different implementation that is not backward-compatible (which is obviously problematic). Dataset will never directly support it.

There are candidates for a possible native format for dataset, like Cap'n Proto, HDF5, and a couple others. Or XML. No, just kidding :-)

So probably WDX isn't the best choice. I myself always use simple M format. If you need speed, MX format is recommended.

POSTED BY: Alexey Popkov
Posted 8 years ago

Thanks, Sander. Much obliged. I messed around with a couple hundred files and it seems to work well.

Gregory

POSTED BY: Gregory Lypny

It depends, of course, on what you want to do with it. Do you want maximum compatibility with other computers or other software, or just storage of 'arbitrary stuff' for yourself? MX is very fast but is limited to Mathematica (it is basically a memory dump). If your data has ragged arrays and many nested arrays, many file formats cannot be used. And if it combines numbers, symbols, and images in lists, only very few file formats might be usable...

POSTED BY: Sander Huisman

Yes, either WDX or MX format. MX is only compatible with the machine you run it on; WDX is cross-compatible.

POSTED BY: Sander Huisman
Posted 8 years ago

Actually, according to Leonid Shifrin, MX files have been de facto cross-platform since version 10:

Since version 10, MX files became de facto cross-platform (although not cross-architecture, so 32-bit and 64-bit mx files are not compatible with each other).

POSTED BY: Alexey Popkov