What is the best way to save a Dataset?

I generated a large Dataset from several files. What would be the best way to save (or export) the new Dataset for future use?

12 Replies

Hi Ricardo,

Have you tried re-expressing the Dataset object using Tabular, new in WL 14.2? Tabular is usually much more efficient in storage because it uses specialized encodings for the different column types, such as dates or strings. Even if your Dataset object is deeper than two levels, it can be expressed as a Tabular object. The important thing is that it is not very ragged, i.e. that its rows share mostly the same structure.
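
As a rough sketch of the Tabular suggestion, assuming version 14.2 or later and a Dataset that is rectangular enough to convert directly. The example dataset is the one used elsewhere in this thread, and the variable names are only illustrative:

```wolfram
(* Assumes WL 14.2+; populations stands in for your own Dataset *)
populations = ExampleData[{"Dataset", "StatePopulations"}];

(* Re-express the Dataset as a Tabular object *)
tab = Tabular[populations];

(* Compare the in-memory size of the two representations *)
ByteCount /@ <|"Dataset" -> populations, "Tabular" -> tab|>
```

How much the size differs will depend on the column types, so it is worth checking on a sample of the real data before converting everything.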

Other than that, it may be more efficient to save the Dataset object in some compressed format, like MX or WDX.

Posted 2 months ago

The best (indeed essential) way also includes saving the metadata. I assume that you tried DumpSave to get the .mx format and that Get did not work to re-import the data.

POSTED BY: Jim Baldwin

Hi José Martin,

I now have version 14.2.

Tabular works very well.
Tabular is an excellent addition to Wolfram!!

Posted 1 month ago

There's no evidence given so far that says DumpSave doesn't work. Here's an example that does work.

populations = ExampleData[{"Dataset", "StatePopulations"}];
DumpSave["pop.mx", populations];

Now either quit the kernel or exit Mathematica and start it again, so that you know populations is no longer defined. Then evaluate

Get["pop.mx"]

Now populations will be available.

POSTED BY: Jim Baldwin

You could try exporting to Parquet and/or SQLite. For SQLite, you may need to load DatabaseLink using Get or Needs. I’ve recently used both formats successfully when working with large datasets. SQLite has the advantage of being open source, with decent free viewers available. And if you don’t need all the data, you can read back just a selected subset. Similar operations may also be possible with Parquet, but I haven’t tried.

POSTED BY: Ian Williams
Posted 2 months ago

Have a look at the function Iconize. It will label and persistently save its argument data in the notebook where it's executed. The icon label (with its underlying data) can also be copied to other notebooks.

POSTED BY: Hans Milton

Hi Eric. I just want to save them so I can import them back into Mathematica. I already tried the *.MX format. Exporting worked (although it took a long time), but importing the data back into Mathematica did not. One Dataset has about 13 million records.

Hi José Martín,
Nice to read you!
Thank you for your quick response. Unfortunately, I still have version 13.3. I will try the other options you mention.

Posted 1 month ago
POSTED BY: Bernard Gress

Thank you Jim. I am testing with DumpSave, but I think it has the same problem as Export to MX. Maybe the best approach is to convert the Dataset to a List and then save that list. I will continue testing.
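
A minimal sketch of that list-based route, assuming Normal is enough to strip the Dataset wrapper. The choice of WXF as the file format and the file name are assumptions for illustration, not something tested in this thread:

```wolfram
(* Illustrative data; replace with your own Dataset *)
ds = ExampleData[{"Dataset", "StatePopulations"}];

(* Convert the Dataset to ordinary lists/associations *)
data = Normal[ds];

(* Save the plain expression in the binary WXF format *)
Export["populations.wxf", data, "WXF"];

(* Later, in a fresh kernel: read it back and rewrap as a Dataset *)
restored = Dataset[Import["populations.wxf", "WXF"]];
```

Whether this is fast enough for 13 million records would need to be measured on the actual data.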

Thank you Hans. I have used Iconize with smaller sets of data, but for very large ones (more than 13 million records) it doesn't seem to perform very well.

Posted 2 months ago
POSTED BY: Eric Rimbey