2 | 689 Views | 11 Replies | 16 Total Likes
What is the best way to save a Dataset?

I generated a large Dataset from several files. What would be the best way to save (or export) the new Dataset for future use?

Posted 14 days ago

Kind of lame that Mathematica can't save and re-read one of its basic data-types!

POSTED BY: Bernard Gress
Posted 14 days ago

There's no evidence given so far that says DumpSave doesn't work. Here's an example that does work.

populations = ExampleData[{"Dataset", "StatePopulations"}];
DumpSave["pop.mx", populations];

Now either quit the kernel or exit and restart Mathematica, so that you know populations is no longer defined. Then enter

Get["pop.mx"]

Now populations will be available.

POSTED BY: Drjbaldwin

You could try exporting to Parquet and/or SQLite. For SQLite, you may need to load DatabaseLink using Get or Needs. I’ve recently used both formats successfully when working with large datasets. SQLite has the advantage of being open source, with decent free viewers available. And if you don’t need all the data, you can retrieve just a selected subset. Similar operations may also be possible with Parquet, but I’ve not tried.
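For illustration, a Parquet round trip might look like the sketch below. It assumes a flat, table-like Dataset and a Wolfram Language version whose Import/Export support the "Parquet" format; the tiny Dataset, its column names, and the file name are made up for the example, and the head of the imported result may differ between versions.

```wolfram
(* Small stand-in for the large Dataset; the column names are hypothetical *)
ds = Dataset[{
    <|"state" -> "A", "population" -> 100|>,
    <|"state" -> "B", "population" -> 250|>
  }];

(* Hypothetical file location, used only for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "populations.parquet"}];

(* Write the table to disk, then read it back in a later session *)
Export[file, ds, "Parquet"];
restored = Import[file, "Parquet"]
```

Because Parquet is a columnar format, this works best when every column has a consistent type; deeply nested or ragged Datasets may need flattening first.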

POSTED BY: Ian Williams
Posted 27 days ago

The best/essential way also includes saving the metadata. I assume that you tried DumpSave to get the .mx format and that Get did not work to re-import the data.

POSTED BY: Jim Baldwin

Thank you Jim. I am testing with DumpSave, but I think it is the same problem as with Export to MX. Maybe the best is to convert the Dataset to a List and then save it as a list. I will continue testing.
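A minimal sketch of the "convert to a List and save it" idea, assuming the Dataset is a list of associations and using WXF as the on-disk format (my choice for the example, not something tested on the 13-million-row data; the sample rows and file name are made up):

```wolfram
(* Small stand-in for the large Dataset; the column names are hypothetical *)
ds = Dataset[{
    <|"id" -> 1, "value" -> 3.5|>,
    <|"id" -> 2, "value" -> 7.25|>
  }];

(* Strip the Dataset wrapper to get a plain list of associations *)
data = Normal[ds];

(* Hypothetical file location, used only for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "dataset.wxf"}];
Export[file, data, "WXF"];

(* In a later session: read the list back and rebuild the Dataset *)
restored = Dataset[Import[file, "WXF"]];

(* Check that the round trip preserved the data *)
Normal[restored] === data
```

Unlike .mx files, which are tied to the platform and version that wrote them, WXF is meant to be portable across systems.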

Posted 27 days ago

Have a look at the function Iconize. It will label and persistently save its argument data in the notebook where it's executed. And the icon label (with its underlying data) can be copied to other notebooks.

POSTED BY: Hans Milton

Thank you Hans. I have used Iconize with smaller sets of data, but for very large ones (more than 13 million records) it doesn't seem to perform very well.

Posted 1 month ago

Do you want it in a format that other applications can use? Or do you only need to import back into Mathematica? Or do you just want to avoid regenerating the Dataset (and so avoid depending on those original files)?

POSTED BY: Eric Rimbey

Hi Eric. I just want to save them to import them back into Mathematica. I already tried the *.MX format. Exporting worked (it took a long time, though), but it didn't work when I tried to import the data back into Mathematica. One Dataset has about 13 million records.

Hi Ricardo,

Have you tried to re-express the Dataset object using Tabular, new in WL 14.2? Tabular is usually much more efficient in storage, because it uses special ways to encode the different types of columns, say dates or strings. Even if your Dataset object is deeper than two levels, it can be expressed as a Tabular object. The important thing is that it is not very ragged.

Other than that, it may be more efficient to save the Dataset object in some compressed format, like MX or WDX.
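As a rough sketch of the Tabular suggestion (this needs version 14.2 or later; the sample rows are made up, and going through Normal is just one conservative way to hand the rows to Tabular):

```wolfram
(* Small stand-in for the large Dataset; the column names are hypothetical *)
ds = Dataset[{
    <|"city" -> "X", "date" -> DateObject[{2024, 1, 1}], "count" -> 10|>,
    <|"city" -> "Y", "date" -> DateObject[{2024, 1, 2}], "count" -> 20|>
  }];

(* Rebuild the rows as a Tabular object (requires 14.2+) *)
tab = Tabular[Normal[ds]];

(* Compare in-memory sizes; any savings depend on the column types and only show up on larger data *)
{ByteCount[ds], ByteCount[tab]}
```

On a two-row example the numbers say little; the comparison is only meaningful on data of realistic size.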

Hi José Martín,
Nice to read you!
Thank you for your quick response. Unfortunately, I still have version 13.3. I will try the other options you mention.
