Import and Export of HDF5 are implemented using a paclet called HDF5Tools. Advanced users in performance-critical applications can call the paclet functions directly to write code that avoids the overhead of Import and Export and is often significantly faster. Below is a quick demo.
Before we move on, a short disclaimer: internal functionality may not be well documented and is not guaranteed to work in future versions of the Wolfram Language.
First, we explicitly load and initialize the paclet (normally, Import and Export do this under the hood):
In[2]:= Needs["HDF5Tools`"]
In[3]:= HDF5ToolsInit[True]
Out[3]= True
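Then we create a new HDF5 file called "test.h5" with one group called "MyGroup" (the H5FACCTRUNC flag, like H5F_ACC_TRUNC in the HDF5 C library, overwrites the file if it already exists):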
In[4]:= fileId = h5fcreate["test.h5", H5FACCTRUNC];
In[5]:= myGroup = h5gcreate[fileId, "MyGroup", H5PDEFAULT, H5PDEFAULT, H5PDEFAULT];
To write data to this file, we need to create a dataset, which we will call "NewDataset". But first we need a dataspace that describes the shape of the data, here one-dimensional (rank 1) with 10,000,000 elements:
In[6]:= dspace = h5screatesimplen[1, {10000000}];
In[7]:= dset =
h5dcreate[myGroup, "NewDataset", H5TNATIVEDOUBLE, dspace, H5PDEFAULT,
H5PDEFAULT, H5PDEFAULT];
Now we are ready to write data to the file:
In[8]:= A = RandomReal[{0, 1}, 10000000];
In[9]:= h5dwrite[dset, H5TNATIVEDOUBLE, H5SALL, H5SALL, H5PDEFAULT, A] // AbsoluteTiming
Out[9]= {0.0721832, 0}
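AbsoluteTiming returns the elapsed time in seconds together with the result of h5dwrite, here 0, which indicates success. Finally, we must manually close all created objects, in reverse order of creation: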
In[10]:= h5dclose@dset;
h5sclose@dspace;
h5gclose@myGroup;
h5fclose@fileId;
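Because every step above must be repeated for each write, it can be convenient to wrap the sequence in a function. The following is a minimal sketch, not part of HDF5Tools: the name h5WriteVector is hypothetical, and it assumes HDF5Tools` is loaded, HDF5ToolsInit[True] has been evaluated, and the data is a flat list of machine reals, as in the demo. It uses only the paclet calls shown above.

```mathematica
(* Hypothetical helper (not part of HDF5Tools): writes a flat list of reals as a
   rank-1 Real64 dataset dsetName inside a new group groupName of a freshly
   created file. Existing files with the same name are overwritten (H5FACCTRUNC). *)
h5WriteVector[fileName_String, groupName_String, dsetName_String, data_List] :=
 Module[{fileId, group, space, dset, values, status},
  (* assumption: the paclet expects a packed array of machine reals, like A above *)
  values = Developer`ToPackedArray[N[data]];
  fileId = h5fcreate[fileName, H5FACCTRUNC];
  group = h5gcreate[fileId, groupName, H5PDEFAULT, H5PDEFAULT, H5PDEFAULT];
  (* rank-1 dataspace whose single dimension matches the data length *)
  space = h5screatesimplen[1, {Length[values]}];
  dset = h5dcreate[group, dsetName, H5TNATIVEDOUBLE, space,
    H5PDEFAULT, H5PDEFAULT, H5PDEFAULT];
  status = h5dwrite[dset, H5TNATIVEDOUBLE, H5SALL, H5SALL, H5PDEFAULT, values];
  (* release every handle in reverse order of creation *)
  h5dclose[dset];
  h5sclose[space];
  h5gclose[group];
  h5fclose[fileId];
  status (* the status returned by h5dwrite; 0 in the demo above *)
  ]
```

A call such as h5WriteVector["test.h5", "MyGroup", "NewDataset", RandomReal[{0, 1}, 10000000]] should then reproduce the file created step by step above. Closing the handles inside the function means a caller cannot forget to release them, although an error raised midway would still leave handles open.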
Reading the data back could look like this. First, we open the file:
In[14]:= file = h5fopen["test.h5", H5FACCRDWR];
Then we open the dataset:
In[15]:= dset = h5dopen[file, "/MyGroup/NewDataset", H5PDEFAULT];
Now we can read the data from the dataset into the Wolfram Language, keeping the timing and the data in separate variables:
In[16]:= {time, data} = AbsoluteTiming[h5dread[dset, H5TNATIVEDOUBLE, H5SALL, H5SALL, H5PDEFAULT]]
Out[16]= {0.0269582, NumericArray[< 10000000 >, Real64]}
The data is returned in the efficient form of a NumericArray. Finally, release the resources:
In[17]:= h5dclose@dset;
h5fclose@file;
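The reading steps can be wrapped the same way. This is again a minimal sketch with a hypothetical name, h5ReadVector, assuming HDF5Tools` is loaded and initialized and that the dataset holds doubles; it uses only the calls from the demo and opens the file with the same H5FACCRDWR flag.

```mathematica
(* Hypothetical helper (not part of HDF5Tools): reads a whole Real64 dataset
   given its full path, e.g. "/MyGroup/NewDataset", and closes all handles
   before returning the data. *)
h5ReadVector[fileName_String, path_String] :=
 Module[{file, dset, data},
  file = h5fopen[fileName, H5FACCRDWR];
  dset = h5dopen[file, path, H5PDEFAULT];
  data = h5dread[dset, H5TNATIVEDOUBLE, H5SALL, H5SALL, H5PDEFAULT];
  (* release handles in reverse order of opening *)
  h5dclose[dset];
  h5fclose[file];
  data (* a NumericArray, as in Out[16] *)
  ]
```

For example, Normal[h5ReadVector["test.h5", "/MyGroup/NewDataset"]] converts the returned NumericArray into an ordinary list. Keeping the result as a NumericArray for as long as possible avoids the cost of that conversion.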
As you can see, the code requires certain knowledge of the HDF5 format and is much harder to write than a simple call to Import or Export, but it is also noticeably faster.
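To check the speed difference on your own machine, the same dataset can be written and read with the built-in functions and the timings compared with Out[9] and Out[16]. This is a sketch only: the file name "test-export.h5" is arbitrary, no timings are quoted here because they depend on hardware and version, and it assumes the HDF5 "Rules" export form and the {"Datasets", name} import element are available in your version.

```mathematica
A = RandomReal[{0, 1}, 10000000];

(* built-in route: write A as dataset "/MyGroup/NewDataset" of a new file *)
Export["test-export.h5", {"Datasets" -> {"/MyGroup/NewDataset" -> A}}, "Rules"] // AbsoluteTiming

(* built-in route: read the same dataset back *)
{importTime, imported} = AbsoluteTiming[Import["test-export.h5", {"Datasets", "/MyGroup/NewDataset"}]];
importTime
```

The first element of each AbsoluteTiming result is directly comparable to the first element of Out[9] and Out[16], respectively.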