I have a question about the in-memory size of a Dataset.
I have noticed that if I have a .mx file that contains a Dataset and its size on the HDD is, for example, 50 MB, then when I import it into Mathematica its size in RAM is about 10 times larger (500 MB)!
Do you know why there is such a large difference between the two sizes? Is it normal?
I have noticed that if I save the same data as a List rather than as a Dataset (i.e. without associations), the size in RAM is only slightly larger than the size on the HDD.
Is there any solution that allows using a Dataset without this "waste" of memory? It is a big problem when I have to deal with a large amount of data, but Datasets are more comfortable to work with than plain Lists.
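For reference, the overhead can be measured directly with ByteCount. A minimal sketch (the data and the column names "x" and "y" are made up for illustration); the per-row Association structure is typically what makes the Dataset version much larger than the plain packed list:

```mathematica
(* the same data as a plain list and as a Dataset of associations *)
data = Table[{N[i], Sqrt[N[i]]}, {i, 10^6}];
ds   = Dataset[AssociationThread[{"x", "y"}, #] & /@ data];

ByteCount[data]  (* packed array of machine reals: close to 16 bytes per pair *)
ByteCount[ds]    (* one Association per row: many times larger *)
```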
Thank you for your help!
I'd like to add another fact: I've seen a similar problem with plain lists.
If I create a list of pairs of numbers with Table, the RAM usage grows during the computation from roughly zero to about 750 MB; when I export the result (in .mx format) the RAM usage does not increase (as I expect), and the output file is about 500 MB.
This looks good.
Now, I have a text file with a pair of numbers on each row, and I import it line by line with ReadList.
The text file is about 1 GB. When the import finishes, the RAM usage is about 3 GB!
That's not all: if I then export the data, the RAM usage rises to about 15 GB during the export, and when the export finishes it settles at about 4 GB...
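For concreteness, this is the kind of line-by-line import I mean (the file name is hypothetical):

```mathematica
(* read one pair of reals per line; each row comes back as an ordinary list *)
pairs = ReadList["pairs.txt", {Real, Real}];
ByteCount[pairs]
```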
That is strange...
If I do a Quit and reimport the data, the RAM usage is again 3 times the size of the data.
If I apply ByteCount to the whole list, it reports the same size as the RAM usage.
But if I apply ByteCount to one element of each list, the element created by Table is a little smaller than the other; and when I take a sub-list of more than two elements, the Table list is much smaller than the other...
Another thing: if I apply FullForm, the two lists have the same structure, and ByteCount applied to each individual number in the two lists gives the same size (16 bytes).
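One difference that would be consistent with these ByteCount numbers is packed versus unpacked arrays: Table over machine reals usually produces a packed array (roughly 8 bytes per number plus a small header), while ReadList builds ordinary nested lists in which every number is a separate expression. This is only a hypothesis about your data, but it is easy to check with the Developer` context functions; a sketch (FromPackedArray is used here to simulate the unpacked ReadList result):

```mathematica
fromTable = Table[{N[i], Sqrt[N[i]]}, {i, 10^5}];   (* rectangular machine reals: packed *)
fromRead  = Developer`FromPackedArray[fromTable];   (* same data, forcibly unpacked *)

Developer`PackedArrayQ[fromTable]   (* True *)
Developer`PackedArrayQ[fromRead]    (* False *)

ByteCount[fromTable]   (* compact: ~8 bytes per number plus a header *)
ByteCount[fromRead]    (* several times larger *)

(* repacking recovers the compact representation *)
ByteCount[Developer`ToPackedArray[fromRead]]
```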
Do you have any idea what the difference between the two cases could be? How can I solve this issue?
It's likely your history. Mathematica stores all previous results (In[...]/Out[...]), and your intermediate results are very large. Turn off the history with
$HistoryLength = 0;
or, if you still want % to refer to the previous result without keeping anything older:
$HistoryLength = 1;
You may also have to clear old variables you no longer need to free up the memory.
You can find more information about this in Memory Measurement and Optimization.
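A quick way to check whether history (or anything else) is holding on to memory, sketched with the built-in measurement functions (the variable name `data` is hypothetical):

```mathematica
$HistoryLength = 0;

MemoryInUse[]      (* memory currently used by the kernel *)
MaxMemoryUsed[]    (* peak usage since the session started *)

(* release a large variable you no longer need, then re-check *)
Clear[data];
MemoryInUse[]
```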
It can't be the history, because I see this anomalous memory usage as soon as I import the data.
Furthermore, applying ByteCount to the two lists gives two different results, as I wrote in my previous post, and that doesn't depend on the history...
The RAM usage during export also differs between the two cases.
If you want, I can attach an example file to explain the problem better...