How I saved Gigabytes of memory or the Discovery of FAT and thin lists

Posted 9 years ago

I've been running Mathematica with large amounts of data, where the ByteCount of a single list can exceed a gigabyte. Sometimes a seemingly small change to the code would send Mathematica into a virtual-memory fit, leading to a system lockup. (A related problem: Mathematica can get into a VM fit just trying to DumpSave a large list, whereas it can Put (">>") the same list successfully.)

The problem is that Mathematica has at least two ways of storing lists, and one takes 3 times the storage of the other. Here is code to demonstrate the problem (scaled down). We'll start with a simple list of lists.

myList = Table[Table[RandomReal[], {1000000}], {10}];
ByteCount /@ myList
ByteCount@myList

Output

{8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144}

80001560

Then we Flatten the list or Join the lists to produce a simple result. The ByteCounts are different for each function.

myFlatList = Flatten@myList;
ByteCount@myFlatList

240000080

myJoinList = Join @@ myList;
ByteCount@myJoinList

80000144

But the lists are equal and the same! In memory usage one is FAT and the other is thin.

Equal[myJoinList, myFlatList] && SameQ[myJoinList, myFlatList]

True
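
The replies further down attribute the FAT/thin difference to packed arrays. Here is a minimal sketch of how one might check that directly, using a scaled-down stand-in for the lists above. The names nested, flat and joined are made up for this illustration; Developer`PackedArrayQ is a standard function. Dividing ByteCount by the length gives a rough bytes-per-element figure; the ByteCounts reported above work out to about 24 bytes per element for the FAT list and about 8 for the thin one.

```wolfram
(* Scaled-down stand-in for myList: 10 sublists of 1000 reals each *)
nested = Table[Table[RandomReal[], {1000}], {10}];

flat = Flatten[nested];     (* counterpart of myFlatList *)
joined = Join @@ nested;    (* counterpart of myJoinList *)

(* Is each result stored as a packed array? *)
Developer`PackedArrayQ /@ {flat, joined}

(* Approximate bytes per element for each result *)
N[ByteCount /@ {flat, joined}/Length[flat]]
```

The exact figures include a small fixed overhead per list and may differ between Mathematica versions.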

The difference extends to saving the lists as MX.

SetDirectory["D:\\"];
DumpSave["flatlist.mx", myFlatList];
DumpSave["joinlist.mx", myJoinList];
FileByteCount@"flatlist.mx"
FileByteCount@"joinlist.mx"

108640546

80000252

When the MX files are read back, the fatness and thinness are preserved.

<< "flatlist.mx";
<< "joinlist.mx";
ByteCount@myFlatList
ByteCount@myJoinList

240000080

80000144

What about text output?

Put[myFlatList, "flatlist.txt"];
Put[myJoinList, "joinlist.txt"];
FileByteCount@"flatlist.txt"
FileByteCount@"joinlist.txt"

125000000

123333334

There is a slight difference in size; Mathematica seems to vary the number of values it writes per line. The good news is that the values are correct; the bad news is that lists read back from text are always FAT.

join2 = << "joinlist.txt";
flat2 = << "flatlist.txt";
join2 == flat2
ByteCount@flat2
ByteCount@join2

True

240000080

240000080
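
Since lists read back from text come in FAT, one option is to thin them right after reading. The sketch below assumes the data is a flat list of machine reals; the file name packdemo.txt and the variable names are hypothetical, and it uses Developer`ToPackedArray (suggested in the replies) rather than anything specific to Get.

```wolfram
(* Hypothetical scratch file in the system temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "packdemo.txt"}];

Put[RandomReal[1, 1000], file];   (* write a list of reals as text *)
fromText = Get[file];             (* read it back; same as << on the file *)

(* Try to convert to packed storage; the values are left unchanged *)
repacked = Developer`ToPackedArray[fromText];

Developer`PackedArrayQ /@ {fromText, repacked}
ByteCount /@ {fromText, repacked}

DeleteFile[file];                 (* clean up the scratch file *)
```

If the list cannot be packed (for example, because it mixes types), ToPackedArray returns it as it was.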

Operations can make FAT lists or thin lists. The simplest way (so far) of thinning a list (by 2/3) is this:

newFlat = Map[Identity, myFlatList];
ByteCount@newFlat

80000144

Mathematica 9 has the same results.

POSTED BY: Douglas Kubler
5 Replies

Yes, Map uses autocompilation for lists over a certain length, and the result will be packed if possible. The direct way would be

packed = Developer`ToPackedArray[unpacked];
POSTED BY: Ilian Gachevski
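
A small self-contained sketch of the direct approach, for anyone who wants to try it without the large lists from the original post. The unpacked test list is built here with Developer`FromPackedArray purely for illustration; the variable names follow the snippet above.

```wolfram
(* RandomReal returns a packed array; unpack it to get a FAT test list *)
unpacked = Developer`FromPackedArray[RandomReal[1, 1000]];

(* Convert back to packed storage *)
packed = Developer`ToPackedArray[unpacked];

Developer`PackedArrayQ /@ {unpacked, packed}   (* packing status of each *)
packed === unpacked                            (* packing does not affect SameQ *)
ByteCount /@ {unpacked, packed}                (* compare memory use *)
```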

myList is not a packed array, and I guess Flatten therefore decides the result should be unpacked. Had the thing been packed to begin with, there would be no unpacking by Flatten. Try it, e.g., with

myList2 = RandomReal[1, {1000000, 10}];

As for Join, it is in fact seeing packed arrays since the component sublists of myList are packed. Ergo, a packed result.
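
The points above can be checked with a scaled-down sketch. The names packedMatrix and nestedList are made up for this illustration, and the sizes are reduced from those in the thread; results for the nested case may depend on the Mathematica version.

```wolfram
(* A matrix generated in one call is packed as a whole *)
packedMatrix = RandomReal[1, {1000, 10}];
Developer`PackedArrayQ[packedMatrix]
Developer`PackedArrayQ[Flatten[packedMatrix]]   (* Flatten of packed input *)

(* A short outer Table of long inner Tables, like myList *)
nestedList = Table[Table[RandomReal[], {1000}], {10}];
Developer`PackedArrayQ[nestedList]              (* the list as a whole *)
Developer`PackedArrayQ /@ nestedList            (* each component sublist *)
Developer`PackedArrayQ[Join @@ nestedList]      (* Join of the sublists *)
```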

POSTED BY: Daniel Lichtblau
Posted 9 years ago

And to be different, Map decides the result should be packed. I'll be using this.

packed = Map[Identity, unpacked];
POSTED BY: Douglas Kubler
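
One caveat worth checking before relying on Map for this: as noted earlier in the thread, autocompilation only applies to lists over a certain length. The sketch below reads that length from the "MapCompileLength" system option and tries Map[Identity, ...] on a short and a long unpacked list. The names shortList and longList are hypothetical, and the lengths 50 and 1000 assume a threshold somewhere between them.

```wolfram
(* Length at which Map starts to autocompile *)
SystemOptions["CompileOptions" -> "MapCompileLength"]

(* Unpacked test lists, one short and one long *)
shortList = Developer`FromPackedArray[RandomReal[1, 50]];
longList = Developer`FromPackedArray[RandomReal[1, 1000]];

(* Is the result of Map packed in each case? *)
Developer`PackedArrayQ[Map[Identity, shortList]]
Developer`PackedArrayQ[Map[Identity, longList]]
```

Developer`ToPackedArray does not depend on the list length, which may make it the safer choice for short lists.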

Very interesting! I'm interested to know the cause of this. I sometimes also work with very big files...

POSTED BY: Sander Huisman