
How I saved Gigabytes of memory or the Discovery of FAT and thin lists

Posted 10 years ago

I've been running Mathematica with large amounts of data. ByteCounts for lists can be over a gigabyte. Sometimes a seemingly small code change can put Mathematica into a virtual memory fit that locks up the system. (A related problem is that Mathematica can get into a VM fit just trying to DumpSave a large list, whereas it can Put ">>" the same list successfully.)

The problem is that Mathematica has at least two ways of storing lists, and one method takes 3 times the storage of the other. Here is code to demonstrate the problem (scaled down). We'll start with a simple list of lists.

myList = Table[Table[RandomReal[], {1000000}], {10}];
ByteCount /@ myList
ByteCount@myList

Output

{8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144}

80001560

Then we Flatten the list or Join the lists to produce a simple result. The ByteCounts are different for each function.

myFlatList = Flatten@myList;
ByteCount@myFlatList

240000080

myJoinList = Join @@ myList;
ByteCount@myJoinList

80000144

But the lists are equal and the same! In memory usage one is FAT and the other is thin.

Equal[myJoinList, myFlatList] && SameQ[myJoinList, myFlatList]

True

The difference extends to saving the lists as MX.

SetDirectory["D:\\"];
DumpSave["flatlist.mx", myFlatList];
DumpSave["joinlist.mx", myJoinList];
FileByteCount@"flatlist.mx"
FileByteCount@"joinlist.mx"

108640546

80000252

When the MX files are read back, the FATness and thinness are preserved.

<< "flatlist.mx";
<< "joinlist.mx";
ByteCount@myFlatList
ByteCount@myJoinList

240000080

80000144

What about text output?

Put[myFlatList, "flatlist.txt"];
Put[myJoinList, "joinlist.txt"];
FileByteCount@"flatlist.txt"
FileByteCount@"joinlist.txt"

125000000

123333334

A slight difference in size: Mathematica seems to vary the number of values per line. The good news is that the values are correct; the bad news is that lists read back from text are always FAT.

join2 = << "joinlist.txt";
flat2 = << "flatlist.txt";
join2 == flat2
ByteCount@flat2
ByteCount@join2

True

240000080

240000080

Operations can make FAT lists or thin lists. The simplest way (so far) of thinning a list (by 2/3) is this:

newFlat = Map[Identity, myFlatList];
ByteCount@newFlat

80000144

Mathematica 9 has the same results.

POSTED BY: Douglas Kubler
5 Replies

Yes, Map uses autocompilation for lists over a certain size and they will be packed if possible. The direct way would be

packed = Developer`ToPackedArray[unpacked];
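
To confirm that the conversion took effect, the packed state can be queried directly with Developer`PackedArrayQ. The following is a minimal sketch on a smaller list than the one in the thread; the variable contents and the use of Developer`FromPackedArray to force an unpacked starting point are illustrative assumptions, not part of the original post.

```mathematica
(* Illustrative data: start from a packed list and deliberately unpack it *)
unpacked = Developer`FromPackedArray[RandomReal[1, 100000]];
Developer`PackedArrayQ[unpacked]   (* False *)

(* Pack it back; ToPackedArray returns its input unchanged if packing is impossible *)
packed = Developer`ToPackedArray[unpacked];
Developer`PackedArrayQ[packed]     (* True *)

(* Compare the storage of the two representations *)
ByteCount /@ {unpacked, packed}
```

The two lists still compare as identical with SameQ; only the internal storage differs.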
POSTED BY: Ilian Gachevski

myList is not a packed array and I guess Flatten therefore decides the result should be unpacked. Had the thing been packed to begin with there would be no unpacking by Flatten. Try it e.g. with

myList2 = RandomReal[1, {1000000, 10}];

As for Join, it is in fact seeing packed arrays since the component sublists of myList are packed. Ergo, a packed result.
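
A quick way to check this is sketched below, assuming a smaller array than the one suggested above so that it runs fast; the name flatPacked is made up for the example.

```mathematica
(* RandomReal with a dimension list returns a packed rank-2 array *)
myList2 = RandomReal[1, {100000, 10}];
Developer`PackedArrayQ[myList2]       (* True *)

(* Flatten of a packed array stays packed *)
flatPacked = Flatten[myList2];
Developer`PackedArrayQ[flatPacked]    (* True *)

(* Roughly 8 bytes per machine real plus a small header *)
ByteCount[flatPacked]
```

Applying Developer`PackedArrayQ to myList and to its sublists (Developer`PackedArrayQ /@ myList) from the original post should show the contrast described here: packed components inside an unpacked outer list.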

POSTED BY: Daniel Lichtblau
Posted 10 years ago

And to be different, Map decides the result should be packed. I'll be using this.

packed = Map[Identity, unpacked];
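
One caveat, going by the remark above that Map autocompiles only for lists over a certain size: a short list may come back from Map[Identity, ...] still unpacked. The sketch below is a hedged illustration; the list lengths and the names short and long are made up for the example, and the threshold reported by SystemOptions may differ between versions.

```mathematica
(* Inspect the autocompilation threshold for Map; the value is version-dependent *)
SystemOptions["CompileOptions" -> "MapCompileLength"]

(* Two deliberately unpacked lists, one short and one long *)
short = Developer`FromPackedArray[RandomReal[1, 50]];
long = Developer`FromPackedArray[RandomReal[1, 100000]];

(* Map only packs when the list is long enough to trigger autocompilation *)
Developer`PackedArrayQ[Map[Identity, short]]
Developer`PackedArrayQ[Map[Identity, long]]

(* ToPackedArray does not depend on the list length *)
Developer`PackedArrayQ[Developer`ToPackedArray[short]]   (* True *)
```

For code that has to work on lists of any length, Developer`ToPackedArray is therefore the more predictable choice.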
POSTED BY: Douglas Kubler

Very interesting! I'm interested to know the cause of this. I sometimes also work with very big files...

POSTED BY: Sander Huisman