
# How I saved Gigabytes of memory or the Discovery of FAT and thin lists

Posted 10 years ago
I've been running Mathematica with large amounts of data. ByteCounts for lists can be over a gigabyte. Sometimes seemingly small software changes put Mathematica into a virtual memory fit, leading to system lockup. (A related problem is that Mathematica can get into a VM fit just trying to DumpSave a large list, whereas it can Put ">>" the same list successfully.)

The problem is that Mathematica has at least two ways of storing lists, with one method taking 3 times the storage of the other. Here is code to demonstrate the problem (scaled down). We'll start with a simple list of lists.

    myList = Table[Table[RandomReal[], {1000000}], {10}];
    ByteCount /@ myList
    ByteCount@myList

Output:

    {8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144, 8000144}
    80001560

Then we Flatten the list or Join the lists to produce a simple result. The ByteCounts are different for each function.

    myFlatList = Flatten@myList;
    ByteCount@myFlatList

Output:

    240000080

    myJoinList = Join @@ myList;
    ByteCount@myJoinList

Output:

    80000144

But the lists are equal and the same! In memory usage one is FAT and the other is thin.

    Equal[myJoinList, myFlatList] && SameQ[myJoinList, myFlatList]

Output:

    True

The difference extends to saving the lists as MX.

    SetDirectory["D:\\"];
    DumpSave["flatlist.mx", myFlatList];
    DumpSave["joinlist.mx", myJoinList];
    FileByteCount@"flatlist.mx"
    FileByteCount@"joinlist.mx"

Output:

    108640546
    80000252

When the MX files are read back, the FAT or thin storage is preserved.

    << "flatlist.mx";
    << "joinlist.mx";
    ByteCount@myFlatList
    ByteCount@myJoinList

Output:

    240000080
    80000144

What about text output?

    Put[myFlatList, "flatlist.txt"];
    Put[myJoinList, "joinlist.txt"];
    FileByteCount@"flatlist.txt"
    FileByteCount@"joinlist.txt"

Output:

    125000000
    123333334

There is a slight difference in size: Mathematica seems to vary the number of values per line. The good news is that the values are correct; the bad news is that text-derived lists are always FAT.

    join2 = << "joinlist.txt";
    flat2 = << "flatlist.txt";
    join2 == flat2
    ByteCount@flat2
    ByteCount@join2

Output:

    True
    240000080
    240000080

Operations can make FAT lists or thin lists. The simplest way (so far) of thinning a list (by 2/3) is this:

    newFlat = Map[Identity, myFlatList];
    ByteCount@newFlat

Output:

    80000144

Mathematica 9 gives the same results.
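The FAT/thin difference can also be checked directly rather than inferred from ByteCount. The following is a minimal sketch, assuming the two storage forms are unpacked and packed arrays (as the replies below suggest). The names `thinList` and `fatList` are hypothetical, and the sizes are scaled down from the post.

```mathematica
(* Hypothetical small example; not the data from the post *)
thinList = RandomReal[1, 1000];                    (* RandomReal with a length returns a packed array *)
fatList = Developer`FromPackedArray[thinList];     (* force the unpacked representation *)

Developer`PackedArrayQ /@ {thinList, fatList}      (* {True, False} *)
thinList === fatList                               (* True: same values, different storage *)
ByteCount /@ {thinList, fatList}                   (* the unpacked copy is the larger one *)
```

`Developer`PackedArrayQ` answers the question in one call, which is handy when a list is too large to duplicate just to compare byte counts.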
5 Replies
Posted 10 years ago
Yes, Map uses autocompilation for lists over a certain size, and the result will be packed if possible. The direct way would be

    packed = Developer`ToPackedArray[unpacked];
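A short sketch of the direct route, assuming a list of machine reals. The variables `unpacked`, `packed`, and `mixed` are illustrative names, and the unpacked input is built artificially with `Developer`FromPackedArray`.

```mathematica
(* Build an unpacked list of machine reals for illustration *)
unpacked = Developer`FromPackedArray[RandomReal[1, 1000]];

packed = Developer`ToPackedArray[unpacked];
Developer`PackedArrayQ[packed]                         (* True *)
ByteCount[packed] < ByteCount[unpacked]                (* True *)

(* A list containing a string cannot be packed, so it comes back unpacked *)
mixed = {1.0, "a"};
Developer`PackedArrayQ[Developer`ToPackedArray[mixed]] (* False *)
```

Because `Developer`ToPackedArray` gives back an unpacked list when packing is impossible, it is worth checking the result with `Developer`PackedArrayQ` before counting on the memory savings.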
Posted 10 years ago
myList is not a packed array, and I guess Flatten therefore decides the result should be unpacked. Had the list been packed to begin with, there would be no unpacking by Flatten. Try it, e.g., with

    myList2 = RandomReal[1, {1000000, 10}];

As for Join, it is in fact seeing packed arrays, since the component sublists of myList are packed. Ergo, a packed result.
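This can be tried at a smaller scale. The sketch below assumes reduced dimensions and a hypothetical `rows` list of three packed sublists that stands in for the structure of myList. It does not reproduce the Flatten result from the original post, which may depend on the Mathematica version.

```mathematica
(* Scaled down from {1000000, 10} *)
myList2 = RandomReal[1, {1000, 10}];
Developer`PackedArrayQ[myList2]            (* True: the whole matrix is packed *)
Developer`PackedArrayQ[Flatten[myList2]]   (* True: Flatten does not unpack it *)

(* An ordinary list whose elements are packed vectors *)
rows = {RandomReal[1, 1000], RandomReal[1, 1000], RandomReal[1, 1000]};
Developer`PackedArrayQ /@ rows             (* {True, True, True} *)
Developer`PackedArrayQ[Join @@ rows]       (* True: Join sees only packed arguments *)
```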
Posted 10 years ago
And, to be different, Map decides the result should be packed. I'll be using this.

    packed = Map[Identity, unpacked];
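One caveat, given the earlier remark that Map only auto-compiles lists over a certain size: the Map trick may not pack short lists. A hedged sketch follows, with hypothetical `short` and `long` lists. The two Map results are left without expected output because they depend on the threshold reported by the system.

```mathematica
(* Query the length threshold above which Map attempts auto-compilation *)
SystemOptions["CompileOptions" -> "MapCompileLength"]

short = Developer`FromPackedArray[RandomReal[1, 10]];
long = Developer`FromPackedArray[RandomReal[1, 100000]];

(* Map-based repacking relies on auto-compilation, so the short list may stay unpacked *)
Developer`PackedArrayQ[Map[Identity, short]]
Developer`PackedArrayQ[Map[Identity, long]]

(* ToPackedArray does not depend on the list length *)
Developer`PackedArrayQ[Developer`ToPackedArray[short]]   (* True *)
```

For code that must pack lists of any length, `Developer`ToPackedArray` looks like the safer choice of the two.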
Posted 10 years ago
 Very interesting! I'm interested to know the cause of this. I sometimes also work with very big files...