# Sum without storing all summands before computing the sum

Posted 5 years ago
6649 Views
|
13 Replies
|
12 Total Likes
|
 Dear All,I have a code looking something like: ClearAll[x] x[]:=RandomInteger[10,{100,1000}] x[]//ByteCount MaxMemoryUsed[total=Sum[x[],{i,1000}]] However my function x is much more complicated and computer intensive. A single evaluation of x[] is roughly 8*10^5 bytes (800 kB). For summing a 1000 of these the memory usage is 800 MB (!!). (in my real application it is 10s of GB). I now replaced it with a Do loop and do the 'intermediate' summing myself. Is there a method, or option such that Sum already starts summing, more like a fold-operation? Such that it doesn't store all the results before adding them all up. The current implementation of Sum looks more like: Total[Table[ ,spec]] Such that it has large memory overhead.
13 Replies
Sort By:
Posted 5 years ago
  ClearAll[x] x[] := RandomInteger[10, {100, 1000}] x[] // ByteCount MaxMemoryUsed[total = ParallelSum[x[], {i, 1000}]] In my machine the memory usage is 25 MB :) 2 core Cpu.
Posted 5 years ago
 Is that for the main core or for the sum of all cores? or per core?
Posted 5 years ago
 I do not know....
Posted 5 years ago
 That is for sum of two cores. It might be proved by indicating the number of the used core in ParallelEvaluate[]: In[17]:= ClearAll[x] x[] := RandomInteger[10, {100, 1000}] x[] // ByteCount ParallelEvaluate[MaxMemoryUsed[total = Sum[x[], {i, 1000}]], 1] Out[19]= 800152 Out[20]= 800977040 For ParallelSum[]: In[25]:= x[] // ByteCount MaxMemoryUsed[total = ParallelSum[x[], {i, 1000}]] Out[25]= 800152 Out[26]= 25133408 Sure, Do[] is better: In[17]:= total = 0; MaxMemoryUsed@Do[total += x[], 1000] Out[18]= 2400640 or Nest[]: In[27]:= MaxMemoryUsed@Nest[Plus[x[] + #] &, 0, 1000] Out[27]= 2400880 or While[]: In[50]:= total = x[]; i = 0; MaxMemoryUsed@While[++i < 1000, total += x[]] Out[51]= 1600384 
Posted 5 years ago
 Consider: s[k_] := 11111111111^(11111111 + k) Monitor[Table[{k, s[k]}, {k, 100}], {k, MemoryInUse[]}] Monitor[Sum[s[k], {k, 100}, Method -> "Procedural"], {k, MemoryInUse[]}] 
Posted 5 years ago
 Interesting!The memory usage is indeed much smaller. However: Monitor[Sum[s[k], {k, 100}, Method -> "Procedural"], {k, MemoryInUse[]}] Monitor[Sum[s[k], {k, 100}], {k, MemoryInUse[]}] both show the same behaviour, is Procedural the default for 'small' k? What does "compute the sum procedurally" actually mean? Whatever the method, it will be done with some procedure...
Posted 5 years ago
 Sum uses the procedural method (adding successive values to obtain the sum) by default for your example, Unfortunately, at present, the procedural method in Sum is not optimized for adding arrays, and using a Do loop for doing the intermediate summations (as already done by you) is a nice way to work around the limitation.Thank you for drawing our attention to this issue, and sorry for the inconvenience.
Posted 5 years ago
 Thanks for the answer Devendra. What do you mean with optimized? compiled? I don't think my real function (not the example) is compilable.
Posted 5 years ago
 Sum allows very general inputs (numeric, symbolic, etc.) in its first argument. If it detects that the input is numeric, then a direct procedural method (with no storage) is used. At present, it does not check if the first argument is an array and hence does not apply procedural summation without storage of intermediate results, which leads to large memory usage in examples such as the one considered by you. Optimization in this situation would mean checking for arrays, which may be somewhat expensive particularly because Sum is not specially designed for such inputs.
Posted 5 years ago
 Thanks for your explanation! Either I will use the procedural method, or keep my Do loop, as it works fine now (speed and memory wise). Parallelisation of such a Do loop is however trickier than a Sum, as inside my do loop I overwrite variables. So I have to cut the do loop in to pieces manually... But no big deal.Thanks again!
Posted 5 years ago
 Based on Devendra's explanation, I see the difference: In[1]:= MaxMemoryUsed[total = Sum[{RandomInteger[10]}, {i, 100000}]] Out[1]= 8801304 In[2]:= MaxMemoryUsed[total = Sum[RandomInteger[10], {i, 100000}]] Out[2]= 7528 Sander, could you structure your problem as an array of sums instead of a sum of arrays?Something like the first (avoiding the second): In[3]:= MaxMemoryUsed[ total = Table[Sum[RandomInteger[10], {i, 1000}], 100]] Out[3]= 10112 In[4]:= MaxMemoryUsed[ total = Sum[Table[RandomInteger[10], 100], {i, 1000}]] Out[4]= 1751232 Hope this helps.Regards