Message Boards Message Boards

5
|
11084 Views
|
13 Replies
|
12 Total Likes
View groups...
Share
Share this post:

Sum without storing all summands before computing the sum

Dear All,

I have a code looking something like:

ClearAll[x]
x[]:=RandomInteger[10,{100,1000}]
x[]//ByteCount
MaxMemoryUsed[total=Sum[x[],{i,1000}]]

However my function x is much more complicated and computer intensive. A single evaluation of x[] is roughly 8*10^5 bytes (800 kB). For summing a 1000 of these the memory usage is 800 MB (!!). (in my real application it is 10s of GB).

I now replaced it with a Do loop and do the 'intermediate' summing myself.

Is there a method, or option such that Sum already starts summing, more like a fold-operation? Such that it doesn't store all the results before adding them all up. The current implementation of Sum looks more like:

Total[Table[  ,spec]]

Such that it has large memory overhead.

POSTED BY: Sander Huisman
13 Replies

That is for sum of two cores. It might be proved by indicating the number of the used core in ParallelEvaluate[]:

In[17]:= ClearAll[x]
x[] := RandomInteger[10, {100, 1000}]
x[] // ByteCount
ParallelEvaluate[MaxMemoryUsed[total = Sum[x[], {i, 1000}]], 1]

Out[19]= 800152

Out[20]= 800977040

For ParallelSum[]:

In[25]:= x[] // ByteCount
MaxMemoryUsed[total = ParallelSum[x[], {i, 1000}]]

Out[25]= 800152

Out[26]= 25133408

Sure, Do[] is better:

In[17]:= total = 0;
MaxMemoryUsed@Do[total += x[], 1000]

Out[18]= 2400640

or Nest[]:

In[27]:= MaxMemoryUsed@Nest[Plus[x[] + #] &, 0, 1000]
Out[27]= 2400880

or While[]:

In[50]:= total = x[]; i = 0;
MaxMemoryUsed@While[++i < 1000, total += x[]]

Out[51]= 1600384

One last suggestion (and then I will stop!):

What if you write an "externalSum" function in C. It would take your arrays and sum them up in C. I've done this a few times when Mathematica was "difficult".

(ie The last time I did this was to write a printf function in C so I could pass variables and a format string and get the exact printout that I wanted. )

Would that work for you? (While your knowledge of Mathematica is impressive, If you have not done C externals in Mathematica before I can send you an example)

Regards

POSTED BY: Neil Singer

Unfortunately I could but it would be a lot slower. Each matrix that I'm summing is (kinda) the result of a DistanceMatrix with a large number of points. To do those individually is very slow.

POSTED BY: Sander Huisman

Based on Devendra's explanation, I see the difference:

In[1]:= MaxMemoryUsed[total = Sum[{RandomInteger[10]}, {i, 100000}]]

Out[1]= 8801304

In[2]:= MaxMemoryUsed[total = Sum[RandomInteger[10], {i, 100000}]]

Out[2]= 7528

Sander, could you structure your problem as an array of sums instead of a sum of arrays?

Something like the first (avoiding the second):

In[3]:= MaxMemoryUsed[
 total = Table[Sum[RandomInteger[10], {i, 1000}], 100]]

Out[3]= 10112

In[4]:= MaxMemoryUsed[
 total = Sum[Table[RandomInteger[10], 100], {i, 1000}]]

Out[4]= 1751232

Hope this helps.

Regards

POSTED BY: Neil Singer

Thanks for your explanation! Either I will use the procedural method, or keep my Do loop, as it works fine now (speed and memory wise). Parallelisation of such a Do loop is however trickier than a Sum, as inside my do loop I overwrite variables. So I have to cut the do loop in to pieces manually... But no big deal.

Thanks again!

POSTED BY: Sander Huisman

Sum allows very general inputs (numeric, symbolic, etc.) in its first argument. If it detects that the input is numeric, then a direct procedural method (with no storage) is used. At present, it does not check if the first argument is an array and hence does not apply procedural summation without storage of intermediate results, which leads to large memory usage in examples such as the one considered by you. Optimization in this situation would mean checking for arrays, which may be somewhat expensive particularly because Sum is not specially designed for such inputs.

POSTED BY: Devendra Kapadia

Thanks for the answer Devendra. What do you mean with optimized? compiled? I don't think my real function (not the example) is compilable.

POSTED BY: Sander Huisman

Sum uses the procedural method (adding successive values to obtain the sum) by default for your example,

Unfortunately, at present, the procedural method in Sum is not optimized for adding arrays, and using a Do loop for doing the intermediate summations (as already done by you) is a nice way to work around the limitation.

Thank you for drawing our attention to this issue, and sorry for the inconvenience.

POSTED BY: Devendra Kapadia

I do not know....

POSTED BY: Mariusz Iwaniuk

Interesting!

The memory usage is indeed much smaller. However:

Monitor[Sum[s[k], {k, 100}, Method -> "Procedural"], {k, MemoryInUse[]}]
Monitor[Sum[s[k], {k, 100}], {k, MemoryInUse[]}]

both show the same behaviour, is Procedural the default for 'small' k? What does "compute the sum procedurally" actually mean? Whatever the method, it will be done with some procedure...

POSTED BY: Sander Huisman

Is that for the main core or for the sum of all cores? or per core?

POSTED BY: Sander Huisman

Consider:

s[k_] := 11111111111^(11111111 + k)

Monitor[Table[{k, s[k]}, {k, 100}], {k, MemoryInUse[]}]

Monitor[Sum[s[k], {k, 100}, Method -> "Procedural"], {k, MemoryInUse[]}]
POSTED BY: Michael Trott
  ClearAll[x]
  x[] := RandomInteger[10, {100, 1000}]
  x[] // ByteCount
  MaxMemoryUsed[total = ParallelSum[x[], {i, 1000}]]

In my machine the memory usage is 25 MB :) 2 core Cpu.

POSTED BY: Mariusz Iwaniuk
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract