Message Boards Message Boards

0
|
5512 Views
|
15 Replies
|
6 Total Likes
View groups...
Share
Share this post:

Reducing the iteration time involving If?

Posted 2 years ago

I want to know why the speed its not doubled, if in this computation the procces seems parallel to me.

Given a matrix 6M rows and 2 columns I want to know how many rows have the same value, say ones for this example

mat = Array[RandomInteger[5] &, {6000000, 2}]; 

if a loop over the 6 million rows and compare if the column 1 and coloumn 2 have the number say 1 for this example, I get this result

num1=0;
AbsoluteTiming[For[i=1, i<=6000000,i++,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,num1=num1+1];
]]Print[num1]

the result is 7.46366 seconds.

Now if I loop to 3 million but in two parts one from row i to 3M, second from i+3M

num1=0; num2=0;
AbsoluteTiming[For[i=1, i<= 3000000,i++,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,num1=num1+1];
If[mat[[i+3000000,1]]==1 && mat[[i+3000000,2]]==1 ,num2=num2+1];
]]Print[num1+num2]

the result is 6.75842 seconds; I was expecting max 4 seconds. Why the time is not half if the range of the loop is half?

POSTED BY: Legibus Motus
15 Replies

I'll make a few remarks that might help to clarify why the original expectation cannot be met. In a For loop it is possible there might be statements whose execution could, in principle, be carried out in parallel. To do so would involve detailed code analysis of a sort that is not realistically going to be done in an interpreted language environment. There are, as other responses indicate, ways to achieve such parallelization, or else to in other ways get code that runs much faster. One is to recast in a way that allows for vectorized computations. Whether these are or are not done in parallel can ultimately depend on very low level details of the computation, in particular whether they happen in BLAS library code that uses parallelization. Even if no parallelization happens, such code still benefits from avoiding loop overhead and avoiding memory reads (since a vectorized function can take advantage of using a single read).

This is intended as a general overview and an attempt to respond to the "why is it not what I expected" aspect of the original post. Others have already provided nice ways to better achieve the desired goal of having (much) faster code.

Another way, also mentioned, is to rewrite in a way that allows Mathematica to do the necessary code analysis and to parallelize. The Compile function, and its newer variant, can be used to achieve this goal. Alternatively, explicit use of e.g. ParallelDo can be used. In this case one would want a coarse-grained approach, wherein large blocks of the array get sent to each processor rather than individual calls each going to the next available processor (as that would lose considerable time in shipping code and values to/from subkernels to the master kernel).

POSTED BY: Daniel Lichtblau
Posted 2 years ago

Maybe we can leverage the idea behind your gifs. Try the following experiments (deliberately designed for low performance so you can watch progress):

testImage = ConstantImage[GrayLevel[.5], {200, 200}];
Monitor[
  For[col = 1, col <= 200, col++,
   For[row = 1, row <= 200, row++,
    testImage = ReplaceImageValue[testImage, {col, row} -> .8]]],
  testImage] // AbsoluteTiming

====

testImage = ConstantImage[GrayLevel[.5], {200, 200}];
Monitor[
  For[col = 1, col <= 100, col++,
   For[row = 1, row <= 200, row++,
    testImage = ReplaceImageValue[testImage, {col, row} -> .8];
    testImage = ReplaceImageValue[testImage, {col + 100, row} -> .2]]],
  testImage] // AbsoluteTiming

====

testImage = ConstantImage[GrayLevel[.5], {200, 200}];
Monitor[
  For[col = 1, col <= 100, col++,
   For[row = 1, row <= 100, row++,
    testImage = ReplaceImageValue[testImage, {col, row} -> .9];
    testImage = ReplaceImageValue[testImage, {col + 100, row + 100} -> .6]];
   For[row = 1, row <= 100, row++,
    testImage = ReplaceImageValue[testImage, {col, row + 100} -> .1];
    testImage = ReplaceImageValue[testImage, {col + 100, row} -> .4]]],
  testImage] // AbsoluteTiming
POSTED BY: Eric Rimbey
Posted 2 years ago

why the two If[] are not evaluated at the same time

No two evaluations can happen at the same time. The processor executes machine instructions one at a time. In your analogy, the painter is the cpu. You can't magically create an extra CPU just by re-ordering the instructions.The same painter is painting the wall. The painter can paint in strips from left to right, or the painter can alternate strips on the right and left, or the painter can do any number of other patterns. But it requires the same amount of paint and the same amount of work by the painter.

Continuing this analogy, adding another painter isn't as trivial as it may sound. You can't just send another painter into the room and expect the productivity to double. The second painter needs a second brush at least. It's best to divide the paint into two containers so the painters aren't bumping into each other refilling their brush. And it's best to assign each painter a specific half of the wall so that they aren't bumping into each other or duplicating effort. And you have double the cleanup.

So, where in your code examples have you ever added a second painter? You're always using just one painter. You've altered the sequence in which the painting gets done, but you haven't added any extra painters. This seems so blindingly obvious to me that I fear I just don't understand what you're saying. What makes you think you've added an extra painter? Have you read something somewhere about optimization that motivated this example? Maybe we can examine your sources of information and show you where you've gone off the rails. I'm still suspicious that this is some sort of trolling effort.

POSTED BY: Eric Rimbey

While Michael's post is still a bit faster by avoiding the If[], this addition to the Compiled version makes it parallel and almost the same speed as Michael's clever post which avoids the If[] (I forgot to try the parallelization last night which was the whole reason for showing you the compile in the first place!)

cS = 
 Compile[{{x, _Real, 1}}, If[x[[1]] == x[[2]] == 1, 1, 0], 
  RuntimeAttributes -> {Listable}, Parallelization -> True]


In[9]:= Total[cS[mat]] // AbsoluteTiming

Out[9]= {0.121325, 165749}

This dovetails with what both Michael and I said -- you should always rely on the built-in optimization of list or vector processing capabilities of Mathematica.

As to your question about text in Compile, String is not a valid type for Compile[]. However, as we have shown above, you can get speedups (even for string functions) by relying on Mathematica's built-in optimizations.

If you want to experiment with Mathematica's Parallelization you can do something like this by changing the For to a Do so it can be parallelized (For loops can't necessarily be done out of order). Note it is always faster than For but still painfully slow compared to the vector approaches.

num1 = 0;
AbsoluteTiming[
 Do[If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1], {i, 1, 
   6000000}]]
num1 

(*  {3.06585, Null}  *)

num1=0;
n = 3;
CloseKernels[];LaunchKernels[n];
AbsoluteTiming[ParallelDo[If[mat[[i,1]]==1&&mat[[i,2]]==1,num1=num1+1],{i,1,6000000}]]
num1
(* {1.45851, Null} *)

Try running this with 1,2,3,4, etc Kernels. You see diminishing returns after using 3 Kernels to do the parallel processing -- overhead dominates. But this will give you a feel for what is going on.

Regards

Neil

POSTED BY: Neil Singer
Posted 2 years ago

Eric Rimbey, Neil Singer and Michael Rogers I appreciate your replies; your alternatives are really fast and I learnt from them. When I wrote the entry I was not thinking about other ways to do it but with the intention to know why the two If[] are not evaluated at the same time, because I though that the code with one If[] and the one with the two If[] would sweep the array as in the following GIF : enter image description here and that's why I spoke of parallel. The analogy is that two people could paint a wall twice as fast as one, but ten people would get in each other's way. I am learning the language, I want to understand its functions and how they work, which is the order in which things go evaluated inside a particular function.

By the way Neil, the Compiler does work with text? Let us say that the element of the second column were text ("ONE" instead of 1), how it will be? Thanks

POSTED BY: Legibus Motus
Posted 2 years ago

Okay, I'm going to assume this is a serious question, sorry for trying to be funny.

In your example with Monitor, you added even more computations to each iteration, so of course it takes longer. I'm baffled why you think otherwise, so I'll assume there's just some miscommunication going on here.

Moving on to your first example, you are executing the following expression 6 million times:

If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]

Now let's split it up, but for illustration purposes let's start with something a bit differently than what you did:

num1 = 0; num2 = 0;
AbsoluteTiming[
 For[i = 1, i <= 3000000, i++,
  If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]];
 For[i = 3000001, i <= 6000000, i++,
  If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]];]

We have 3 million iterations in the first For and 3 million in the second For, and they clearly happen one after the other. And they evaluate the same expression in each iteration. So, we still have 6 million evaluations of that expression. We should expect no speed up.

Now let's look at how you split it up:

num1 = 0; num2 = 0;
AbsoluteTiming[
 For[i = 1, i <= 3000000, i++,
   If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1];
   If[mat[[i + 3000000, 1]] == 1 && mat[[i + 3000000, 2]] == 1, num2 = num2 + 1]];]

We only have 3 million iterations in total, but our core expression shows up twice in each iteration (slightly altered, but essentially the same). When i=1, we check the first pair in the matrix AND we check the 3000001st pair in the matrix. So, we're still evaluation that same expression (essentially) 6 million times. We haven't pre-computed anything. The cost of each evaluation is the same, and we do it 6 million times.

You say you expect the overall time to be half as long, but that means each evaluation would have to be half as long on average. Why would we expect that? My joke was trying to point this out. If halving the number of iterations but doubling the work done in each iteration could reduce the total time by half, then doing 1/10 the iterations with 10 times the work in each should then take only 1/10 the total time. Same logic. If you reject this for 10, why don't you reject it for 2?

If your objective is to understand the timings, then I hope this explanation helps. If your objective is just to improve performance, then you could use Do instead of For, or (as I showed above) you could just use Count.

Since you keep mentioning "parallel", I'll point out that there are ways to get Mathematica to use multiple kernels, e.g. ParallelDo and other functions with the Parallel prefix. In my experience these functions can be tricky to use correctly. Maybe others with more understanding of performance can explain whether they'd be useful here.

POSTED BY: Eric Rimbey

Mathematica automatically uses "vectorized" operations for some basic math functions, which take advantage of pipelining and parallelization, via libraries it is built with (such as the Math Kernel Library). The For[] loop takes ~4.5 sec. on my computer, about 50 times as long as the following:

SeedRandom[0]; (* for reproducibility *)
mat = RandomInteger[5, {6000000, 2}];

6000000 - Total[Unitize@Total[Abs[mat - 1], {2}]] // AbsoluteTiming
(*  {0.092755, 165749}  *)

num1 = 0;
AbsoluteTiming[
 For[i = 1, i <= 6000000, i++, 
  If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1];]]
num1
(*
{4.53598, Null}
165749
*)

You can see that the For[] is not very efficient. If you replace the If[] statement by the increment statement num1 = num1 + 1, the timing drops to ~3 sec. If you replace it by a no-op Null, the timing drops to ~1.5 sec., which is the overhead for a For[] loop that does nothing but increment i six million times. Aside from the fact that the lines a For[] loop are executed sequentially and not in parallel as has been pointed out, what one can expect to save is half the 1.5 sec. in incrementing i half as much.

POSTED BY: Michael Rogers

and this is even faster:

In[8]:= AbsoluteTiming[Total[Map[cS, mat]]]

Out[8]= {0.266503, 166569}

I guess making the function Listable actually slows it down. I would remove the RuntimeAttributes from the definition of the Compiled function.

POSTED BY: Neil Singer

Legibus,

While Mathematica offers For[] and other looping constructs, It is rarely a good idea to use them. You can almost always compute faster using Mathematica's list handling instead. In your case, using For[] on my machine:

In[3]:= num1 = 0;
AbsoluteTiming[
 For[i = 1, i <= 6000000, i++, 
  If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1];]]
Print[num1]

Out[4]= {4.65165, Null}

During evaluation of In[3]:= 166569

However you get more than an order of magnitude speedup with this code (while getting the same answer):

In[43]:= AbsoluteTiming[
 Total[Map[If[#[[1]] == #[[2]] == 1, 1, 0] &, mat]]]

Out[43]= {0.367711, 166569}

You can also rely on Compilation to get more speed:

cS = Compile[{{x, _Real, 1}}, If[x[[1]] == x[[2]] == 1, 1, 0], 
  RuntimeAttributes -> {Listable}]
In[7]:= Total[cS[mat]] // AbsoluteTiming

Out[7]= {0.327464, 166569}

You can further parallelize the list operations but often you will actually slow things down because the overhead of parallelization swamps the super-fast list-based operations. Parallel speedups are highly dependent on the actual code.

Regards,

Neil

POSTED BY: Neil Singer
Posted 2 years ago

The example with the Monitor function shows that it takes more than two times, if I do what you suggest it will be more than 10 times slower not faster as you say.

The question is why the iterations does not happens at the same time and if possible how could be done. How to make the If evaluations hapen in parallel.

POSTED BY: Legibus Motus
Posted 2 years ago

So then split it into 10 computations per iteration and only do 1/10th the number of iterations. You should get a factor of ten speed up.

POSTED BY: Eric Rimbey
Posted 2 years ago

The i^2 is a simpler computation that replace the updating num1 and num2, (even doing nothing give differences of just 1 second faster). Yes I was expecting nearly double speed because althoug there are two If statements to me they happens at the same time (iteration). Let us Monitor both:

num0=0; num1=0; num2=0;
AbsoluteTiming[
Monitor[
For[i=1, i<=6000000, i++,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,num0=num0+1];

If[i<=3000000,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,num1=num1+1];
If[mat[[i+3000000,1]]==1 && mat[[i+3000000,2]]==1 ,num2=num2+1];
]
],{i,Style[num0,Red],Style[num1+num2,Blue]}]]

when i=3000000 the second part is already finish gives {i=3000000, 83475, 166341}, for a final result {i=6000000, 166341, 166341}, in 15.01 seconds.

I want to know why it seems to be happening at he same iteration but according with las result 15.01 seconds they are really not, beacuse the time to do both takes more than the sum of the times it would take to do it individually.

POSTED BY: Legibus Motus
Posted 2 years ago

I don't understand the point of the i^2 example.

As I read your original post, you're expecting a nearly double speed up by halving the number of iterations. But you're also doubling the number of computations in each iteration, so the total number of computations is about the same. I don't know why you'd expect any speedup.

POSTED BY: Eric Rimbey
Posted 2 years ago

Consider this modification instead of updating num1 and num2 we compute the square of the i row

AbsoluteTiming[
For[i=1, i<=6000000, i++,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,i^2];
]]

result 7.461 seconds

AbsoluteTiming[
For[i=1, i<= 3000000,i++,
If[mat[[i,1]]==1 && mat[[i,2]]==1 ,i^2];
If[mat[[i+3000000,1]]==1 && mat[[i+3000000,2]]==1 ,i^2];
]]

result 6.647 seconds

POSTED BY: Legibus Motus
Posted 2 years ago

You're still doing the same number of comparisons and updates to num1, num2. For each of the 3000000 updates to the loop index you avoid you add in additions in the Part expressions.

Side note,

AbsoluteTiming[Count[mat, {1, 1}]]
(* {1.53748, 167053} *)
POSTED BY: Eric Rimbey
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract