Okay, I'm going to assume this is a serious question, sorry for trying to be funny.
In your example with Monitor, you added even more computations to each iteration, so of course it takes longer. I'm baffled why you think otherwise, so I'll assume there's just some miscommunication going on here.
Moving on to your first example, you are executing the following expression 6 million times:
If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]
Now let's split it up, but for illustration purposes let's start with something a bit differently than what you did:
num1 = 0; num2 = 0;
AbsoluteTiming[
For[i = 1, i <= 3000000, i++,
If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]];
For[i = 3000001, i <= 6000000, i++,
If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1]];]
We have 3 million iterations in the first For and 3 million in the second For, and they clearly happen one after the other. And they evaluate the same expression in each iteration. So, we still have 6 million evaluations of that expression. We should expect no speed up.
Now let's look at how you split it up:
num1 = 0; num2 = 0;
AbsoluteTiming[
For[i = 1, i <= 3000000, i++,
If[mat[[i, 1]] == 1 && mat[[i, 2]] == 1, num1 = num1 + 1];
If[mat[[i + 3000000, 1]] == 1 && mat[[i + 3000000, 2]] == 1, num2 = num2 + 1]];]
We only have 3 million iterations in total, but our core expression shows up twice in each iteration (slightly altered, but essentially the same). When i=1, we check the first pair in the matrix AND we check the 3000001st pair in the matrix. So, we're still evaluation that same expression (essentially) 6 million times. We haven't pre-computed anything. The cost of each evaluation is the same, and we do it 6 million times.
You say you expect the overall time to be half as long, but that means each evaluation would have to be half as long on average. Why would we expect that? My joke was trying to point this out. If halving the number of iterations but doubling the work done in each iteration could reduce the total time by half, then doing 1/10 the iterations with 10 times the work in each should then take only 1/10 the total time. Same logic. If you reject this for 10, why don't you reject it for 2?
If your objective is to understand the timings, then I hope this explanation helps. If your objective is just to improve performance, then you could use Do instead of For, or (as I showed above) you could just use Count.
Since you keep mentioning "parallel", I'll point out that there are ways to get Mathematica to use multiple kernels, e.g. ParallelDo and other functions with the Parallel prefix. In my experience these functions can be tricky to use correctly. Maybe others with more understanding of performance can explain whether they'd be useful here.