1) The function you posted doesn't seem quite right. See the output of:
f[x_, y_] := {Sin[x], Cos[y + 1]};
With[{max1 = 10, max2 = 4, x1 = 0, x2 = 2, x1step = 1, x2step = 1},
 Module[{counter1 = 1, counter2 = 1},
  For[counter1, counter1 < max1, counter1++,
   For[counter2, counter2 < max2, counter2++,
    Print@f[x1 + counter1*x1step, x2 + counter2*x2step]]]]]
=>
{Sin[1],Cos[4]}
{Sin[1],Cos[5]}
{Sin[1],Cos[6]}
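2) For statements do nest, but the inner loop here never restarts: its start expression is just counter2, so the counter is not reset when the outer loop advances, and after the first outer pass the inner test fails immediately. That is why only three lines are printed. Try it for a simple case.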
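A minimal sketch of such a simple case, assuming the intent was to visit the whole grid. The bounds and offsets are copied from the example above; the counter names i and j are only illustrative. Initializing each counter in the start expression of its own For makes the inner loop begin again on every outer pass:

```mathematica
f[x_, y_] := {Sin[x], Cos[y + 1]};

(* Each counter is set in its own loop's start expression, so the
   inner loop restarts from 1 every time the outer counter advances. *)
Module[{i, j},
 For[i = 1, i < 10, i++,
  For[j = 1, j < 4, j++,
   Print@f[0 + i*1, 2 + j*1]]]]

(* The same grid collected as a nested list, with no explicit counters. *)
Table[f[i, 2 + j], {i, 1, 9}, {j, 1, 3}]
```

The For version prints 27 pairs (9 outer steps times 3 inner steps). The Table version returns them as a 9 by 3 array of pairs, which is usually easier to work with afterwards.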
3) Parallel computing sounds good, but it takes time to transfer information to each core, and time to retrieve it and assemble it into the result. So unless the function you want to evaluate takes much longer than the setup and teardown process, you actually lose time by computing in parallel. Also, consider that the non-parallel versions of Table and Array have been considerably optimized.
So, for this easily evaluated function:
The serial version takes 2.9 seconds:
Array[{Sin@(1.*#1), Cos@(1. #2)} &, {3*^3, 4*^3}, {1, 10}];
but the parallel version takes 4.0 seconds:
ParallelArray[{Sin@(1.*#1), Cos@(1. #2)} &, {3*^3, 4*^3}, {1, 10}];
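One way to run this kind of comparison yourself is with AbsoluteTiming. The sketch below is only illustrative: the dimensions are an assumption (smaller than the ones quoted here, so it finishes quickly), and the timings will differ from machine to machine.

```mathematica
(* With injects the same pure function and dimensions into both calls. *)
With[{g = {Sin[1.*#1], Cos[1.*#2]} &, dims = {300, 400}},
 {
  (* serial evaluation time, in seconds *)
  First@AbsoluteTiming[Array[g, dims, {1, 10}];],
  (* parallel evaluation time, in seconds *)
  First@AbsoluteTiming[ParallelArray[g, dims, {1, 10}];]
 }]
```

The first parallel call in a session may also include the time needed to launch the subkernels, so it is worth running the comparison twice and looking at the second result.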
I would suggest:
a) Try to formulate and execute the problem in serial first, for a small case. Do not use the nested For. Try Array[] instead, as shown above. Note: ParallelTable won't accept the above evaluation, but Array will.
b) Then try the Parallel version of Array and see if it helps. NOTE: Your parallel computations must be independent of each other for the above method to work. If they modify a common data structure, things get more complicated.
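To illustrate the difference between independent evaluations and a common data structure, here is a small sketch. The names squares and results are made up for the example, and it assumes the built-in SetSharedVariable is an acceptable way to share state for your problem.

```mathematica
(* Independent: each element depends only on its own index, so the
   subkernels never need to communicate with each other. *)
squares = ParallelTable[i^2, {i, 1, 5}]

(* Shared state: the variable has to be declared shared first, and every
   update is routed through the main kernel, which costs extra time. *)
results = {};
SetSharedVariable[results];
ParallelDo[AppendTo[results, i^2], {i, 1, 5}];

(* The order in which the subkernels append is not guaranteed. *)
Sort[results]
```

When the problem allows it, the first form is usually the better choice: let each evaluation return its own value and assemble the results afterwards.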