Hi Dean,
The faster code workshop indicated that using built-in functions and real numbers (with decimal points) produces faster-running programs, but that is generally not considered parallel processing. For parallel processing, check Parallel Kernel Preferences -> Local Kernels to see that the number of local kernels matches the number of cores you expect for your processor. Verify that the kernels are working by looking at the Parallel Kernel Status (the bottom right button under the Parallel Kernel Configuration box).
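The same check can also be done from a notebook. This is only a minimal sketch: the test computation (PrimeQ over a range of Mersenne-style numbers) is an arbitrary example I picked, not something from the workshop, and the timings you see will depend on your machine.

```mathematica
(* Number of processor cores reported by the system *)
$ProcessorCount

(* Launch the configured kernels only if none are running yet *)
If[$KernelCount == 0, LaunchKernels[]];

(* Number of parallel kernels currently running; compare with $ProcessorCount *)
$KernelCount

(* Ask every parallel kernel to report its ID and host machine *)
ParallelEvaluate[{$KernelID, $MachineName}]

(* Arbitrary test workload: time the serial and the parallel version *)
First[AbsoluteTiming[Table[PrimeQ[2^i - 1], {i, 1, 3000}];]]
First[AbsoluteTiming[ParallelTable[PrimeQ[2^i - 1], {i, 1, 3000}];]]
```

If ParallelEvaluate returns one entry per expected kernel, the local kernels are up. The parallel timing is not guaranteed to be lower for small workloads, since distributing the work has its own overhead.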
If there are other computers running Mathematica, you can try to set up a lightweight grid to share the load (Parallel Kernel Preferences -> Lightweight Grid). See http://bit.ly/1KlvxmM for more info. Next, you could use remote kernels on the Wolfram servers if you have the license for it (Parallel Kernel Preferences -> Remote Kernels). This approach throws more kernels at the problem without having to optimize the code as much.
If you want to use CUDA, then I don't think "automatically" is the right word. Mathematica programs have to include CUDA functions to take advantage of the parallel processing in your GPU, and your NVIDIA graphics card drivers need to support CUDA. The following Mathematica functions can be used to check that CUDA is available and ready to use.
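(* Load the CUDALink package *)
Needs["CUDALink`"]
(* True if CUDALink is supported on this system *)
CUDAQ[]
(* Information on all detected CUDA devices *)
CUDAInformation[]
(* The same information, one tab per device *)
TabView[Table[ii -> Grid[Replace[#, Rule[x_, y_] -> {x, y}] & /@ CUDAInformation[ii], Frame -> All], {ii, 1, $CUDADeviceCount}]]
(* Video driver version and installed CUDA resources *)
CUDADriverVersion[]
CUDAResourcesInformation[]
(* General system report *)
SystemInformation[]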
Then use the CUDA tutorial to learn how to rewrite your code to use the GPU on your graphics card ( http://bit.ly/1Oka6GO ).
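To give an idea of what "including CUDA functions" looks like, here is a small sketch along the lines of the CUDALink documentation examples. The kernel name addTwo, the block size of 256, and the test input are illustrative choices, and CUDAFunctionLoad also needs a working C compiler that the CUDA toolkit supports, which I am assuming you have installed.

```mathematica
Needs["CUDALink`"]

(* CUDA C source for a kernel that adds 2 to every element of an integer array *)
src = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index >= len) return;
    arry[index] += 2;
}";

If[TrueQ[CUDAQ[]],
 (* Compile the kernel and wrap it as a Mathematica function; 256 is the block size *)
 addTwo = CUDAFunctionLoad[src, "addTwo", {{_Integer, _, "InputOutput"}, _Integer}, 256];
 (* Run it on the GPU: the array and its length are passed as the two arguments *)
 addTwo[Range[10], 10],
 Print["CUDALink is not available on this system."]
]
```

The point is that the GPU only runs code you have explicitly written or called for it, so each hot spot in your program has to be moved over by hand.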