Hi
I use multiple cores all the time. The speedup is impressive.
Combining Mathematica and CUDA works, but most of our GPU programming was done directly in CUDA and C++. The speedup is enormous; however, one needs to specialize in CUDA, be familiar with the specific problem, and know how to represent the solution efficiently for the GPU. In that case Mathematica serves as a "glue" language (if the problem is GPU-intensive).
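For illustration only, here is a minimal sketch of what programming the GPU directly in CUDA and C++ can look like. It is not code from any real project: the names saxpy_kernel, saxpy_gpu and check are hypothetical, and the element-wise operation a*x + y is just a stand-in for a real problem. It assumes the CUDA toolkit and a CUDA-capable GPU.

```cuda
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>
#include <cuda_runtime.h>

// Each GPU thread computes one element: out[i] = a * x[i] + y[i].
__global__ void saxpy_kernel(int n, float a, const float* x,
                             const float* y, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may have threads past the end of the data
        out[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        throw std::runtime_error(std::string(what) + ": " +
                                 cudaGetErrorString(err));
    }
}

// Hypothetical host wrapper: copy inputs to the GPU, run the kernel,
// copy the result back. Cleanup on error is omitted to keep the sketch short.
std::vector<float> saxpy_gpu(float a, const std::vector<float>& x,
                             const std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_x, bytes), "cudaMalloc x");
    check(cudaMalloc(&d_y, bytes), "cudaMalloc y");
    check(cudaMalloc(&d_out, bytes), "cudaMalloc out");

    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    const int threads = 256;                          // threads per block
    const int blocks = (n + threads - 1) / threads;   // round up
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y, d_out);
    check(cudaGetLastError(), "kernel launch");

    // This copy waits for the kernel to finish before reading the result.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost),
          "copy result");

    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return out;
}
```

Calling saxpy_gpu(2.0f, x, y) from a C++ program returns a vector holding 2*x[i] + y[i]. For work this simple, the host-to-device copies will usually cost more than the arithmetic saves, which is one reason the problem and its data layout have to be thought through before the GPU pays off.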
I'm not aware of TPU support, if you mean Google's TPU, but I suspect one of the Wolfram employees can answer this much better than I can.
HTH
yehdua