I do not think there is a way to do this, unless you write the code from scratch in C. This seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on them that you developed separately in C (not in Mathematica), copy the result back.
Currently, there is no functionality to run general Mathematica code on the GPU. Even those functions that appear general, such as CUDAFoldList
, are in reality restricted to a few specific applications: it can only take Max, Min, Plus, Minus, or Times.
In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.