# Accelerate computations using GPU?

Posted 1 year ago
Hi, I am trying to accelerate a computation using the GPU. I started with the textbook example:

    Needs["CUDALink`"]
    ListLinePlot[
     Thread[List[CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]],
       CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

This code should use the GPU to accelerate the computation. For comparison, I then tried to generate a similar result with:

    ListLinePlot[
     Thread[List[FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]],
       FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

In both cases it took about 4 seconds to finish, i.e. there was no significant difference in the time required to get the result. Why?
Posted 1 year ago
Why didn't you show the timing code? Are you using AbsoluteTiming? Are you also (unnecessarily) timing ListLinePlot?
Posted 1 year ago
I have the timing displayed in my notebook window. You can do that by following the link below.
http://reference.wolfram.com/language/howto/DisplayTheTimingOfAnEvaluationInANotebookWindow.html
Posted 1 year ago
Then yes, you did time ListLinePlot and Thread, which you should not have done: time only the parallelized computation. Timings shown in the notebook window are not useful on the forum because you cannot post the actual numbers. Please wrap AbsoluteTiming around only the parallelized code and post the actual times. Also, click "Reply" on a specific post so that responses are nested.
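For instance, a minimal sketch of timing only the fold might look like the following. The names `data`, `cpuTime`, and `gpuTime` are illustrative, and the GPU branch assumes that `CUDAQ[]` reports a usable CUDA device on your machine.

```wolfram
Needs["CUDALink`"]

(* Generate the input up front so that RandomReal is not part of the timing *)
data = RandomReal[{-1, 1}, 500000];

(* Time only the CPU fold; the trailing semicolon suppresses the large result *)
cpuTime = First@AbsoluteTiming[FoldList[Plus, 0., data];];

(* Time only the GPU fold, and only if CUDALink is supported on this system *)
gpuTime = If[CUDAQ[],
   First@AbsoluteTiming[CUDAFoldList[Plus, 0., data];],
   Missing["NoCUDA"]];

{cpuTime, gpuTime}
```

Neither ListLinePlot nor Thread appears inside AbsoluteTiming here, so the two numbers compare just the fold operations.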
Posted 1 year ago
 OK, so the GPU acceleration in the above example is applied only for generating the random reals and not for showing them on the plot.
Posted 1 year ago
Why do you think that anything other than CUDAFoldList itself would run on the GPU? It's the only function you are using with CUDA in its name.

The random numbers are not generated on the GPU. Only the FoldList operation runs there. I can't test on a GPU, but with a list of that size the operation should take a tiny fraction of a second even on a CPU (0.04 s on my machine if I replace 0 with 0. as the starting value).

Accurate benchmarking is difficult, but for best results try to ensure that the calculation takes at least on the order of 0.1-1 seconds, and use AbsoluteTiming.
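As a rough CPU-only sketch of that advice, one could compare the exact starting value 0 with the machine-real 0. on a larger list. The list length of 10^7 is an arbitrary choice intended to push the run time toward the 0.1-1 second range, and the names `data`, `tExact`, and `tReal` are illustrative.

```wolfram
(* A larger input so that the measured time is not dominated by overhead *)
data = RandomReal[{-1, 1}, 10^7];

(* Exact integer 0 as the starting value, as in the original post *)
tExact = First@AbsoluteTiming[FoldList[Plus, 0, data];];

(* Machine-real 0. as the starting value *)
tReal = First@AbsoluteTiming[FoldList[Plus, 0., data];];

{tExact, tReal}
```

The actual numbers depend on the machine and the Mathematica version, so run it a few times before drawing conclusions.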
Posted 1 year ago
I was unaware that most of the computation time in the above example was spent on generating the graph rather than on the FoldList operation. That is what caused the confusion.
Posted 1 year ago
Thanks. Is there a way to use GPU acceleration for RandomFunction and ParallelTable?
I do not think there is a way to do this, unless you write the code from scratch in C. This seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on it that you developed separately in C (not in Mathematica), and copy the result back. Currently, there is no functionality to run general Mathematica code on the GPU. Even functions that appear general, such as CUDAFoldList, are in reality restricted to a few specific applications: it can only take Max, Min, Plus, Minus, or Times.

In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.
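To illustrate the "write it in C and send the data over" workflow, here is a minimal sketch using CUDAFunctionLoad with a toy kernel that adds 2 to every element. It assumes a CUDA-capable GPU and a C compiler that CUDALink supports. The kernel name `addTwo` and the variables `src` and `result` are illustrative, and the block size of 256 is an arbitrary choice.

```wolfram
Needs["CUDALink`"]

(* CUDA C source for the kernel; mint is the integer type that CUDALink defines *)
src = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index >= len) return;
    arry[index] += 2;
}";

(* Compile and run the kernel only if CUDALink is usable on this machine *)
result = If[CUDAQ[],
   Module[{addTwo},
    (* Argument spec: an integer array used as input and output, then its length *)
    addTwo = CUDAFunctionLoad[src, "addTwo",
      {{_Integer, _, "InputOutput"}, _Integer}, 256];
    addTwo[Range[10], 10]],
   Missing["NoCUDA"]];

result
```

The kernel itself is plain CUDA C. Mathematica only handles compiling it, moving the data to the GPU, and copying the result back.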