Accelerate computations using GPU?

Posted 10 months ago

Hi,

I am trying to accelerate a computation using the GPU. I started with the textbook example:

Needs["CUDALink`"]

ListLinePlot[
 Thread[List[CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

This code should use the GPU to accelerate the computation. For comparison, I then generated a similar result with the CPU version:

ListLinePlot[
 Thread[List[FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

In both cases the evaluation took about 4 seconds, i.e. there was no significant difference in the time required to get the result. Why?

9 Replies
Posted 10 months ago

Why didn't you show the timing code? Are you using AbsoluteTiming? Are you also (unnecessarily) timing ListLinePlot?

I have the timing displayed in my notebook window. You can enable that by following the link below.

http://reference.wolfram.com/language/howto/DisplayTheTimingOfAnEvaluationInANotebookWindow.html

Posted 10 months ago

Then yes, you did time ListLinePlot and Thread, which you should not have done: only the parallelized computation should be timed. Timings shown in the window are not useful on the forum, because the actual numbers cannot be posted. Please wrap AbsoluteTiming around only the parallelized code and post the actual times. Also, click "Reply" on a specific post so that responses are nested.

OK, so the GPU acceleration in the above example applies only to generating the random reals and not to showing them in the plot.

Why do you think that anything other than CUDAFoldList itself would run on the GPU? It is the only function you are using that has CUDA in its name.

The random numbers are not generated on the GPU. Only the FoldList operation runs there. I can't test on a GPU, but with a list of that size, the operation should take a tiny fraction of a second even on a CPU (0.04 s on my machine if I replace 0 with 0. as the starting value).
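To illustrate the remark about the starting value, here is a minimal CPU-only sketch. The variable names (data, tInt, rInt, tReal, rReal) are made up for this example. With the exact integer 0, the result begins with an integer followed by machine reals, so it cannot be stored as a packed array; that is probably why 0. is faster, but the actual timings will depend on the machine and version.

```wolfram
(* Generate the input once, outside the timed region *)
data = RandomReal[{-1, 1}, 500000];

(* Integer start value: the result mixes an exact 0 with machine reals *)
{tInt, rInt} = AbsoluteTiming[FoldList[Plus, 0, data]];

(* Real start value: every element of the result is a machine real *)
{tReal, rReal} = AbsoluteTiming[FoldList[Plus, 0., data]];

(* Compare the two timings, then check which lists are packed arrays *)
{tInt, tReal}
Developer`PackedArrayQ /@ {data, rInt, rReal}
```

Note that RandomReal is evaluated before AbsoluteTiming starts, so only the fold itself is measured.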

Accurate benchmarking is difficult, but for best results try to ensure that the calculation takes at least on the order of 0.1-1 seconds, and use AbsoluteTiming.
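A rough sketch of such a benchmark, timing only the fold and nothing else. The list size n and the variable names are assumptions for illustration; adjust n until the timings land in the suggested range. The CUDAFoldList call is copied from the original post, and CUDAQ[] is used so that the GPU branch is skipped on machines without a supported device.

```wolfram
Needs["CUDALink`"]

n = 5*10^6;  (* assumed size; increase until timings reach roughly 0.1-1 s *)
data = RandomReal[{-1, 1}, n];  (* generated once, outside the timed region *)

(* Time only the CPU fold; the inner semicolon discards the large result *)
cpuTime = First@AbsoluteTiming[FoldList[Plus, 0., data];];

(* Time only the GPU fold, and only if CUDALink reports a usable GPU *)
gpuTime = If[CUDAQ[],
   First@AbsoluteTiming[CUDAFoldList[Plus, 0, data];],
   Missing["NoCUDA"]];

{cpuTime, gpuTime}
```

The first CUDA call in a session may include one-time initialization, so it can be worth evaluating the GPU timing twice and posting the second number.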

I was unaware that most of the computation time in the above example was spent generating the graph and not on the FoldList operation; this caused the confusion.

Thanks. Is there a way to use GPU acceleration for RandomFunction and ParallelTable?

I do not think there is a way to do this unless you write the code from scratch in C. That seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on it that you developed separately in C (not in Mathematica), and copy the result back.
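For what that workflow looks like, here is a small sketch along the lines of the CUDAFunctionLoad examples in the CUDALink documentation. The kernel and the names src and addTwo are illustrative only, and it assumes a supported NVIDIA GPU plus a working CUDA C compiler; I have not run it on this thread's setup.

```wolfram
Needs["CUDALink`"]

(* CUDA C source for the kernel; mint is CUDALink's integer type *)
src = "
__global__ void addTwo(mint *arr, mint len) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < len) arr[i] += 2;
}";

If[CUDAQ[],
 (* Compile and load: one in/out integer array, one integer length, block size 256 *)
 addTwo = CUDAFunctionLoad[src, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256];
 (* The list is copied to the GPU, the kernel runs, and the result is copied back *)
 addTwo[Range[10], 10],
 "No usable CUDA device found"]
```

The point is that the part that actually runs on the GPU is the C kernel in the string, not Mathematica code.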

Currently, there is no functionality to run general Mathematica code on the GPU. Even functions that appear general, such as CUDAFoldList, are in reality restricted to a few specific cases: CUDAFoldList can only take Max, Min, Plus, Minus, or Times as the function.

In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.

That is sad. Thanks for the information.
