
Accelerate computations using GPU?


Hi,

I am trying to accelerate computations using the GPU. I started with the textbook example

Needs["CUDALink`"]

ListLinePlot[
 Thread[List[CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

This code should use the GPU to accelerate the computation. Then, for comparison, I tried to generate a similar result with

ListLinePlot[
 Thread[List[FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

In both cases the computation took about 4 s to finish, i.e., there was no significant difference in the time required to get the result. Why?

POSTED BY: Rafael Petrosian
10 days ago

Why didn't you show the timing code? Are you using AbsoluteTiming? Are you also (unnecessarily) timing ListLinePlot?

POSTED BY: Kapio Letto
10 days ago

I have the timing displayed in my notebook window. You can set that up by following the link below.

http://reference.wolfram.com/language/howto/DisplayTheTimingOfAnEvaluationInANotebookWindow.html

POSTED BY: Rafael Petrosian
10 days ago

Then yes, you indeed timed ListLinePlot and Thread as well, which you should not have done: you should time only the parallelized computation. Timings displayed in the window are also not useful on the forum, because you cannot post the actual numbers. Please wrap AbsoluteTiming around only the parallelized code and post the actual times. Also, click "Reply" on a specific post so that responses are nested.
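
For example, here is a sketch of timing only the computation. It assumes CUDALink loads; the GPU line is skipped when TrueQ[CUDAQ[]] is False, i.e. when no usable CUDA device is found. The data are generated once, outside the timed code, and the plot is made afterwards, so neither RandomReal nor ListLinePlot contributes to the numbers. The variable names (data, cpu, gpu, cpuTime, gpuTime) are just illustrative.

```mathematica
Needs["CUDALink`"]

(* Generate the data once, outside the timed regions; RandomReal runs on the CPU *)
data = RandomReal[{-1, 1}, 500000];

(* Time only the running sum on the CPU; 0. keeps every element a machine real, *)
(* and the trailing ; suppresses the large output *)
cpuTime = First@AbsoluteTiming[cpu = FoldList[Plus, 0., data];]

(* Time only the GPU version, and only if a usable CUDA device is present *)
gpuTime = If[TrueQ[CUDAQ[]],
  First@AbsoluteTiming[gpu = CUDAFoldList[Plus, 0., data];],
  Missing["NoCUDA"]]

(* Plot afterwards, outside any timing *)
ListLinePlot[cpu]
```

Keep in mind that the GPU timing also includes copying the list to the device and the result back, which for a cheap operation like a running sum can be a large part of the total.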

POSTED BY: Kapio Letto
10 days ago

OK, so the GPU acceleration in the above example applies only to generating the random reals, not to showing them on the plot.

POSTED BY: Rafael Petrosian
10 days ago

Why do you think that anything other than CUDAFoldList itself would run on the GPU? It's the only function you are using with CUDA in its name.

The random numbers are not generated on the GPU. Only the FoldList operation runs there. I can't test on a GPU, but with a list of that size, the operation should take a tiny fraction of a second even on a CPU (0.04 s on my machine if I replace 0 with 0. as the starting value).
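
A small sketch for checking the effect of the starting value yourself. One plausible explanation (an assumption, not something verified in this thread) is packed arrays: with the exact integer 0 the result mixes one Integer with many Reals, so it cannot be stored as a packed array, while with 0. every element is a machine real. Developer`PackedArrayQ reports which lists are packed; the names intStart and realStart are only illustrative.

```mathematica
data = RandomReal[{-1, 1}, 500000];

(* Exact integer 0 as the start value: the result mixes an Integer with Reals *)
intStart = FoldList[Plus, 0, data];

(* Machine-real 0. as the start value: every element is a machine Real *)
realStart = FoldList[Plus, 0., data];

(* Check which lists are stored as packed arrays *)
Developer`PackedArrayQ /@ {data, intStart, realStart}

(* Compare the timings of the two variants *)
First@AbsoluteTiming[FoldList[Plus, 0, data];]
First@AbsoluteTiming[FoldList[Plus, 0., data];]
```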

Accurate benchmarking is difficult, but for best results make sure the timed calculation takes at least on the order of 0.1-1 seconds, and use AbsoluteTiming.
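
A minimal sketch of one way to get there when a single call is too fast to time reliably: run the same evaluation reps times inside one AbsoluteTiming and divide. The value reps = 50 is an arbitrary choice; increase it until the total lands in the 0.1-1 s range. RepeatedTiming (available since Mathematica 10.3) automates this kind of repetition.

```mathematica
data = RandomReal[{-1, 1}, 500000];
reps = 50;  (* arbitrary repetition count; raise it until total is ~0.1-1 s *)

(* Total wall-clock time for reps evaluations, then the per-call average *)
total = First@AbsoluteTiming[Do[FoldList[Plus, 0., data], {reps}]];
perCall = total/reps

(* Built-in alternative: repeats automatically and returns {time, result} *)
First@RepeatedTiming[FoldList[Plus, 0., data];]
```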

POSTED BY: Szabolcs Horvát
10 days ago

I was unaware that most of the time in the above example was spent generating the graph rather than on the FoldList operation; this caused the confusion.

POSTED BY: Rafael Petrosian
9 days ago

Thanks. Is there a way to use GPU acceleration for RandomFunction and ParallelTable?

POSTED BY: Rafael Petrosian
10 days ago

I do not think there is a way to do this, unless you write the code from scratch in C. This seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on them that you developed separately in C (not in Mathematica), copy the result back.
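
A minimal sketch of that workflow with CUDAFunctionLoad, assuming a CUDA-capable GPU and a C compiler that CUDALink can use. The kernel name addTwo is hypothetical; the kernel adds 2 to each element of an integer array in place. The kernel itself is written in CUDA C and passed in as a string; CUDALink compiles it with NVCC, moves the data to the GPU, launches the kernel, and copies the result back.

```mathematica
Needs["CUDALink`"]

(* CUDA C source for a kernel that adds 2 to every element, in place; *)
(* mint is the integer type CUDALink uses for kernel arguments *)
code = "
__global__ void addTwo(mint *arr, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index >= len) return;  /* skip threads past the end of the array */
    arr[index] += 2;
}";

(* Compile and load the kernel: argument 1 is an integer array used for *)
(* both input and output, argument 2 is its length; 256 is the block size *)
addTwo = CUDAFunctionLoad[code, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256];

(* Copy the data to the GPU, run the kernel, copy the result back *)
addTwo[Range[10], 10]
```

The point is that everything the GPU actually executes lives in that C string; none of it is translated from Mathematica code.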

Currently, there is no functionality to run general Mathematica code on the GPU. Even functions that appear general, such as CUDAFoldList, are in reality restricted to a few specific cases: CUDAFoldList can only take Max, Min, Plus, Minus, or Times as its function.

In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.

POSTED BY: Szabolcs Horvát
9 days ago

That is sad. Thanks for the information.

POSTED BY: Rafael Petrosian
6 days ago
