Accelerate computations using GPU?

Rafael Petrosian

Posted 9 years ago

Hi, I am trying to accelerate a computation using the GPU. I started with the textbook example:

    Needs["CUDALink`"]
    ListLinePlot[
     Thread[List[CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]],
       CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

This code should use the GPU to accelerate the computation. Then, for comparison, I tried to generate a similar result with:

    ListLinePlot[
     Thread[List[FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]],
       FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

In both cases it took about 4 seconds to finish the computation, i.e. there was no significant difference in the time required to get the result. Why?

POSTED BY: Rafael Petrosian

10 Replies

Szabolcs Horvát

Posted 9 years ago

I do not think there is a way to do this, unless you write the code from scratch in C. This seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on them that you developed separately in C (not in Mathematica), and copy the result back. Currently, there is no functionality to run general Mathematica code on the GPU. Even functions that appear general, such as `CUDAFoldList`, are in reality restricted to a few specific applications: it accepts only Max, Min, Plus, Minus, or Times.

In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.

POSTED BY: Szabolcs Horvát
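
To make the "write the kernel separately, send the data, copy the result back" workflow concrete, here is a minimal sketch using `CUDAFunctionLoad`. The kernel name `addTwo`, the block size of 256, and the input values are purely illustrative and not from this thread. The sketch assumes a supported NVIDIA GPU and a working CUDA compiler setup, so treat it as an outline rather than a tested recipe.

```mathematica
Needs["CUDALink`"]

(* CUDA C source for a hypothetical kernel that adds 2 to every element.
   "mint" is the integer type CUDALink makes available to kernel code. *)
src = "
__global__ void addTwo(mint * in, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

(* Compile and load the kernel: an input integer list, an output integer list,
   and the length as a scalar; 256 is the assumed thread-block size *)
addTwo = CUDAFunctionLoad[src, "addTwo",
   {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256];

(* The data is sent to the GPU, the kernel runs, and the output comes back *)
addTwo[Range[10], ConstantArray[0, 10], 10]
```

The point of the sketch is that the numerical work lives in the C string, not in Wolfram Language code, which is the limitation described above.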

Rafael Petrosian

Posted 9 years ago

That is sad. Thanks for the information.

POSTED BY: Rafael Petrosian

Updating Name

Posted 4 years ago

Hello Szabolcs, RE: "In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here." A question: I am not a heavy compiler user but should be; did you get the impression from WTC2021 that things have developed in this respect? Or is it something we need to go find out? Thanks if you can answer.

POSTED BY: Updating Name

Rafael Petrosian

Posted 9 years ago

Thanks. Is there a way to use GPU acceleration for `RandomFunction` and `ParallelTable`?

POSTED BY: Rafael Petrosian

Rafael Petrosian

Posted 9 years ago

I have the timing displayed in my notebook window. You can set that up by following this link: http://reference.wolfram.com/language/howto/DisplayTheTimingOfAnEvaluationInANotebookWindow.html

POSTED BY: Rafael Petrosian

Kapio Letto

Kapio Letto, Consalting

Posted 9 years ago

Then yes, you did time `ListLinePlot` and `Thread`, which you should not have done: time only the parallelized computation. The timings shown in the notebook window are not useful on the forum because you cannot post the actual numbers. Please wrap only the parallelized code in `AbsoluteTiming` and post the actual times. Also, click "Reply" on a specific post so that responses are nested.

POSTED BY: Kapio Letto
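
As a rough sketch of what timing only the accelerated step could look like: the variable names `data`, `cpuTime`, and `gpuTime` are illustrative, the `CUDAQ[]` guard and the untimed warm-up call are my own assumptions about a reasonable setup, and no timings are claimed here.

```mathematica
Needs["CUDALink`"]

(* Build the input once, outside of any timed region *)
data = RandomReal[{-1, 1}, 500000];

(* CPU: time only the cumulative sum; the trailing semicolon suppresses the large output *)
cpuTime = First@AbsoluteTiming[FoldList[Plus, 0., data];];

(* GPU: only attempt this when CUDALink reports that CUDA is usable *)
gpuTime = If[CUDAQ[],
   (* untimed warm-up call, assuming the first CUDA call may include one-time setup *)
   CUDAFoldList[Plus, 0, data];
   First@AbsoluteTiming[CUDAFoldList[Plus, 0, data];],
   Missing["NoCUDA"]];

{cpuTime, gpuTime}
```

Posting the resulting pair of numbers, rather than the notebook window timing, gives others something concrete to compare.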

Rafael Petrosian

Posted 9 years ago

OK, so the GPU acceleration in the above example is applied only for generating the random reals and not for showing them on the plot.

POSTED BY: Rafael Petrosian

Szabolcs Horvát

Posted 9 years ago

Why do you think that anything other than `CUDAFoldList` itself would run on the GPU? It's the only function you are using with CUDA in its name. The random numbers are not generated on the GPU. Only the `FoldList` operation runs there. I can't test on a GPU, but with a list of that size, the operation should take a tiny fraction of a second even on a CPU (0.04 s on my machine if I replace `0` with `0.` as the starting value). Accurate benchmarking is difficult, but for best results try to ensure that the calculation takes at least on the order of 0.1-1 seconds and use `AbsoluteTiming`.

POSTED BY: Szabolcs Horvát
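
One way to see where the time goes is to time each stage separately. The sketch below is only an outline: the variable names are illustrative, and `Developer`PackedArrayQ` is added here as an assumption about why the `0.` starting value matters (an integer `0` at the head of a list of reals prevents the result from being a packed array).

```mathematica
(* Stage 1: random number generation *)
{genTime, data} = AbsoluteTiming[RandomReal[{-1, 1}, 500000]];

(* Stage 2: cumulative sum, once with an integer and once with a real starting value *)
{intTime, intResult} = AbsoluteTiming[FoldList[Plus, 0, data]];
{realTime, realResult} = AbsoluteTiming[FoldList[Plus, 0., data]];

(* Stage 3: building the plot expression in the kernel *)
{plotTime, plot} = AbsoluteTiming[ListLinePlot[realResult]];

(* Collect the timings and check whether each fold result is a packed array *)
<|"generate" -> genTime, "fold (0)" -> intTime, "fold (0.)" -> realTime,
  "plot" -> plotTime,
  "packed (0)" -> Developer`PackedArrayQ[intResult],
  "packed (0.)" -> Developer`PackedArrayQ[realResult]|>
```

Note that `AbsoluteTiming` around `ListLinePlot` measures only the kernel's work of constructing the graphics expression; the time the front end spends rendering half a million points is not included, so the on-screen delay can be noticeably longer than the reported plot time.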

Rafael Petrosian

Posted 9 years ago

I was unaware that most of the computation time in the above example was spent on generating the graph and not on the `FoldList` operation. That is what caused the confusion.

POSTED BY: Rafael Petrosian

Kapio Letto

Kapio Letto, Consalting

Posted 9 years ago

Why didn't you show the timing code? Are you using `AbsoluteTiming`? Are you also (unnecessarily) timing `ListLinePlot`?

POSTED BY: Kapio Letto
