

Accelerate computations using GPU?

Posted 5 years ago
10 Replies
4 Total Likes


I am trying to accelerate a computation using the GPU. I started with the textbook example:


 Needs["CUDALink`"]
 ListLinePlot[Thread[List[CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   CUDAFoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

This code should use the GPU to accelerate the computation. For comparison, I then tried to generate a similar result on the CPU with

 ListLinePlot[Thread[List[FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]], 
   FoldList[Plus, 0, RandomReal[{-1, 1}, 500000]]]]]

In both cases it took about 4 seconds to finish, i.e. there was no significant difference in the time required to get the result. Why?

POSTED BY: Rafael Petrosian

I do not think there is a way to do this, unless you write the code from scratch in C. This seems to be the main purpose of CUDALink and OpenCLLink: send your data (packed arrays) to the GPU, run code on them that you developed separately in C (not in Mathematica), copy the result back.
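As a rough illustration of that workflow, the sketch below loads a small CUDA C kernel with CUDAFunctionLoad, runs it on a list, and gets the result back. The kernel name addTwo, the variable names, and the block size of 256 are illustrative choices and do not come from the discussion. It assumes CUDALink, a supported NVIDIA GPU, and a working C compiler, and it is guarded with CUDAQ[] so that it does nothing otherwise.

```mathematica
Needs["CUDALink`"]

(* CUDA C source, written separately from the Wolfram Language code.
   mint is CUDALink's integer type macro. *)
src = "
__global__ void addTwo(mint * in, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

result = If[CUDAQ[],
  Module[{addTwo, input = Range[10], output = ConstantArray[0, 10]},
    (* Compile and load the kernel; 256 is the thread block size. *)
    addTwo = CUDAFunctionLoad[src, "addTwo",
      {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256];
    (* The data is copied to the GPU, the kernel runs, and the output is copied back. *)
    addTwo[input, output, Length[input]]],
  Missing["NoCUDA"]]
```

The computation that runs on the GPU is the C kernel itself; the Wolfram Language side only prepares the arrays and calls the loaded function.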

Currently, there is no functionality to run general Mathematica code on the GPU. Even those functions that appear general, such as CUDAFoldList, are in reality restricted to a few specific applications: it can only take Max, Min, Plus, Minus, or Times.

In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here.

POSTED BY: Szabolcs Horvát

That is sad. Thanks for the information.

POSTED BY: Rafael Petrosian
Posted 1 year ago

Hello Szabolcs,

RE: "In principle, it should be possible to have a restricted version of Table run on the GPU. Currently, Mathematica can't do this. Let's see if the new compiler framework brings improvements here."

A question: I am not a heavy compiler user but should be; did you get the impression from WTC2021 that things have developed in this respect? Or is it something we need to go find out? Thanks if you can answer.

POSTED BY: Updating Name

Thanks. Is there a way to use GPU acceleration for

 RandomFunction and ParallelTable?


POSTED BY: Rafael Petrosian

I have the timing displayed on my window. You can do that by following the link below.

POSTED BY: Rafael Petrosian

Then yes, you did time ListLinePlot and Thread, which you should not have done: only the parallelized computation should be timed. On-window timings are not useful on the forum because you cannot post the actual numbers. Please wrap only the parallelized code in AbsoluteTiming and post the actual times. Also, click "Reply" on a specific post so that responses are nested.

POSTED BY: Kapio Letto

OK, so the GPU acceleration in the above example applies only to generating the random reals, not to showing them in the plot.

POSTED BY: Rafael Petrosian

Why do you think that anything other than CUDAFoldList itself would run on the GPU? It's the only function you are using with CUDA in its name.

The random numbers are not generated on the GPU. Only the FoldList operation runs there. I can't test on a GPU, but with a list of that size, the operation should take a tiny fraction of a second even on a CPU (0.04 s on my machine if I replace 0 with 0. as the starting value).

Accurate benchmarking is difficult, but for best results try to ensure that the calculation takes at least on the order of 0.1-1 seconds and use AbsoluteTiming.
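A minimal sketch of such a measurement, assuming CUDALink is installed: the random data is generated once outside the timed region, and AbsoluteTiming wraps only the fold itself. The variable names (data, cpuTime, cpuResult, gpuTime) are illustrative, and the GPU part is skipped when CUDAQ[] reports no usable device.

```mathematica
(* Generate the input once, outside the timed region. *)
data = RandomReal[{-1, 1}, 500000];

(* Time only the CPU fold; 0. keeps the starting value a machine real. *)
{cpuTime, cpuResult} = AbsoluteTiming[FoldList[Plus, 0., data]];

(* Time only the GPU fold, and only if CUDALink reports a usable GPU. *)
Needs["CUDALink`"]
gpuTime = If[CUDAQ[],
  First[AbsoluteTiming[CUDAFoldList[Plus, 0., data]]],
  Missing["NoCUDA"]];

{cpuTime, gpuTime}
```

The first GPU call may include one-time initialization, so it is probably worth running the timed line more than once and, if the times are very small, increasing the list length until they reach the 0.1-1 second range.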

POSTED BY: Szabolcs Horvát

I was unaware that most of the computation time in the above example was spent on generating the graph and not on the FoldList operation. This caused the confusion.

POSTED BY: Rafael Petrosian

Why didn't you show the timing code? Are you using AbsoluteTiming? Are you also (unnecessarily) timing ListLinePlot?

POSTED BY: Kapio Letto