
Mathematica v12.3: CUDA GPU still not working

POSTED BY: Jonathan Kinlay
10 Replies

About numerical vs image data: There is absolutely no difference between them from the neural net perspective. Images are immediately turned into numerical data by NetEncoder["Image"]
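(A minimal sketch of that encoding step, for reference; the 28x28 grayscale encoder and the standard test image below are illustrative assumptions, not taken from the thread:)

enc = NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}];  (* image -> numeric array encoder *)
arr = enc[ExampleData[{"TestImage", "Lena"}]];  (* any image becomes a plain numeric array *)
Dimensions[arr]  (* {1, 28, 28} *)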

That's what I figured, which is why I couldn't understand the apparent discrepancy.

However I tested a version of your example using numerical data and the speed-up is indeed huge. So all seems to be well.

Thank you!

POSTED BY: Jonathan Kinlay

About problems with other GPUs: Briefly looking through the previous thread, all I can see about neural net functionality besides the 3090 problem is macOS support (NVIDIA/Apple's fault, not ours) and a complaint about a faulty 12.2 update, which we fixed a few days later with another update. I'm not going to comment on CUDALink because I'm not involved with it. I consider the GPU support on the ML side pretty solid: we've been successfully using NetTrain for our own internal projects on a variety of GPU models and machines (including AWS instances) for years. If you or any other user still have problems, please contact tech support.

About numerical vs image data: There is absolutely no difference between them from the neural net perspective. Images are immediately turned into numerical data by NetEncoder["Image"] and fed to the network as such. I have run your example on CPU vs GPU on my laptop (Dell XPS 15, GTX 1650M), and the GPU actually shows an improvement:

t = AbsoluteTime[];
NetTrain[net, TrainingData, BatchSize -> 10000, TargetDevice -> "CPU"];
Print[AbsoluteTime[] - t];

24.876159

t = AbsoluteTime[];
NetTrain[net, TrainingData, BatchSize -> 10000, TargetDevice -> "GPU"];
Print[AbsoluteTime[] - t];

15.667683

With a larger net, the improvement is massive (don't set a large BatchSize here or memory will blow up):

TrainingData = 
  RandomReal[1, {10000, 4}] -> RandomReal[1, {10000, 4}];
net = NetChain[{500, Ramp, 500, Ramp, 500, Ramp, 4}];

t = AbsoluteTime[];
NetTrain[net, TrainingData, MaxTrainingRounds -> 5, 
  TargetDevice -> "CPU"];
Print[AbsoluteTime[] - t];

7.083551

t = AbsoluteTime[];
NetTrain[net, TrainingData, MaxTrainingRounds -> 5, 
  TargetDevice -> "GPU"];
Print[AbsoluteTime[] - t];

0.654267

Do you get similar results for CPU vs GPU timings (especially with the second example)?

First of all, some remarks and comments:

  • GPU support in our system is not a single piece of functionality. CUDALink (CUDAInformation/InstallCUDA/CUDAFoldList, basically everything having CUDA in its name) is completely separate from the machine learning side (NetTrain/Classify/Predict/neural net evaluation, basically everything using TargetDevice). One does not need to run InstallCUDA or do a manual CUDA installation to do machine learning on a GPU; only a driver installation is needed. CUDALink and ML functionality are expected to coexist smoothly in the same session, but I would avoid loading CUDALink at all when testing for GPU problems on the ML side.
  • The issue you mentioned (https://community.wolfram.com/groups/-/m/t/2141352) is about using an RTX 3090 on WL 12.2. This GPU came out too late in the release cycle of WL 12.2 and was unsupported when the issue was opened (January). In March we pushed an update that makes WL 12.2 support these new cards.
  • You can find a list of supported GPUs for machine learning in the Details section of the documentation page for TargetDevice: https://reference.wolfram.com/language/ref/TargetDevice.html
  • We are indeed lagging behind on integrated tools for reinforcement learning, but it's still possible to do some RL by building your own infrastructure. See this tutorial: https://www.wolfram.com/language/12/neural-network-framework/train-an-agent-in-a-reinforcement-learning-environment.html
  • AnomalyDetection does not support GPU. TargetDevice is not mentioned on its documentation page. It does show up in Options[AnomalyDetection], though, and it shouldn't. The same applies to LearnDistribution. The only "classic" ML functions supporting TargetDevice are Classify/Predict, and only when Method -> "NeuralNetwork" is selected (see the sketch after this list).
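A minimal sketch of the points above (the toy data and layer sizes are made up for illustration; it assumes a working NVIDIA driver and no CUDALink loading at all):

(* Toy data for illustration only *)
features = RandomReal[1, {1000, 4}];

(* Neural net training only needs TargetDevice and a driver; no InstallCUDA call *)
NetTrain[NetChain[{32, Ramp, 4}], features -> RandomReal[1, {1000, 4}],
  MaxTrainingRounds -> 1, TargetDevice -> "GPU"];

(* Classify honors TargetDevice only with the neural network method *)
Classify[Thread[features -> RandomChoice[{"a", "b"}, 1000]],
  Method -> "NeuralNetwork", TargetDevice -> "GPU"];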

Now for the remaining issue you have reported: NetTrain not using the GPU despite TargetDevice -> "GPU" being selected. What makes you think this? There is nothing hinting at it in your report. Timings should be compared with a CPU training call, and even then I don't expect the comparison to be revealing, because the net you have chosen is so tiny that the CPU could easily outperform the GPU. This test should be carried out with a larger net, e.g. NetTrain[NetModel["LeNet Trained on MNIST Data", "UninitializedEvaluationNet"], ResourceData["MNIST"], TargetDevice -> "GPU"|"CPU"]. Checking GPU activity with an OS or NVIDIA tool during training would also be useful.
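A minimal sketch of that suggested comparison (the MaxTrainingRounds -> 1 setting is added here only to keep the test short, and the absolute numbers will of course depend on the machine):

lenet = NetModel["LeNet Trained on MNIST Data", "UninitializedEvaluationNet"];
mnist = ResourceData["MNIST"];

(* One training round on each device is enough to see whether the GPU is engaged *)
cpuTime = First@AbsoluteTiming[
    NetTrain[lenet, mnist, MaxTrainingRounds -> 1, TargetDevice -> "CPU"]];
gpuTime = First@AbsoluteTiming[
    NetTrain[lenet, mnist, MaxTrainingRounds -> 1, TargetDevice -> "GPU"]];
{cpuTime, gpuTime}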

It's an option at the top of the page when you create a new post, but I don't see where it shows up after posting.

This is running on Windows 10.

I no longer see a definitive list of supported GPU cards anywhere on the Wolfram web site.

This GPU should be supported because (1) it is supported by NVIDIA and (2) it is supported by MMA, assuming we can take the results of CUDAInformation[] and InstallCUDA[] at face value (I have also noted that GPU functionality works seamlessly with this card in MATLAB).

If that is not the case, i.e. if this particular GPU is not supported by Mathematica for some reason, then that is an additional issue: either or both of the CUDAInformation and InstallCUDA functions should report an issue if the specific card is not supported. Either that, or we need a new function, CUDACompatibleQ[], to check for compatibility issues.
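A minimal sketch of what such a check might look like from the user side today; CUDACompatibleQ is hypothetical and does not exist in CUDALink, and this version simply wraps the existing CUDAQ and CUDAInformation calls:

Needs["CUDALink`"]

(* Hypothetical helper; not part of CUDALink. It only combines existing checks. *)
CUDACompatibleQ[] :=
  If[TrueQ[CUDAQ[]],
    Print["CUDA appears usable; device 1: ", CUDAInformation[1, "Name"]]; True,
    Print["No usable CUDA device detected."]; False]

CUDACompatibleQ[]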

But again, from previous discussions, this is by no means the only GPU card experiencing difficulties with V12. See the previous post for details.

The purpose of posting here, rather than simply going to customer support, is that it gives other MMA users the opportunity to test their own configurations and publish the results. Hopefully that will give users more traction in getting WR to focus resources on the problem and deal with it.

POSTED BY: Jonathan Kinlay

Jonathan, sorry if this is off-topic, but in the dashboard you can check whether a post is an idea or a question from the icons on the left.


You can also choose to view only one of the two options


POSTED BY: Ahmed Elbanna

Not off-topic at all - thanks for the heads-up!

POSTED BY: Jonathan Kinlay

What is your question?

POSTED BY: Sander Huisman

It was posted under "Share an idea", not "Ask a question".

Still, I suppose the obvious question would be: "When can we expect WR to remedy the ongoing issues with GPU functionality that have been extant since v12.0?"

And also: "When can we expect some reinforcement learning capability to be forthcoming?"

POSTED BY: Jonathan Kinlay

Ah, sorry, I do not see where "Share an idea" shows up after posting either. I'm not sure about either question, to be honest. Are you running Windows or Linux? And have you contacted Wolfram directly to ask whether or not your specific GPU is supported?

POSTED BY: Sander Huisman