Improve neural network performance with Mathematica 11.3?

Posted 7 years ago

I look at the blog post with the 11.3 word cloud, with 'Blockchain' as the BIG center, and ask: how important is that? I ran the exact same data science GPU code on an identical hardware and software configuration, except for the change from Mathematica 11.2 to Mathematica 11.3, and saw my neural network training time go from 295 seconds on 11.2 to 2038 seconds on 11.3. Again, NO change other than the Mathematica version. And then I see that Mathematica 11.3 still does not support current XCode LLVM/GCC compiler or NVIDIA for CUDA tools (watch it move back to the old 10.5 paclet after you upgrade your XCode command line tools to the current version; am I expected to pay money to figure this out?).

This is my experience as I explore the value of Mathematica for data science from release 10 to today, while at the same time seeing the massive improvements in and quality of Python, Jupyter, NVIDIA, iOS CoreML, Vulkan, TensorFlow and core GPU computing on macOS, iOS and Linux.

Really questioning the value proposition of Wolfram for data science going forward. Sad.

POSTED BY: David Proffer
30 Replies

What about support for NVidia cards for option TargetDevice->"GPU" in such things as Classify and other neural net-related functions?

Given the long-time status of Mathematica as essentially platform-independent, it seems really constricting that only CUDA with NVidia is supported whereas OpenCL with AMD is not.

POSTED BY: Murray Eisenberg

What about support for NVidia cards for option TargetDevice->"GPU" in such things as Classify and other neural net-related functions?

That is already supported out-of-the-box in 11.3.0.

Given the long-time status of Mathematica as essentially platform-independent, it seems really constricting that only CUDA with NVidia is supported whereas OpenCL with AMD is not.

That has more to do with CUDA being the de-facto standard for most of the state-of-the-art neural network libraries (like MXNet), which Mathematica uses under the hood. If the libraries we use add support for OpenCL then we'd certainly investigate adding support from our end.

I meant: "What about support for AMD cards". Sorry for the confusion.

POSTED BY: Murray Eisenberg

I meant: "What about support for AMD cards". Sorry for the confusion.

Regarding the neural network framework in Mathematica 11.3, OpenCL isn't supported because it hasn't been implemented in the underlying ML library/framework, MXNet. Please see the following GitHub issue:

MXNet Support for other Device Types, OpenCL AMD GPU

POSTED BY: Conrad Taylor
Posted 7 years ago

I reported to Wolfram Support:

"While version 11.2.0 was using CUDA paclet CUDAResources-Win64-11.2.22 (based on CUDA 8.0), version 11.3 is downloading paclet CUDAResources-Win64-10.5.0 (based on CUDA 7). I had hoped to get a paclet based on CUDA 9.1. But I get something very old."

Their reply:

"It does appear that the latest officially supported CUDA paclet is the version 10.5.0, shipped by the product by default. In the past, we have provided the customers, such as yourself, with the beta versions of newer paclets. I have reported the issues with the older paclets, including lack of support for CUDA toolkit 9.0, to our developers. They are working on releasing newer paclet, and we will update you as soon as it happens."

Please note that CUDA 7 does not support the latest GPUs. So for sure 11.3 will perform much slower than 11.2. Why they released beta paclets for 11.2 but no longer for 11.3 is a mystery.

POSTED BY: Bert Aerts

We had hoped to have updated CUDALink paclets ready for the 11.3 release, but we ran into some technical issues that had to be resolved first. We plan to release updated CUDALink paclets with CUDA 9.1 support in the next few days, assuming the lingering issues can be ironed out in time.

One thing that should also be mentioned: Neural Nets in the WL don't use CUDALink` at all, so this issue has no effect on the neural net framework which uses CUDA 8 on OSX and CUDA 9.1 for Windows + Linux.

Do you support MacBook Pros with NVIDIA cards (from 2014 and before)?

For quite some time now (i.e. even in 11.2), TargetDevice -> "GPU" works for me only if I load CUDALink first, and then successfully evaluate CUDAQ[]. By "successfully" I mean that I'm lucky and it doesn't cause a crash. Often, the kernel just crashes after CUDAQ[].

In earlier versions (11.0? 11.1? I don't remember) both TargetDevice -> "GPU" and CUDALink used to work fine on this machine.

I have macOS 10.13, the NVIDIA web driver, and CUDA driver version 387.128.

We fixed this annoying issue in 11.3!

I must have mixed up my versions ... I was certain that I just tried this in 11.3 ("ReleaseID" -> "11.3.0.0 (5944637, 2018030701)") and it did not work. Now I tried it again and TargetDevice -> "GPU" seems to be working fine.

Posted 7 years ago

Sorry to say Sebastian, but you are really out of touch with what got delivered to your customers with 11.3 release.

Your team member (and customers) had to tell you that the CUDA paclet was downgraded with 11.3 release, and I and others will tell you that 11.3 causes macOS to reboot with current NVIDIA drivers and CUDA support under current macOS 10.13.3. What kind of testing does Wolfram do? In the past I was offered a chance to at least try a beta release, but not for 11.3. WTF?

Under the prior release of MMA I could run CUDA functions on this machine (yes, rather limited CUDA functionality, but it ran); now it crashes my Mac! I am not here to pay and debug your production product! If you don't test it, DON'T release it as production code!

NVIDIA Web Driver 387.10.10.10.25.161 (Up to date) Last checked was 03/15/18, 19:55

CUDA Driver Version 387.128

macOS High Sierra 10.13.3 (17D102)
MacBook Pro (Retina, Mid 2012)
NVIDIA GeForce GT 650M:

Chipset Model: NVIDIA GeForce GT 650M
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 1 GB
Vendor: NVIDIA (0x10de)
Device ID: 0x0fd5
Revision ID: 0x00a2
ROM Revision: 3688
Automatic Graphics Switching: Supported
gMux Version: 3.2.19 [3.2.8]
Metal: Supported, feature set macOS GPUFamily1 v3

Mathematica 11.3.0.0

Just asking for System Information --or-- running this code

Needs["CUDALink`"]
TextGrid[CUDAInformation[]]

causes macOS to go to a black screen.

That's really bad!!!

POSTED BY: David Proffer

Sorry to say Sebastian, but you are really out of touch with what got delivered to your customers with 11.3 release.... Your team member (and customers) had to tell you that the CUDA paclet was downgraded with 11.3 release

I am one of the developers of the neural net framework. I have nothing to do with CUDALink (and as mentioned in this thread, CUDALink is not used by the neural net framework at all), and indeed I had no idea what issues CUDALink faced in 11.3.

I can, however, speak authoritatively about the neural net framework, which is the title of this post (i.e. neural net slowdowns in 11.3, for which the discussion seems to have ended). You are replying to a subthread where I claimed that this is fixed:

For quite some time now (i.e. even in 11.2), TargetDevice -> "GPU" works for me only if I load CUDALink first

This was a bug in the neural net framework, which I was involved with fixing. It seems your real issues are about CUDALink, for which I have nothing to add to the discussion.

Hi, on Windows 10 it seems that 11.3 is not finding the NVIDIA card, while 11.2 does. I have two video cards on this ZBook (main Intel and second NVIDIA).

With 11.3:

Needs["CUDALink`"] 
TextGrid[CUDAInformation[]]

Gives CUDAInformation::invdevnm: CUDA is not supported on device Intel(R) HD Graphics 530. Refer to CUDALink System Requirements for system requirements.

With 11.2:

TextGrid[{1 -> {"Name" -> "Quadro M1000M", "Clock Rate" -> 1071500, "Compute Capabilities" -> 5., "GPU Overlap" -> 1, "Maximum Block Dimensions" -> {1024, 1024, 64}, "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, "Maximum Threads Per Block" -> 1024, "Maximum Shared Memory Per Block" -> 49152, "Total Constant Memory" -> 65536, "Warp Size" -> 32, "Maximum Pitch" -> 2147483647, "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512, "Multiprocessor Count" -> 4, "Core Count" -> 128, "Execution Timeout" -> 1, "Integrated" -> False, "Can Map Host Memory" -> True, "Compute Mode" -> "Default", "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, "Texture2D Array Width" -> 16384, "Texture2D Array Height" -> 16384, "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, "Concurrent Kernels" -> True, "ECC Enabled" -> False, "TCC Enabled" -> False, "Total Memory" -> 2147483648}}]

POSTED BY: l van Veen

The issue I described above is fixed by upgrading the NVIDIA driver to version 391.03. NetTrain had mentioned the required driver version (387+); my previous driver didn't work in MM 11.3 but did work with 11.2. Windows claimed the driver was up to date, but the NVIDIA site showed newer versions.

POSTED BY: l van Veen
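
A minimal sketch for checking what CUDALink itself detects before and after a driver update, assuming CUDALink loads on the machine. The calls are standard CUDALink functions, but the output depends entirely on the local hardware and driver. As noted elsewhere in this thread, merely querying CUDA information has crashed some macOS setups, so save your work first.

```mathematica
Needs["CUDALink`"]

(* True only if CUDALink finds a usable NVIDIA GPU and driver *)
CUDAQ[]

(* Driver version as seen by CUDALink; compare with the version on NVIDIA's site *)
CUDADriverVersion[]

(* Number of CUDA devices detected and the device currently selected *)
{$CUDADeviceCount, $CUDADevice}

(* Name of every detected device, e.g. to confirm the NVIDIA card
   (not the integrated Intel GPU) is being picked up *)
Table[CUDAInformation[i, "Name"], {i, $CUDADeviceCount}]
```

If `CUDAQ[]` returns `False` while the vendor's site lists a newer driver than the operating system reports, updating the driver by hand, as described above, is the first thing to try.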

We have now released updated CUDALink paclets for Mathematica 11.3. Please run CUDAResourcesInstall[Update -> True] to get the new version. We apologize for the delay.

Posted 7 years ago

Thank you very much!

Finally CUDA 9.1 with Visual Studio 2017 compiler supported :-)

But CUDACCompilers is still not functional, just as in Mathematica 11.2. And SystemInformation lists Toolkit Version 4. Should this not be 9.1? And the Core Count says 320, while my GeForce GTX 1060 with Max-Q Design has 1280 cores!

POSTED BY: Bert Aerts

I'm not sure where SystemInformation is getting its "Toolkit version" info, but it's completely bogus. Please use the information returned by CUDAResourcesInformation[] instead.

The "core count" shown in SystemInformation is also misleading, but it doesn't have any effect on computations. I'll see if I can get this fixed.

Posted 7 years ago

One thing that should also be mentioned: Neural Nets in the WL don't use CUDALink` at all, so this issue has no effect on the neural net framework which uses CUDA 8 on OSX and CUDA 9.1 for Windows + Linux.

So CUDA 9.1 is present in the installation of Mathematica 11.3. Is this all precompiled code? Or do you need a C++ compiler, as would be needed with the CUDA paclet?

POSTED BY: Bert Aerts

Posted 7 years ago

After doing the update you describe, Stefan, I am still not sure I have a configuration that works. I have attached 4 pictures that I hope give a view of what I am seeing. The mandelbulbGPU example from your documentation still fails. It still seems to be reporting: "Name" -> "CUDAResources", "Version" -> "10.5.0". This is on macOS 10.13.2 with MMA 11.3.0.

Attachments:
POSTED BY: David Proffer

Try doing PacletSiteUpdate/@PacletSites[] and then try again. You should get a paclet that has version "11.3.51".
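
Putting the steps from this thread together, a minimal sketch of the update sequence might look like the following. It uses only the calls named in the posts above; the result of the final line depends on what is installed locally.

```mathematica
Needs["CUDALink`"]

(* Refresh the paclet site index so the newest CUDAResources paclet is visible *)
PacletSiteUpdate /@ PacletSites[];

(* Download and install the updated CUDAResources paclet *)
CUDAResourcesInstall[Update -> True]

(* Check which paclet is installed now; for 11.3 the thread reports
   version "11.3.51" after a successful update, "10.5.0" before it *)
CUDAResourcesInformation[]
```

Checking `CUDAResourcesInformation[]` rather than SystemInformation follows the advice given earlier in the thread, since the toolkit version shown in SystemInformation was reported to be unreliable.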

Posted 7 years ago

Thank you for the information, Stefan; that seems to have updated the CUDA resources.

It is a rather painful process to get things stable and current; MMA does not offer clear steps or useful diagnostics.

POSTED BY: David Proffer
Posted 7 years ago

The new ImageRestyle function continues to fail, however. Is this a problem with CUDA or a problem with the neural network package? Stefan? Sebastian?

Attachments:
POSTED BY: David Proffer

Do you have the very latest Nvidia drivers installed? In System Preferences > CUDA does it say that a newer driver is available?

Posted 7 years ago

Both say they are current for my OS version, macOS 10.13.2 (17C89). See pictures below:

Attachments:
POSTED BY: David Proffer

Hi, I'm using macOS 10.13.3 and I have the following results for Nvidia and CUDA:

[Two images showing the NVIDIA driver and CUDA status]

POSTED BY: Conrad Taylor

Stefan, this solution worked for me. Also, I see the correct results when evaluating SystemInformation[].

POSTED BY: Conrad Taylor

I ran the exact same data science GPU code on an identical hardware and software configuration, except for the change from Mathematica 11.2 to Mathematica 11.3, and saw my neural network training time go from 295 seconds on 11.2 to 2038 seconds on 11.3.

When training the net, could you set the BatchSize and MaxTrainingRounds options to the same values in both versions and then report the training time again? We changed the heuristic for choosing a value of MaxTrainingRounds, which sometimes uses more training rounds for extra accuracy. Also, the inputs/s is a better indicator of speed than the total training time, as it doesn't depend on MaxTrainingRounds. If there is still a speed difference, could you give information about the GPU you are using, the output of $Version, and, if possible, the training script to reproduce this training?
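
A minimal sketch of such a like-for-like comparison, to be run unchanged in both versions. The network and data here are hypothetical stand-ins (a toy regression), not the original poster's model, and the throughput is a rough wall-clock estimate that includes setup overhead rather than the inputs/s figure NetTrain itself reports.

```mathematica
(* Record the exact version alongside the timing *)
$Version

(* Hypothetical toy network and synthetic data, standing in for the real model *)
net = NetChain[{LinearLayer[32], Ramp, LinearLayer[1]}, "Input" -> 4];
data = Table[With[{x = RandomReal[1, 4]}, x -> {Total[x]}], 10000];

(* Fix both options explicitly so neither version picks its own defaults *)
rounds = 10;
batchSize = 256;

(* Wall-clock time of the whole NetTrain call *)
{seconds, trained} = AbsoluteTiming[
   NetTrain[net, data,
    BatchSize -> batchSize,
    MaxTrainingRounds -> rounds,
    TargetDevice -> "GPU"]];

(* Rough throughput: total inputs processed divided by elapsed seconds *)
inputsPerSecond = rounds*Length[data]/seconds
```

Because the number of rounds is pinned, any remaining difference in `seconds` or `inputsPerSecond` between versions reflects the framework or driver rather than a changed default.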

And then I see that Mathematica 11.3 still does not support current XCode LLVM/GCC compiler or NVIDIA for CUDA tools

I am not sure exactly what you mean by this. Could you expand a bit about which functionality you are referring to that doesn't support "current XCode LLVM/GCC compiler or NVIDIA"?

Posted 7 years ago

Sebastian, thank you for the information. Yes, the BatchSize seems to default to a much smaller value in 11.3 than in 11.2; raising it from the default of 256 to 700 gets the timing down to 247 seconds in my test example. MaxTrainingRounds seems to have the same value of 10 in both versions of MMA. The 247 seconds on MMA 11.3 is about 50 seconds faster than the result on 11.2. Raising the BatchSize above 700 (until the kernel crashes at around 900) does not yield any further improvement in speed. Is there any way MMA can detect the limits of the GPU and not just crash?

Still evaluating error results.

POSTED BY: David Proffer