CUDA not working on Mathematica 12.2

Jean-Michel Charles Collard-Richard

Posted 5 years ago

Hello all :) CUDA is not working on 12.2. Look at this: What am I supposed to do? I also tried downloading the CUDA paclets: that changed nothing. I have an NVIDIA RTX 3090 and the drivers are OK. Thank you for helping me. Regards to all, Jean-Michel Attachments: Sans titre-1.nb

POSTED BY: Jean-Michel Charles Collard-Richard

93 Replies

ZAQU zaqu

Posted 4 years ago

Okay so newest CUDA 11.6 added support for Visual Studio 2022. Nice.

And it fixes the fatal error:

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.0.6
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include\crt/host_config.h(160): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2019 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
CUDAFunction-9936.cu

Everything works.

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.0.6
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
CUDAFunction-4715.cu
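
For anyone who cannot move to a toolkit that supports their Visual Studio version, the flag named in the error message can be passed through to nvcc. This is only a minimal sketch: it assumes the "CompileOptions" and "ShellOutputFunction" options of CUDAFunctionLoad accept the values shown, and the addTwo kernel is a throwaway example rather than code from this thread. As the error text itself warns, an unsupported host compiler may fail to compile or produce incorrect results.

```wolfram
Needs["CUDALink`"]

(* Trivial kernel, used only to exercise the nvcc + host compiler toolchain. *)
code = "
__global__ void addTwo(mint * arr, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index >= len) return;
    arr[index] += 2;
}";

(* "ShellOutputFunction" -> Print echoes the compiler output, as in the logs above.
   "CompileOptions" forwards the override flag quoted in the error message
   (assumed here to be accepted as a plain string). *)
addTwo = CUDAFunctionLoad[code, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256,
   "CompileOptions" -> "-allow-unsupported-compiler",
   "ShellOutputFunction" -> Print];

(* Run the kernel on a short list to confirm that it loads and executes. *)
addTwo[Range[10], 10]
```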

POSTED BY: ZAQU zaqu

ZAQU zaqu

Posted 4 years ago

So it mostly works, except for CUDAFluidDynamics (apparently that also works if you enable dynamics beforehand; CUDAVolumetricRender works too, see https://mathematica.stackexchange.com/a/22654 ). See: https://www.wolframcloud.com/obj/13f45cd5-6d93-4bab-8c82-e0e08d824984

POSTED BY: ZAQU zaqu

ZAQU zaqu

Posted 4 years ago

Is using the Visual Studio 2022 compiler going to be a problem? P.S. It was a problem; I had to force the 2019 compiler. Alas.

POSTED BY: ZAQU zaqu

ZAQU zaqu

Posted 4 years ago

I hope installing CUDA Toolkit 11.5 Update 1 is not going to break anything.

POSTED BY: ZAQU zaqu

Jay Morreale

Jay Morreale, p-brane LLC

Posted 5 years ago

CUDALink stopped working for me around Mathematica 12 or so. I recently upgraded to Windows 10 21H1 and Mathematica 12.3.1. I uninstalled CUDA Toolkit 10.2 and installed CUDA Toolkit 11.4, but CUDALink still did not work. I found that the CUDA Toolkit 10.2 uninstaller leaves behind the empty directory C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin. Once I deleted it, Needs["CUDALink`"] found the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin directory, and CUDA functions such as CUDAToolkitCompatibilityInformation[], CUDAQ[] (returns True), SystemInformation[] (shows driver and GPU status), CUDAInformation[], and CUDADriverVersion[] work as documented. I had also created CUDA_PATH and set it to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4, because CUDA Toolkit 11.4 creates CUDA_PATH_V11_4 instead. This did not seem to help Mathematica find the correct toolkit when Needs["CUDALink`"] is evaluated. It worked only once the empty v10.2\bin directory was removed.
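
A leftover directory like this can be spotted from within Mathematica. The following is a rough sketch using only built-in file functions; the helper name emptyToolkitBins is made up for illustration, and the root path is the default install location quoted in the post, so it may differ on other machines.

```wolfram
(* Hypothetical helper: list every <root>\v*\bin directory that exists but is empty. *)
emptyToolkitBins[root_String] := Select[
  FileNameJoin[{#, "bin"}] & /@ FileNames["v*", root],
  DirectoryQ[#] && FileNames["*", #] === {} &]

(* Default toolkit location, taken from the paths quoted in the post. *)
emptyToolkitBins["C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA"]

(* For comparison: what the CUDA_PATH environment variable points at ($Failed if unset). *)
Environment["CUDA_PATH"]
```

Any path returned by the first call is a candidate for the stale directory described in the post; whether removing it is enough on a given machine is not something this check can confirm.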

POSTED BY: Jay Morreale

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Hi Greg, No worries. Thanks for the heads-up on the 11.3 CUDA toolkit. The above now works on my machine also. However, it still isn't using GPU, even when you specify TargetDevice -> "GPU". It just reverts to using CPU, as in other cases. Jonathan

POSTED BY: Jonathan Kinlay

Gregory K

Posted 5 years ago

FYI, further exploration revealed that some training tasks run AMAZINGLY fast, especially computer vision tasks where ConvolutionLayer is used. I am getting 3-5 seconds on the GPU vs 3-4 MINUTES on my 3960X CPU for the MNIST example in the help file. The image classification example performs similarly well, once batch size and training rounds are tuned to minimize training time for a trained net of similar quality. But as soon as we go back to simple math and vector multiplications, which should use the tensor cores, I suspect NetTrain fails to make proper use of the GPU and falls back to the CPU, hence the atrocious speeds and high CPU utilization numbers.

POSTED BY: Gregory K

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Gregory, Do you want to post (a link to) some WL code that I can benchmark on my Ryzen 3090 machine, to confirm your findings? Jonathan

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Gregory, assuming you are referring to the example below, I am unable to corroborate. It's faster than CPU, but still takes over 1 minute to train. Task Manager reports minimal GPU load on my machine during the evaluation.

POSTED BY: Jonathan Kinlay

Gregory K

Posted 5 years ago

Jonathan, depending on batch size, I've had the GPU crunch over 100k samples per second, but 85-90k is typical. 3-5 s of GPU time is typical; 5-6 min of CPU time is typical. I'm running an air-cooled EVGA RTX 3090 at a stable 1995 MHz overclock. My RAM is 3600 MHz, which probably also helps. There's still about 35-37% CPU utilization showing during those 3-4 seconds of GPU training. -Greg Attachments: CUDA Screenshot ...png

POSTED BY: Gregory K

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

You're right!

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Here's another GPU fail, this time with AnomalyDetection. The error message may shed some light on the cause of the issue. It seems to be a missing dependency... Are you able to replicate this Greg?

POSTED BY: Jonathan Kinlay

Gregory K

Posted 5 years ago

Jonathan, Sorry for the late reply, I didn't see your comment and then got pretty busy. My PC succeeds with your code. See the attached image. I noticed you're on CUDA 11.2. I'm on 11.3. Maybe that's it? -Greg Attachments: CUDA Screenshot ...png CUDA Screenshot ...png

POSTED BY: Gregory K

Gregory K

Posted 5 years ago

Stefan, I updated the latest driver and toolkit, so I am now on 466.11 and 11.3, all latest paclets from Wolfram, NetTrain works.... kinda... 3090 core count isn't being recognized, max BatchSize is limited to 10000 batches, and when I run NetTrain, it shows my GPU utilization at 10-15% at all times. So I'm using a 10th of the CUDA power I have available. NetTrain on CPU correctly shows 98-99% utilization, though seems like 10k batches is still too small even for CPU, given the very fast RAM and PCIe4 bus speeds. Most curiously, I'm showing +/-50% CPU utilization while NetTrain is targeted to use GPU! Why? Either way, 16s GPU time vs 19-20s CPU time is absolutely a TERRIBLE result, given the power of this GPU. Attachments: CUDA Screenshot ...png
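
A comparison like this is easier to reproduce on other machines with a fixed synthetic task. The snippet below is a minimal sketch, not the example from the post: the network, data sizes, batch sizes and the helper name trainSeconds are arbitrary placeholders, and the "GPU" timings are only meaningful if TargetDevice -> "GPU" actually works on the machine.

```wolfram
(* Small synthetic regression problem; all sizes are arbitrary placeholders. *)
n = 100000;
data = RandomReal[1, {n, 10}] -> RandomReal[1, {n, 1}];
net = NetChain[{LinearLayer[64], Ramp, LinearLayer[1]}, "Input" -> 10];

(* Hypothetical helper: wall-clock seconds for a fixed number of training rounds. *)
trainSeconds[device_String, batch_Integer] := First@AbsoluteTiming[
   NetTrain[net, data, TargetDevice -> device, BatchSize -> batch,
    MaxTrainingRounds -> 5, TrainingProgressReporting -> None]]

(* One row per batch size: {batch size, CPU seconds, GPU seconds}. *)
Table[{b, trainSeconds["CPU", b], trainSeconds["GPU", b]}, {b, {1000, 10000}}]
```

Watching GPU load in Task Manager while the table evaluates gives a rough indication of whether the GPU is being used at all.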

POSTED BY: Gregory K

Updating Name

Posted 5 years ago

I have a PC with a similar configuration to Gregory's: Aurora-Ryzen with a GeForce 3090 GPU. I get similar results for the same test: GPU utilization maxes out at around 10%-15%, with CPU utilization at 50%. It seems that Mathematica is seriously under-utilizing the GPU (and CPU), as Gregory reported. Other mathematical programming languages I work with have no difficulty making full use of both CPU and GPU for tasks like NetTrain. Someone at Wolfram needs to get a grip on GPU computing functionality and work on resolving the incompatibility and performance issues that are a serious impediment to conducting ML R&D in Mathematica. I realize that some of these issues are outside Wolfram's control, for example when NVIDIA discontinues support for older GPUs, but there are ways to handle the issues much more effectively, as evidenced by the very comprehensive, up-to-date documentation offered by some competitor products.

POSTED BY: Updating Name

Michal Kvasnicka

Posted 5 years ago

The current state of Mathematica's CUDA support for recent NVIDIA GPUs is really terrible. Compare it with MATLAB R2021a, where overall CUDA performance on similar tasks (as discussed here) is far better. Wolfram Research should really treat this as a high-priority task!!! Especially given the new Apple M1 hardware, where NVIDIA GPUs are no longer available at all.

POSTED BY: Michal Kvasnicka

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Like I said, I don't hold Wolfram accountable for the (sometimes questionable) decisions that NVIDIA makes about ongoing support for legacy (and even some new) GPUs. But, in contrast to Wolfram's spotty handling of GPU functionality, products such as Matlab:

(i) Flag upcoming changes to GPU support by NVIDIA in the core documentation before they happen
(ii) Present detailed tables of the GPUs supported by each software version, their performance characteristics and required drivers/toolkits
(iii) Provide support for the latest GPUs with each new software release and ensure that their performance is in line with expectations
(iv) Ensure that existing GPU functions continue to work as advertised in each new software release

Wolfram apparently does none of these things, which gives the impression that WR doesn't really care about GPU functionality in their products. As I say, GPU capability is often critical for machine learning applications. And in general, WL machine learning functionality appears to be lagging competitor offerings (including R and Python libraries) in several important areas. This is not an unfixable problem. The same was true of, for example, time series functionality prior to version 10, which has improved enormously in later versions to the point where it is now outstanding.

POSTED BY: Jonathan Kinlay

Monideepa Gupta

Posted 5 years ago

POSTED BY: Monideepa Gupta

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Your problem is separate from most of the discussion here, which is specific to changes and updates made in Mathematica 12.2.0. Please contact Wolfram Technical Support (https://www.wolfram.com/support/contact/email/?topic=technical), and include all of these details.

POSTED BY: Stefan Ragnarsson

Kay Herbert

Kay Herbert, Chief scientist, Sloan Valve Co.

Posted 5 years ago

I just installed CUDA from NVIDIA on my Windows 10 Lenovo X1 with no problem:

 In[1]:= Needs["CUDALink`"]

In[2]:= CUDAInformation[]

Out[2]= {1 -> {"Name" -> "GeForce GTX 1650 with Max-Q Design", 
   "Clock Rate" -> 1245000, "Compute Capabilities" -> 7.5, 
   "GPU Overlap" -> 1, "Maximum Block Dimensions" -> {1024, 1024, 64},
    "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 16, "Core Count" -> 1024, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 131072, "Texture2D Width" -> 131072, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 16384, 
   "Texture3D Height" -> 16384, "Texture3D Depth" -> 16384, 
   "Texture2D Array Width" -> 32768, 
   "Texture2D Array Height" -> 32768, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 4294967296}}

In[3]:= InstallCUDA[]

Out[3]= Success["CUDALinkLoaded", 
Association[
 "MessageTemplate" :> "CUDALink installation complete.", 
  "CUDAVersion" -> 11.2, 
  "DefaultDevice" -> "GeForce GTX 1650 with Max-Q Design", 
  "Toolkit" -> "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
\\v11.2", 
  "NVCC" -> "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\nvcc.exe", 
  "LibrariesLoaded" -> {
   "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2\\bin\
\\cublas64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cublasLt64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cudart64_110.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufft64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufftw64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\curand64_10.dll", 
    "C:\\Windows\\System32\\nvcuda.dll"}]]

In[4]:= CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm


Out[4]//MatrixForm=
 55  55  55  55  55  55  55  55  55  55
110 110 110 110 110 110 110 110 110 110
165 165 165 165 165 165 165 165 165 165
220 220 220 220 220 220 220 220 220 220
275 275 275 275 275 275 275 275 275 275
330 330 330 330 330 330 330 330 330 330
385 385 385 385 385 385 385 385 385 385
440 440 440 440 440 440 440 440 440 440
495 495 495 495 495 495 495 495 495 495
550 550 550 550 550 550 550 550 550 550
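
For anyone whose CUDADot call fails or returns something suspicious, the same product can be computed on the CPU as a reference. A small sketch using the built-in Dot only; the comparison with CUDADot is left commented out because it requires a working CUDALink setup.

```wolfram
(* Same input as the CUDADot test: every entry of row i equals i. *)
a = Table[i, {i, 10}, {j, 10}];

(* CPU reference: entry (i, j) of a.a is Sum[i k, {k, 10}] = 55 i. *)
expected = a . a;
expected === Table[55 i, {i, 10}, {j, 10}]   (* True *)

(* With CUDALink loaded and initialized, compare the GPU result against it: *)
(* CUDADot[a, a] == expected *)
```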

POSTED BY: Kay Herbert

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

On the iMacs running Windows 10 and Mathematica 12.2 with CUDA Toolkit 11.2 installed, we have:

 Needs["CUDALink`"]
 CUDAInformation[]

 {1->{Name->GeForce GTX 660M,Clock Rate->950000,Compute Capabilities->3.,GPU Overlap->1,Maximum Block Dimensions->{1024,1024,64},Maximum Grid Dimensions->{2147483647,65535,65535},Maximum Threads Per Block->1024,Maximum Shared Memory Per Block->49152,Total Constant Memory->65536,Warp Size->32,Maximum Pitch->2147483647,Maximum Registers Per Block->65536,Texture Alignment->512,Multiprocessor Count->2,Core Count->384,Execution Timeout->1,Integrated->False,Can Map Host Memory->True,Compute Mode->Default,Texture1D Width->65536,Texture2D Width->65536,Texture2D Height->65536,Texture3D Width->4096,Texture3D Height->4096,Texture3D Depth->4096,Texture2D Array Width->16384,Texture2D Array Height->16384,Texture2D Array Slices->2048,Surface Alignment->512,Concurrent Kernels->True,ECC Enabled->False,TCC Enabled->False,Total Memory->536870912}}

 InstallCUDA[]
 Success[]

Then:

 CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]//MatrixForm

 CUDADot::notinit: CUDALink is not initialized.

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

NB: CUDA no longer works on older iMacs, as NVIDIA's support for the GTX 660 graphics card is limited to CUDA version 10.1. You would have to revert to Mathematica V11.x to run GPU functionality on one of these cards.
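
Cards in this situation can be flagged from the "Compute Capabilities" entry that CUDAInformation[] reports. The sketch below runs on a hand-abridged sample assembled from two outputs quoted in this thread, so it needs no GPU; the helper name devicesBelow and the 3.5 threshold are assumptions for illustration (CUDA 11 dropped compute capability 3.0, but the exact cutoff for a given toolkit should be checked against NVIDIA's release notes).

```wolfram
(* Sample data abridged from CUDAInformation[] outputs quoted in this thread. *)
info = {
   1 -> {"Name" -> "GeForce GTX 660M", "Compute Capabilities" -> 3.},
   2 -> {"Name" -> "GeForce GTX 1060 6GB", "Compute Capabilities" -> 6.1}};

(* Hypothetical helper: devices whose compute capability is below min. *)
devicesBelow[info_List, min_?NumericQ] := Cases[info,
  (id_ -> props_List) /; Lookup[props, "Compute Capabilities"] < min :>
   (id -> Lookup[props, "Name"])]

devicesBelow[info, 3.5]   (* {1 -> "GeForce GTX 660M"} *)
```

On a live system, replacing the sample with CUDAInformation[] (after Needs["CUDALink`"]) should give the same kind of list.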

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Update: As of today, neural network operations (such as `NetTrain`) with `TargetDevice->"GPU"` should work with Ampere-generation cards from Nvidia, e.g. RTX 3070, 3080 or 3090.
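
A quick way to confirm this on a given machine is to train a tiny network on the GPU and see whether any message is issued. This is only a minimal sketch with a made-up network and random data; it shows that GPU training starts, not that the GPU is well utilized.

```wolfram
(* Tiny throwaway network and data, just enough to start a training run. *)
net = NetChain[{LinearLayer[1]}, "Input" -> 2];
data = RandomReal[1, {256, 2}] -> RandomReal[1, {256, 1}];

(* Check returns "failed" if any message is issued during the evaluation,
   for example a NetTrain message saying that the GPU could not be used. *)
gpuStatus = Check[
   NetTrain[net, data, TargetDevice -> "GPU", MaxTrainingRounds -> 1,
    TrainingProgressReporting -> None];
   "ok",
   "failed"]
```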

POSTED BY: Stefan Ragnarsson

Fidel Schaposnik

Posted 5 years ago

As of today, my previously working setup (Mathematica 12.2.0.0 + GeForce GTX 1060 6GB on Windows 10) has stopped working after an update from the Wolfram servers replaced the MXNetLink and MXNetResources paclets I had before. CUDA seems to work fine (CUDAQ[] gives True, InstallCUDA[] is successful, CUDADot[] example works but CUDAFourier doesn't, probably due to the unrelated issue), but I can't run any neural network functions at this point. Any suggestions?

POSTED BY: Fidel Schaposnik

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

What happens for you with `NetTrain` with `TargetDevice->"GPU"`, are there any error messages shown? I have a similar setup (Win 10, GTX 1060, Mathematica 12.2.0) and it appears to work. Are your graphics drivers up-to-date?

POSTED BY: Stefan Ragnarsson

Fidel Schaposnik

Posted 5 years ago

It all depends somewhat on the order in which things are done, but here's some more information:

Running

<< CUDALink`
CUDAQ[]

gives True, then CUDAInformation[] recognizes the GPU

{1 -> {"Name" -> "GeForce GTX 1060 6GB", "Clock Rate" -> 1708500,
"Compute Capabilities" -> 6.1, "GPU Overlap" -> 1,
"Maximum Block Dimensions" -> {1024, 1024, 64},
"Maximum Grid Dimensions" -> {2147483647, 65535, 65535},
"Maximum Threads Per Block" -> 1024,
"Maximum Shared Memory Per Block" -> 49152,
"Total Constant Memory" -> 65536, "Warp Size" -> 32,
"Maximum Pitch" -> 2147483647,
"Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
"Multiprocessor Count" -> 10, "Core Count" -> 1280,
"Execution Timeout" -> 1, "Integrated" -> False,
"Can Map Host Memory" -> True, "Compute Mode" -> "Default",
"Texture1D Width" -> 131072, "Texture2D Width" -> 131072,
"Texture2D Height" -> 65536, "Texture3D Width" -> 16384,
"Texture3D Height" -> 16384, "Texture3D Depth" -> 16384,
"Texture2D Array Width" -> 32768,
"Texture2D Array Height" -> 32768,
"Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512,
"Concurrent Kernels" -> True, "ECC Enabled" -> False,
"TCC Enabled" -> False, "Total Memory" -> 6442450944}}

CUDADriverVersion[] identifies my drivers as version 465.21, which is correct (note that these are beta drivers needed to run Docker with GPU support on WSL 2, as explained here https://www.docker.com/blog/wsl-2-gpu-support-is-here/ ; they were working just fine with Mathematica up until yesterday).

After doing all of this, NetTrain trains fine on CPU but fails on GPU with the error

NetTrain::badtrgdevgpu: TargetDevice -> GPU could not be used. Please ensure that you have a compatible NVIDIA graphics card and have installed the latest drivers from http://www.nvidia.com/Download/index.aspx.

Strangely enough, if I now try CUDAQ[] I get a pop-up error box saying (note the DLL file is certainly there):

The procedure entry point cufftloadwisdom could not be located in the dynamic link library C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufftw64_10.dll.

Then afterwards, even when running neural network functions on the CPU, for example before NetTrain I get (again, the DLL file is present)

LibraryFunction::load: The library C:\Users\fidel\AppData\Roaming\Mathematica\Paclets\Repository\MXNetResources-WIN64-12.2.404\LibraryResources\Windows-x86-64\cublas64_11.dll cannot be loaded.

and also a pop-up error box saying

The procedure entry point cublasLtZZZMatmulAlgoGetHeuristic could not be located in the dynamic link library C:\Users\fidel\AppData\Roaming\Mathematica\Paclets\Repository\MXNetResources-WIN64-12.2.404\LibraryResources\Windows-x86-64\cublas64_11.dll

POSTED BY: Fidel Schaposnik

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

The update we pushed (MXNetLink and MXNetResources) updated the NetTrain GPU implementation to CUDA Toolkit 11.2, which was necessary to add support for the latest-generation Nvidia cards. However, it appears that CUDA Toolkit 11.2 is incompatible with the beta driver you're using, 465.21. It appears Nvidia released an update yesterday that should help you: https://forums.developer.nvidia.com/t/new-cuda-on-wsl2-wip-driver-465-42-is-now-available-for-download/167166

From that announcement: "We are pleased to inform you that WSL2 WIP driver 465.42 is now available for download. CUDA 11.2 toolkit will be functional for WSL v2 with this release. The soul of this driver release are some performance improvements we have made. Please let us know below what you think!"

POSTED BY: Stefan Ragnarsson

Fidel Schaposnik

Posted 5 years ago

Thanks for the driver update suggestion! Mathematica is now back to working with the GPU, after I:

1) Uninstalled CUDA Toolkit v11.0 and installed v11.2
2) Updated the driver to v465.42
3) Uninstalled and reinstalled the latest Mathematica paclets for MXNetLink and MXNetResources (not sure how necessary this is; I did it only because I had earlier been trying to roll back the updates)

Note that skipping step (1) above and keeping CUDA Toolkit 11.0 resulted in a working set-up, but I still got an error when running neural net computations after loading CUDALink (or the other way around). I'm guessing this was due to CUDALink and MXNetLink being confused about toolkit versions, so I just updated everything to run on v11.2... but note that TensorFlow doesn't play nicely with v11.2, so this required extra tinkering on the side.
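
When comparing set-ups, it helps to know which versions of these paclets are actually installed. A read-only sketch, assuming Mathematica 12.1 or later, where PacletFind is a System` function and a PacletObject can be queried by property name; the wildcard is simply a guess that covers the two paclets named above.

```wolfram
(* Name -> version for every installed paclet whose name starts with "MXNet".
   An empty list means none were found. *)
(#["Name"] -> #["Version"]) & /@ PacletFind["MXNet*"]
```

This only lists what is installed; it does not modify or reinstall anything.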

POSTED BY: Fidel Schaposnik

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Where I am working we have:

1) A Win 10 PC with a GeForce GTX 1060 graphics card that works fine with CUDA on Mathematica 12.2
2) A Win 10 PC with two old GTX 660 cards that will only run CUDA on Mathematica 12.1 and not on Mathematica 12.2 (regardless of whether NVIDIA toolkit v10 or v11 is installed)
3) Two iMacs running Win 10 with GTX 675 cards, neither of which will run CUDA with either Mathematica 12.1 or Mathematica 12.2 (earlier versions of Mathematica 11 work fine with CUDA)

Besides these inconsistencies, several of the CUDA functions don't work as advertised. For example, CUDAFinancialDerivative with the AsianArithmetic option, one of the examples in the documentation, just returns infinite values. It's an unreliable mess.

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

What output or error messages are you seeing in the setups that aren't working?

POSTED BY: Stefan Ragnarsson

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

On the Win 10 PC with two GTX 660 cards and Mathematica 12.2:

Needs["CUDALink`"]
CUDAInformation[]
{1 -> {"Name" -> "GeForce GTX 660", "Clock Rate" -> 888500, 
   "Compute Capabilities" -> 3., "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 6, "Core Count" -> 1152, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, 
   "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, 
   "Texture2D Array Width" -> 16384, 
   "Texture2D Array Height" -> 16384, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 1610612736}, 
 2 -> {"Name" -> "GeForce GTX 660", "Clock Rate" -> 888500, 
   "Compute Capabilities" -> 3., "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 6, "Core Count" -> 1152, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, 
   "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, 
   "Texture2D Array Width" -> 16384, 
   "Texture2D Array Height" -> 16384, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 1610612736}}

Then:

CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm

CUDADot::allocf: A CUDALink memory allocation failed.

Also:

numberOfOptions = 32;
spotPrices = RandomReal[{25.0, 35.0}, numberOfOptions];
strikePrices = RandomReal[{20.0, 40.0}, numberOfOptions];
expiration = RandomReal[{0.1, 10.0}, numberOfOptions];
interest = 0.08;
volatility = RandomReal[{0.10, 0.50}, numberOfOptions];
dividend = RandomReal[{0.2, 0.06}, numberOfOptions];

CUDAFinancialDerivative[{"American", 
  "Call"}, {"StrikePrice" -> strikePrices, 
  "Expiration" -> expiration}, {"CurrentPrice" -> spotPrices, 
  "InterestRate" -> interest, "Volatility" -> volatility, 
  "Dividend" -> dividend}]

{0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.}

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

What does `InstallCUDA[]` return?

POSTED BY: Stefan Ragnarsson

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Mathematica reports "Success" (CUDA v 11.1 installed). But the results are exactly as before.

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Could you try this?

Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter = 1;
CUDAQ[]
CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]

and post the result?

POSTED BY: Stefan Ragnarsson

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

It returns: True and: CUDADot::allocf: A CUDALink memory allocation failed.

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

That command should have printed a bunch of debug information as well, I'm mostly interested in that. For example, on my Windows machine I get:

Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter=1;
CUDAQ[]
CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]
LOG:   ==== Loading Library Files ==== 
LOG:  Loading CUDA Library Files:  C:\WINDOWS\System32\nvcuda.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudart64_110.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cufft64_10.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cufftw64_10.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublasLt64_11.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublas64_11.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\curand64_10.dll
LOG:  NVIDIA driver library is located in  C:\WINDOWS\System32\nvapi64.dll
LOG:  NVIDIA Driver Library is   Valid
LOG:  CUDA Library is   Valid
LOG:   ==== Loading Library Functions ==== 
LOG:   ==== Initializing  CUDA  ==== 
True

What do you see on your machine?

POSTED BY: Stefan Ragnarsson

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

Nope - nothing. Only what I wrote before.

POSTED BY: Jonathan Kinlay

Shenghui Yang

Shenghui Yang, WOLFRAM

Posted 5 years ago

I tested on my Dell Precision 7750 with RTX 3000 + Win 10 + WL 12.2. Your code works for me.

POSTED BY: Shenghui Yang

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

CUDA works on some Mathematica 12.2 installations and not on others. On some machines it works with Mathematica 12.1, but not with Mathematica 12.2. On other Win 10 machines it works on neither 12.1 nor 12.2. And several of the CUDA functions no longer work as shown in the documentation, even on installations where CUDA is successfully installed. As I said previously, it's an inconsistent mess.

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Could you please contact Wolfram Support (https://www.wolfram.com/support/contact/) with the details?

POSTED BY: Stefan Ragnarsson

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

I figured it out. CUDA works on Mathematica 12.2 with NVIDIA Toolkit ver 11.1, but not with ver 11.2 (which is what I had installed originally).

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 5 years ago

I spoke too soon!

Needs["CUDALink`"]

CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm

(837977408  1   1   1   1   1   1   1   1   1
 2          2   2   2   2   2   2   2   2   2
 3          3   3   3   3   3   3   3   3   3
 4          4   4   4   4   4   4   4   4   4
 5          5   5   5   5   5   5   5   5   5
 6          6   6   6   6   6   6   6   6   6
 7          7   7   7   7   7   7   7   7   7
 8          8   8   8   8   8   8   8   8   8
 9          9   9   9   9   9   9   9   9   9
 10         10  10  10  10  10  10  10  10  10)
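For comparison, the GPU result can be checked against the built-in CPU product. This is only a sketch of such a check, not something from the original test; the names `m`, `expected` and `result` are illustrative. For this matrix every entry in row i of the true product is 55 i, so the output above is wrong in every entry, not just the first.

```mathematica
Needs["CUDALink`"]

(* Same 10x10 matrix as above: every entry in row i equals i *)
m = Table[i, {i, 10}, {j, 10}];

(* CPU reference: entry (i, j) is Sum[i*k, {k, 10}] = 55 i *)
expected = m . m;

(* GPU result from CUDALink *)
result = CUDADot[m, m];

(* True only if the GPU product matches the CPU product exactly *)
result === expected
```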

POSTED BY: Jonathan Kinlay

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Could you send this to technical support? A notebook with the information returned by `InstallCUDA[]`, `SystemInformation[]`, `CUDADriverVersion[]` and `CUDAInformation[]`, as well as the output of the check

Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter=1;
CUDAQ[]

(which needs to be evaluated right after a kernel restart via `Quit[]`), would be extremely useful in determining what's going on.
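Put together, the requested diagnostics could be collected in one notebook roughly as follows. This is a sketch of one possible ordering using only the functions named above; `Quit[]` ends the kernel session, so it is assumed to be evaluated in its own cell before the rest.

```mathematica
(* Cell 1: restart the kernel so the library-loading log is printed afresh *)
Quit[]

(* Cell 2: evaluate right after the restart *)
Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter = 1;  (* turn on the LOG: output *)
CUDAQ[]

(* Cell 3: installation and hardware details *)
InstallCUDA[]
CUDADriverVersion[]
CUDAInformation[]
SystemInformation[]
```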

POSTED BY: Stefan Ragnarsson

Hamood Khan

Hamood Khan, King Fahd University of Petroleum and Minerals

Posted 5 years ago

Same problem here as above. CUDAResourcesInformation[] gives an error, while CUDAQ[] returns True. CUDA functions do not execute. CUDA version 11.2, Mathematica version 12.2, OS: Windows 10 (Insider Build), GPU: NVIDIA RTX 2070. I can't get any of the functions, such as CUDAFold or CUDAFourier, to work.

POSTED BY: Hamood Khan

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

`CUDAResources` and `CUDAResourcesInformation` are obsolete in Mathematica 12.2. What does `InstallCUDA[]` return? Also, please note there's a known issue with `CUDAFourier` in M12.2, but `CUDAFold` should work.

POSTED BY: Stefan Ragnarsson

Hamood Khan

Hamood Khan, King Fahd University of Petroleum and Minerals

Posted 5 years ago

Hello. Here are the outputs for the commands you asked about:

In[2]:= CUDAQ[]

Out[2]= True

In[3]:= InstallCUDA[]

Out[3]= Success["CUDALinkLoaded", 
Association[
 "MessageTemplate" :> "CUDALink installation complete.", 
  "CUDAVersion" -> 11.2, "DefaultDevice" -> "GeForce RTX 2070", 
  "Toolkit" -> "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
\\v11.2", 
  "NVCC" -> "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\nvcc.exe", 
  "LibrariesLoaded" -> {
   "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2\\bin\
\\cublas64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cublasLt64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cudart64_110.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufft64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufftw64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\curand64_10.dll", 
    "C:\\Windows\\System32\\nvcuda.dll"}]]

Yes, CUDAFourier[] is now working. However, CUDAFold was not working before but, strangely enough, it is now working! So is CUDAFoldList. I don't know what changed.

POSTED BY: Hamood Khan

Frederick Carlson

Posted 5 years ago

I am very confused. It is asking me for a paclet, but this conversation says that 12.2 needs no paclet. Is there some sort of web page or blog or something that explains how to get CUDA working with 12.2???

In[8]:= CUDAQ[]

Out[8]= True

In[9]:= CUDAInformation[]

Out[9]= {1 -> {"Name" -> "GeForce GTX 1080 Ti", 
   "Clock Rate" -> 1582000, "Compute Capabilities" -> 6.1, 
   "GPU Overlap" -> 1, ...

In[10]:= CUDADriverVersion[]

Out[10]= "460.89"

In[11]:= vec = Range[1., 10];
CUDAFourier[vec]

During evaluation of In[11]:= CUDAFourier::nopaclet: CUDAResources was not found. Make sure that you are connected to the internet and Mathematica is allowed access to the internet.

Out[12]= CUDAFourier[{1., 2., 3., 4., 5., 6., 7., 8., 9., 10.}]

POSTED BY: Frederick Carlson

Ernst H.K. Stelzer

Ernst H.K. Stelzer, Goethe-Universität Frankfurt am Main

Posted 5 years ago

I would like to add that I used CUDALink on my PC (Windows 10, M1200) and on a server (2x K2 until recently, now 2x P4) prior to 12.2. Since I installed 12.2, CUDA has not worked. The same issues are described several times in this thread. I also removed Mathematica completely, as described on the Wolfram website, and re-installed Mathematica, Visual Studio 2019, nVidia GPU, and CUDA Toolkit 11.2 from scratch. There is no problem compiling GPU code in the VS2019 environment, but CUDA in 12.2 does not work. I rely heavily on CUDA. I am working my way through various 12.2 issues (SerialLink?, R?, Julia?, ...). Unfortunately, I cannot test everything from my well-equipped and well-connected home office, but I thought I could solve the CUDA issues. I will give this up until Wolfram provides a solution.

EDIT: Well, so much for stopping.

(Inner[Rule, #, ToExpression@#, Association] &@ Names["$CUDA"]) // Dataset

POSTED BY: Ernst H.K. Stelzer

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

What does

<<CUDALink`
CUDAQ[]

return for you?

POSTED BY: Stefan Ragnarsson

Ernst H.K. Stelzer

Ernst H.K. Stelzer, Goethe-Universität Frankfurt am Main

Posted 5 years ago

<< CUDALink`

CUDAQ[]
True

CUDAInformation[]
{1 -> {"Name" -> "Quadro M1200", "Clock Rate" -> 1148000, 
   "Compute Capabilities" -> 5., "GPU" ....

CUDADriverVersion[]
460.89

vec = Range[1., 10];
CUDAFourier[ vec]

CUDAFourier::internal: CUDALink experienced an internal error.
CUDAFourier[{1., 2., 3., 4., 5., 6., 7., 8., 9., 10.}]

POSTED BY: Ernst H.K. Stelzer

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Thanks, this is useful! There might be a separate problem with `CUDAFourier`. Do other functions like `CUDADot` work? And how about `CUDAFunctionLoad`?

POSTED BY: Stefan Ragnarsson

Ernst H.K. Stelzer

Ernst H.K. Stelzer, Goethe-Universität Frankfurt am Main

Posted 5 years ago

Amazing. CUDADot and now even CUDAFourier and CUDAMemoryLoad and CUDAMemoryGet work. But

cudaFun = 
 CUDAFunctionLoad[code, 
  "addTwo", {{_Integer, _, "Input"}, {_Integer, _, 
    "Output"}, _Integer}, 256]

CUDAFunctionLoad::instl: The compiler installation directive "CompilerInstallation" -> $Failed does not indicate a usable installation of NVIDIA CUDA Compiler (executable: CCompilerDriver`CCompilerDriverBase`BaseDriver[ResolveCompilerName][Automatic]).

CUDACCompilers[]
{{"Name" -> "Visual Studio", 
  "Compiler" -> 
   CCompilerDriver`VisualStudioCompiler`VisualStudioCompiler, 
  "CompilerInstallation" -> 
   "C:\\Program Files (x86)\\Microsoft Visual \
Studio\\2019\\Community", 
  "CompilerName" -> Automatic}, {"Name" -> "Visual Studio", 
  "Compiler" -> 
   CCompilerDriver`VisualStudioCompiler`VisualStudioCompiler, 
  "CompilerInstallation" -> 
   "C:\\Program Files (x86)\\Microsoft Visual \
Studio\\2017\\BuildTools", "CompilerName" -> Automatic}}
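One thing that might be worth trying here is to pass the CUDA Toolkit location explicitly through the "CompilerInstallation" option of `CUDAFunctionLoad`. The sketch below is untested on this setup: the toolkit path is an assumption (it is the v11.2 location quoted earlier in this thread, so adjust it to the local installation), and `code` is assumed to be the `addTwo` kernel from the documentation example.

```mathematica
Needs["CUDALink`"]

(* Kernel source, assumed to match the documentation's addTwo example *)
code = "
__global__ void addTwo(mint * in, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

(* Assumed toolkit location; replace with the directory that contains bin\\nvcc.exe *)
toolkitDir = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2";

(* Point CUDALink at the toolkit instead of relying on automatic detection *)
cudaFun = CUDAFunctionLoad[code, "addTwo",
  {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256,
  "CompilerInstallation" -> toolkitDir]
```

If the same `instl` message still appears with an explicit path, the cause is probably not toolkit detection alone.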

POSTED BY: Ernst H.K. Stelzer

Jean Jean

Posted 5 years ago

Hi,

Windows 10 Pro 20H2 build 19042.685, CUDA 11.2, driver 460.89
OpenSUSE Tumbleweed, kernel 5.9, CUDA 11.2, driver 455

POSTED BY: Jean Jean

Updating Name

Posted 5 years ago

There seems to be a development on my problem. Apparently the GPU calculations do work after all; the problem is one of initialization. The first time the GPUs are called for any calculation, the kernel takes around 25 minutes to finish the calculation and load the GPUs (this seems to be independent of which GPU calculation is being done). Once the GPUs have been loaded by waiting these 25 minutes, all remaining GPU calls respond instantly and everything works normally. This initial delay happens for every new kernel and every time the kernel is restarted. Therefore it seems to be an issue with loading the GPUs the first time. I am not sure how to resolve this. Clearly this is a big problem, since kernel resets happen all the time and one cannot depend on waiting 25 minutes per reset.
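A simple way to quantify this delay is to time the same small GPU call twice in a fresh kernel. This is only a measuring sketch and does not address the cause; `CUDADot` on a 10x10 matrix is an arbitrary choice of test call.

```mathematica
Needs["CUDALink`"]

m = Table[i, {i, 10}, {j, 10}];

(* First GPU call in a fresh kernel: includes any one-time initialization *)
firstCall = First@AbsoluteTiming[CUDADot[m, m];]

(* Same call again: should show the steady-state cost only *)
secondCall = First@AbsoluteTiming[CUDADot[m, m];]
```

Both timings are in seconds, so a large gap between `firstCall` and `secondCall` points at initialization rather than at the computation itself.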

POSTED BY: Updating Name

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

Hi Stefan, strangely the 11.2+12.2+RTX 3090 combination seems to have worked before (see above user Jean-Michel Collard). It apparently doesn't work for some of us? Something non trivial is happening here. In any case, thank you very much for your help. I hope I can get this resolved asap.

POSTED BY: Gianni Tallarita

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Thank you for the feedback! We are investigating and will push out fixes as soon as possible!

POSTED BY: Stefan Ragnarsson

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

Dear Stefan, Is there an update to this issue? Thank you, Gianni

POSTED BY: Gianni Tallarita

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Yes, we pushed out a paclet update yesterday afternoon! If you restart the kernel and load CUDALink again it should automatically update. The paclet version is CUDALink-12.2.1. It includes:

- Updated Setup tutorial.
- New function `InstallCUDA[]` that checks if CUDALink is supported on the current machine.
- Functions `CUDAResourcesInstall` and `CUDAResourcesUninstall` are now marked Obsolete, and CUDALink will no longer attempt to download the CUDAResources paclet automatically.
- Code for detecting CUDA Toolkit installations updated.
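To confirm that the update arrived, the installed paclet versions can be listed after restarting the kernel. This is a sketch using the general paclet functions rather than anything CUDALink-specific.

```mathematica
(* Load CUDALink in a fresh kernel so the automatic update can run *)
Needs["CUDALink`"]

(* Versions of every installed CUDALink paclet; 12.2.1 should be among them after the update *)
#["Version"] & /@ PacletFind["CUDALink"]
```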

POSTED BY: Stefan Ragnarsson

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

This is great news. Thank you. I'm guessing this does not fix the MXNet issue, and neural network functionality is still a problem with these new GPUs?

POSTED BY: Gianni Tallarita

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

POSTED BY: Stefan Ragnarsson

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

good to know. thank you.

POSTED BY: Gianni Tallarita

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

Attachments: Captura2.PNG

POSTED BY: Gianni Tallarita

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Hi Gianni, Please note that CUDALink and NetTrain use a completely separate implementation, I believe NetTrain actually does distribute the necessary libraries (in a paclet called MXNetResources). The good news here seems to be that CUDALink is working, the bad news that NetTrain is having some problems... I've forwarded this to our developers to investigate.

POSTED BY: Stefan Ragnarsson

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

POSTED BY: Gianni Tallarita

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

We think you may be right, we're looking into a fix!

POSTED BY: Stefan Ragnarsson

Gregory K

Posted 5 years ago

Stefan, is there any light at the end of the tunnel for properly utilizing the 3090? Mine works with NetTrain, but:

1) CUDAInformation[] returns Core Count -> Indeterminate (same as in one of the posts above).
2) RT cores are determined correctly (82), but Tensor cores aren't displayed at all.
3) The maximum batch size seems to be 10k in NetTrain, which is a travesty, as the 3090 can benefit from greater batch sizes: it has over 10k cores and a huge PCIe 4 bus.
4) Running the example given in the NetTrain help file under the TargetDevice option, but adding BatchSize->10000, WorkingPrecision->"Mixed", finishes the training in 13-14 seconds. Given that my 3960X CPU finishes in 20-22 s with the same batch size and Automatic WorkingPrecision, 13 seconds seems like woefully poor performance for this monstrous GPU.

Thanks for your feedback and for developing such a great computing platform!

Attachments: CUDA Screenshot ...png

POSTED BY: Gregory K

ZAQU zaqu

Posted 2 years ago

No. TCC is Tesla Compute Cluster: https://docs.nvidia.com/gameworks/content/developertools/desktop/tesla_compute_cluster.htm It uses the TCC driver, not WDDM.

POSTED BY: ZAQU zaqu

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

CUDAResourcesInformation[], as well as the CUDAResources paclet itself, is no longer needed in Mathematica 12.2.0. We apologize for the documentation not having been updated to reflect this change. What does CUDAQ[] return? Are you on Windows or Linux?

POSTED BY: Stefan Ragnarsson

Gianni Tallarita

Gianni Tallarita, Universidad Adolfo Ibanez

Posted 5 years ago

I'm also having the same issue: I installed Toolkit 11.2 and CUDAResourcesInformation[] still doesn't recognise anything. Please provide detailed information.

POSTED BY: Gianni Tallarita

Jean Jean

Posted 5 years ago

POSTED BY: Jean Jean

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

POSTED BY: Stefan Ragnarsson

Jean-Michel Charles Collard-Richard

Posted 5 years ago

Hello, I just downloaded the Toolkit package (11.2) from the NVIDIA resources, clicked on the installer, and all ran well. I am not a magician, but it worked. Best.

POSTED BY: Jean-Michel Charles Collard-Richard

Jean Jean

Posted 5 years ago

Hello,

I cannot get CUDA working on 12.2, either on Linux or on Windows. I successfully installed CUDA 11.2.0 on both OSes (nvcc available, CUDA examples compile, environment variables set, etc.).

On Windows: not working. It keeps saying that CUDAResources are not available, and installing CUDAResources-Win64-12.1.0 manually does not work.

On Linux: it seems to work, but downgraded to 9.0.0.0?! Mathematica keeps downloading the CUDAResources-Lin64-9.0.0.0 paclet over and over (the files in this kit are dated 2012!). Installing CUDAResources-Lin64-12.1.0 manually does not work either.

So, where is the problem? Do I need a "CUDAResources-Win64/Lin64-12.2.0" paclet? If so, where is it? (Only 12.1.0 is available for download at https://www.wolfram.com/CUDA/CUDAResources.html)

What is the exact procedure to get CUDA working in 12.2 on Linux and Windows? Jean-Michel, how did you manage to make it work? Could you give a detailed procedure?

Thank you for your help,
Jean

Attachments: 12_2_CUDABroken.png

POSTED BY: Jean Jean

Michel Mesedahl

Posted 5 years ago

I am getting the exact same issue.

POSTED BY: Michel Mesedahl

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Unfortunately not. The dispute between Apple and Nvidia means there haven't been updated drivers or CUDA Toolkit on mac for many years so it no longer made sense for us to try to support it.

POSTED BY: Stefan Ragnarsson

Joo-Haeng Lee

Posted 5 years ago

"As of Version 12.2, CUDA on macOS is no longer supported." Seriously?! Is there any workaround to run NetTrain[] on macOS + NVIDIA GPU + MMA 12.2? https://reference.wolfram.com/language/workflow/UseCUDAOnAnExternalGPUOnMac.html

POSTED BY: Joo-Haeng Lee

Michal Kvasnicka

Posted 5 years ago

No NVIDIA CUDA support on Macs from now, including external NVIDIA GPUs. Happy computing ...

POSTED BY: Michal Kvasnicka

Jean-Michel Charles Collard-Richard

Posted 5 years ago

CUDA works perfectly with Mathematica 12.2 and CUDA Toolkit 11.2. This is certain: I just installed it and tested. I repeat, Toolkit 11.2. You can find it on the NVIDIA website. Excellent :) Greetings to all members.

POSTED BY: Jean-Michel Charles Collard-Richard

Jean-Michel Charles Collard-Richard

Posted 5 years ago

The latest version of the NVIDIA Toolkit is 11.2. I tried to install it, but there is an incompatibility with other software on my computer. I will keep trying and let you know. Jean-Michel

POSTED BY: Jean-Michel Charles Collard-Richard

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Starting in V12.2 the CUDA Toolkit binaries are not supplied by Wolfram Research, but should be installed separately by the user. Do you have the CUDA Toolkit from NVIDIA installed? It can be found here: https://developer.nvidia.com/cuda-toolkit

POSTED BY: Stefan Ragnarsson

Michal Kvasnicka

Posted 5 years ago

Yes, I have installed CUDA Toolkit 10.2. Where can I find the actual compatibility matrix?

POSTED BY: Michal Kvasnicka

Jean-Michel Charles Collard-Richard

Posted 5 years ago

POSTED BY: Jean-Michel Charles Collard-Richard

Michal Kvasnicka

Posted 5 years ago

Which versions of the NVIDIA CUDA Toolkit are compatible with Mma 12.2???

POSTED BY: Michal Kvasnicka

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

CUDA Toolkit 11.x should work (but 10.x will not).
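To see which toolkit versions are present on a Windows machine, and which one CUDALink actually picks up, something like the following can be evaluated. It is a sketch that assumes the default installation directory quoted elsewhere in this thread; a custom install location would need a different path.

```mathematica
(* Assumed default toolkit root on Windows; each installed version has its own v* subdirectory *)
toolkitRoot = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA";
FileNames["v*", toolkitRoot]

(* The "CUDAVersion" and "Toolkit" entries of the result show what CUDALink detected *)
Needs["CUDALink`"]
InstallCUDA[]
```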

POSTED BY: Stefan Ragnarsson

Charalampos Markakis

Charalampos Markakis, QMUL

Posted 5 years ago

POSTED BY: Charalampos Markakis

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

That is very strange... `CUDADot` does not require the CUDAResources paclet. Do the documentation examples for `CUDAFunctionLoad` work?

POSTED BY: Stefan Ragnarsson

Charalampos Markakis

Charalampos Markakis, QMUL

Posted 5 years ago

I tried the command from the documentation:

cudaFun = CUDAFunctionLoad[code, "addTwo", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256]

but it returned the error message:

CUDAFunctionLoad::instl: The compiler installation directive "CompilerInstallation" -> $Failed does not indicate a usable installation of NVIDIA CUDA Compiler (executable: CCompilerDriver`CCompilerDriverBase`BaseDriver[ResolveCompilerName][Automatic]).

I have CUDA 11.2 and the latest NVIDIA drivers installed. CUDA works with Visual Studio as well as with Mathematica 12.1 and 12.0. CUDA seems to be broken only in Mathematica 12.2.

POSTED BY: Charalampos Markakis

Anran Lee

Posted 5 years ago

But according to the official site, the newest Mathematica version, 12.1.0, is only compatible with CUDA Toolkit 10.2.89 and not with CUDA 11.1. The site is: https://www.wolfram.com/CUDA/CUDAResources.html Also, sometimes I cannot connect to Wolfram Research, maybe because of censorship or web blocking in China.

POSTED BY: Anran Lee

Stefan Ragnarsson

Stefan Ragnarsson, Wolfram Research

Posted 5 years ago

Are you referring to the Chinese version of Mathematica? If so, the 12.2.0 version of that will be released very soon!

POSTED BY: Stefan Ragnarsson

Anran Lee

Posted 5 years ago

Yes, I use the Chinese version. Thanks for your enthusiastic help. I can't wait to see Mathematica 12.2.

POSTED BY: Anran Lee

Michal Kvasnicka

Posted 5 years ago

Please keep us informed here for any solutions...

POSTED BY: Michal Kvasnicka

Michal Kvasnicka

Posted 5 years ago

Yes, I can confirm that (Win10 + Quadro P1000 + Mathematica 12.2). CUDALink and all subsequent CUDA functionality look totally broken... :(

POSTED BY: Michal Kvasnicka

Jean-Michel Charles Collard-Richard

Posted 5 years ago

Hello Michal, I will send a mail to Support. Regards, Jean-Michel

POSTED BY: Jean-Michel Charles Collard-Richard
