CUDA not working on Mathematica 12.2

Posted 1 year ago | 12269 Views | 88 Replies | 49 Total Likes

Hello all :) CUDA is not working on 12.2. Look at this: What am I supposed to do? I also tried downloading the CUDA paclets, but that changed nothing. I have an NVIDIA RTX 3090 and the drivers are OK. Thank you for helping me. Regards to all,

Jean-Michel


Yes, I can confirm that (Win10 + Quadro P1000 + Mathematica 12.2). CUDALink and all subsequent CUDA functionality look totally broken... :(

Hello Michal, I will send a mail to Support. Regards, Jean-Michel

Please keep us informed here for any solutions...

Starting in V12.2 the CUDA Toolkit binaries are not supplied by Wolfram Research, but should be installed separately by the user. Do you have the CUDA Toolkit from NVIDIA installed?

It can be found here: https://developer.nvidia.com/cuda-toolkit

Yes, I have installed CUDA toolkit 10.2.

Where can I find an up-to-date compatibility matrix?

What a pity: I have the RTX 3090, with more than 10000 CUDA cores. I will install CUDA Toolkit 11.* and let you know here. Regards

Which versions of the NVIDIA CUDA Toolkit are compatible with Mma 12.2?

CUDA Toolkit 11.x should work (but 10.x will not).
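
To see which toolkit release is actually on the PATH, one option is to ask nvcc from inside Mathematica. This is only a rough sketch: it assumes nvcc is on the PATH and that its --version output contains the usual "release X.Y" text; parseRelease is a hypothetical helper name, not part of CUDALink.

```mathematica
(* Hypothetical helper: pull "X.Y" out of text such as
   "Cuda compilation tools, release 11.2, V11.2.67" *)
parseRelease[text_String] :=
  First[
    StringCases[
      text,
      "release " ~~ (v : ((DigitCharacter ..) ~~ "." ~~ (DigitCharacter ..))) :> v
    ],
    Missing["NotFound"]
  ];

(* Run nvcc; RunProcess gives an Association on success and fails
   (with a message, silenced here) if nvcc cannot be started *)
nvccOutput = Quiet[RunProcess[{"nvcc", "--version"}]];

toolkitRelease =
  If[AssociationQ[nvccOutput] && nvccOutput["ExitCode"] === 0,
    parseRelease[nvccOutput["StandardOutput"]],
    Missing["NvccNotFound"]
  ]
```

A result starting with "11." would match the 11.x requirement mentioned above; a Missing result only says that nvcc was not found on the PATH, not that the toolkit is absent.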

We installed CUDA 11.2 successfully (Windows 2019 Server + Quadro GV100 and Windows 10 + Titan V) but, in both machines, CUDA functionality in Mathematica 12.2 is broken.

CUDAQ[] returns True, and CUDADriverVersion[] returns 465.21 but CUDADot[Table[i, {i, 10}, {j, 10}], Table[i, {i, 10}, {j, 10}]] returns CUDADot::nopaclet: CUDAResources was not found. Make sure that you are connected to the internet and Mathematica is allowed access to the internet.

We have invested heavily in Mathematica CUDA software and supporting hardware, so this functionality is vital for us. I hope this can be fixed asap.

That is very strange... CUDADot does not require the CUDAResources paclet. Do the documentation examples for CUDAFunctionLoad work?

I tried the command

cudaFun = CUDAFunctionLoad[code, "addTwo", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256]

from the documentation, but it returned the error message "CUDAFunctionLoad::instl: The compiler installation directive "CompilerInstallation" -> $Failed does not indicate a usable installation of NVIDIA CUDA Compiler (executable: CCompilerDriver`CCompilerDriverBase`BaseDriver[ResolveCompilerName][Automatic])."

I have CUDA 11.2 installed and the latest Nvidia drivers. CUDA works with Visual Studio as well as with Mathematica 12.1 and 12.0. CUDA seems to be broken only in Mathematica 12.2.

Posted 10 months ago

But on the official site, the newest listed Mathematica version, 12.1.0, is only compatible with CUDA Toolkit 10.2.89, not CUDA 11.1. The site is https://www.wolfram.com/CUDA/CUDAResources.html. Also, I sometimes cannot connect to Wolfram Research, maybe because of censorship or web blocking in China.

Are you referring to the Chinese version of Mathematica? If so, the 12.2.0 version of that will be released very soon!

Posted 9 months ago

Yes, I use the Chinese version. Thanks for your enthusiastic help. I can't wait to see Mathematica 12.2.

The latest version of the NVIDIA Toolkit is 11.2. I tried to install it, but there is an incompatibility with other software on my computer. I will keep trying and let you know. Jean-Michel

CUDA works perfectly with Mathematica 12.2 and CUDA Toolkit 11.2. I am sure of this: I just installed it and tested it. I repeat, Toolkit 11.2. You can find it on the NVIDIA website. Excellent :) Greetings to all members.

Posted 1 year ago

"As of Version 12.2, CUDA on macOS is no longer supported."

Seriously?! Is there any workaround to run NetTrain[] on macOS + NVIDIA GPU + MMA 12.2?

https://reference.wolfram.com/language/workflow/UseCUDAOnAnExternalGPUOnMac.html

No NVIDIA CUDA support on Macs from now, including external NVIDIA GPUs. Happy computing ...

Unfortunately not. The dispute between Apple and Nvidia means there haven't been updated drivers or CUDA Toolkit on mac for many years so it no longer made sense for us to try to support it.

Posted 1 year ago

Hello, I cannot get CUDA working on 12.2, on either Linux or Windows. I successfully installed CUDA 11.2.0 on both OSes (nvcc available, CUDA examples compile, environment variables OK, etc.).

On Windows: not working. It keeps saying that CUDAResources is not available, and installing CUDAResources-Win64-12.1.0 manually does not work.

On Linux: it seems to work, but downgraded to 9.0.0.0?! Mathematica keeps downloading the CUDAResources-Lin64-9.0.0.0 paclet over and over (the files in this kit are dated 2012!). Installing CUDAResources-Lin64-12.1.0 manually does not work.

So where is the problem? Do I need a "CUDAResources-Win64/Lin64-12.2.0" paclet? If so, where is it? (Only 12.1.0 is available for download at https://www.wolfram.com/CUDA/CUDAResources.html.) What is the exact procedure to get CUDA working in 12.2 on Linux and Windows?

Jean-Michel, how did you manage to get it working? Could you give a detailed procedure? Thank you for your help, Jean

I am getting the exact same issue.

Hello, I just downloaded the toolkit package (11.2) from the NVIDIA resources, clicked on the installer, and all ran well. I am not a magician, but it worked. Best.

Posted 1 year ago

Hi Jean-Michel, Well, strange... I guess you are with Windows. So, your CUDAResourcesInformation[] must return something valid, pointing to the correct CUDA lib version ? (11.2). Could you share ? Thanks, J.

The CUDAResources paclet is no longer needed as of Mathematica 12.2, and we will no longer distribute CUDA libraries from now on. As such, CUDAResourcesInformation[] no longer returns any relevant information. CUDALink should now use the CUDA Toolkit libraries installed on your machine. If it is not doing so, more information would be appreciated: OS version, CUDA Toolkit version and Nvidia driver version, for example.

I'm also having the same issue, installed toolkit 11.2 and still CUDAResourcesInformation[] doesn't recognise anything. Please provide detailed information.

CUDAResourcesInformation[], as well as the CUDAResources paclet itself, is no longer needed in Mathematica 12.2.0. We apologize for the documentation not having been updated to reflect this change.

What does CUDAQ[] return? Are you on Windows or Linux?

Hello, I am on Windows 10 Pro edition, Mathematica 12.2, RTX 3090 GPU and I have installed NVIDIA's CUDA Toolkit 11.2.

CUDAQ[] returns True and CUDAInformation correctly recognises my GPU system. However, whenever I run any GPU call, the calculation simply runs forever, up to the point where Mathematica tells me it's not responding any more. I attach a snapshot to demonstrate. I should mention that the same system on Mathematica 12.1 ran perfectly with my previous RTX 2080 Ti GPU.

Hi Gianni,

Please note that CUDALink and NetTrain use completely separate implementations; I believe NetTrain actually does distribute the necessary libraries (in a paclet called MXNetResources). The good news is that CUDALink seems to be working; the bad news is that NetTrain is having some problems. I've forwarded this to our developers to investigate.

Dear Stefan,

Indeed it appears Mathematica 12.2 uses an old version of MxNET which doesn't seem appropriate for RTX3090 GPU series. I think the MxNET paclet has to be updated.

Thank you.

Gianni

We think you may be right, we're looking into a fix!

Posted 8 months ago

Stefan,

is there any light at the end of the tunnel to properly utilize the 3090? Mine works with NetTrain, but...
1) CUDAInformation[] returns Core Count -> Indeterminate (same as one of the posts above),
2) RT cores are determined correctly (82), but Tensor cores aren't displayed at all?
3) max batch size seems to be 10k in NetTrain, which is a travesty, as 3090 can benefit from greater batch sizes, as it has over 10k cores and huge PCIe4 bus.
4) Running the example given in the NetTrain help file under the TargetDevice option, but adding BatchSize->10000, WorkingPrecision->"Mixed", finishes the training in 13-14 seconds. Given that my 3960X CPU finishes in 20-22s with the same batch size and Automatic WorkingPrecision, 13 seconds seems like woefully poor performance for this monstrous GPU.

Thanks for your feedback and for developing such a great computing platform!


Hi Stefan,

strangely the 11.2+12.2+RTX 3090 combination seems to have worked before (see above user Jean-Michel Collard). It apparently doesn't work for some of us? Something non trivial is happening here. In any case, thank you very much for your help. I hope I can get this resolved asap.

Thank you for the feedback! We are investigating and will push out fixes as soon as possible!

Dear Stefan,

Is there an update to this issue?

Thank you, Gianni

Yes, we pushed out a paclet update yesterday afternoon! If you restart the kernel and load CUDALink again it should automatically update. The paclet version is CUDALink-12.2.1.

It includes:

  • Updated Setup tutorial
  • New function InstallCUDA[] that checks if CUDALink is supported on the current machine.
  • Functions CUDAResourcesInstall and CUDAResourcesUninstall are now marked Obsolete and CUDALink will no longer attempt to download the CUDAResources paclet automatically.
  • Code for detecting CUDA Toolkit installations updated.
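
As a rough sketch of the check described in this list: after a kernel restart, load CUDALink and inspect what InstallCUDA[] returns. The helper name cudaLoadedQ is hypothetical; the only assumption about InstallCUDA[] is that it returns a Success[...] object when CUDALink loads, as in the outputs posted in this thread.

```mathematica
(* Load CUDALink; the updated paclet should be picked up
   automatically after a kernel restart *)
Needs["CUDALink`"]

(* Hypothetical helper: True only for a Success[...] result *)
cudaLoadedQ[result_] := MatchQ[result, _Success];

installResult = InstallCUDA[];

If[cudaLoadedQ[installResult],
  Print["InstallCUDA reports success."],
  Print["InstallCUDA did not report success: ", installResult]
]
```

Note that a Success result does not by itself guarantee that every CUDA function works; some posts in this thread report failures even after InstallCUDA[] succeeds.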

This is great news. Thank you. I'm guessing this does not fix the MxNET issue and Neural Network functionality is still an issue with these new GPUs?

Correct, that is still in progress. The NeuralNetworks team is working on it as their highest priority.

good to know. thank you.

Posted 1 year ago

There seems to be a development on my problem. Apparently the GPU calculations do work after all; the problem is one of initialization. The first time the GPUs are called for any calculation, the kernel takes around 25 minutes to finish the calculation and load the GPUs (this seems to be independent of which GPU calculation is being done). Once the GPU has been loaded by waiting these 25 minutes, all remaining calls to the GPU respond instantly and everything works normally. This initial delay happens for every new kernel and every time the kernel is restarted.

Therefore it seems to be an issue with loading the GPUs the first time. I am not sure how to resolve this, however. Clearly this is a big problem, since kernel resets happen all the time and one cannot depend on waiting 25 minutes per reset.
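
One way to quantify this start-up cost is to time the same GPU call twice in a fresh kernel. This is just a sketch: timeTwice is a hypothetical helper, and CUDADot on a small matrix stands in for "any GPU calculation".

```mathematica
Needs["CUDALink`"]

(* Hypothetical helper: evaluate f[] twice and return both
   wall-clock timings in seconds *)
timeTwice[f_] := {First[AbsoluteTiming[f[]]], First[AbsoluteTiming[f[]]]};

m = Table[i, {i, 10}, {j, 10}];

(* The first timing includes GPU initialization; the second does not *)
{firstCall, secondCall} = timeTwice[CUDADot[m, m] &]
```

A large gap between the two numbers would be consistent with the delay being a one-time initialization cost per kernel rather than a problem with the calculation itself.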

Posted 1 year ago

Hi,

  • Windows 10 Pro 20H2 build 19042.685, CUDA 11.2, driver 460.89
  • OpenSUSE Tumbleweed, kernel 5.9, CUDA 11.2, driver 455

I would like to add that I used CUDALink on my PC (Windows 10, M1200) and on a server (2x K2 until recently, now 2x P4) prior to 12.2. Since I installed 12.2, CUDA did not work. The same issues are described several times in this thread.

I also removed Mathematica completely as described on the Wolfram website and re-installed Mathematica, Visual Studio 2019, nVidia GPU, CUDA toolkit 11.2 from scratch. No problem compiling GPU code in the VS2019 environment but CUDA in 12.2 does not work.

I rely heavily on CUDA.

I am working my way through various 12.2 issues (SerialLink?, R?, Julia?, ...). Unfortunately, I cannot test everything from my well-equipped and well-connected home office, but I thought I could solve the CUDA issues. I will give this up until Wolfram provides a solution.

EDIT: Well, so much for stopping.

(Inner[Rule, #, ToExpression@#, Association] &@
   Names["$*CUDA*"]) // Dataset

Dataset First six Associations

What does

<<CUDALink`
CUDAQ[]

return for you?

<< CUDALink`

CUDAQ[]
True

CUDAInformation[]
{1 -> {"Name" -> "Quadro M1200", "Clock Rate" -> 1148000, 
   "Compute Capabilities" -> 5., "GPU" ....

CUDADriverVersion[]
460.89

vec = Range[1., 10];
CUDAFourier[ vec]

CUDAFourier::internal: CUDALink experienced an internal error.
CUDAFourier[{1., 2., 3., 4., 5., 6., 7., 8., 9., 10.}]

Thanks, this is useful! There might be a separate problem with CUDAFourier. Do other functions like CUDADot work? And how about CUDAFunctionLoad?

Amazing. CUDADot and now even CUDAFourier and CUDAMemoryLoad and CUDAMemoryGet work. But

cudaFun = 
 CUDAFunctionLoad[code, 
  "addTwo", {{_Integer, _, "Input"}, {_Integer, _, 
    "Output"}, _Integer}, 256]

CUDAFunctionLoad::instl: The compiler installation directive "CompilerInstallation" -> $Failed does not indicate a usable installation of NVIDIA CUDA Compiler (executable: CCompilerDriver`CCompilerDriverBase`BaseDriver[ResolveCompilerName][Automatic]).

CUDACCompilers[]
{{"Name" -> "Visual Studio", 
  "Compiler" -> 
   CCompilerDriver`VisualStudioCompiler`VisualStudioCompiler, 
  "CompilerInstallation" -> 
   "C:\\Program Files (x86)\\Microsoft Visual \
Studio\\2019\\Community", 
  "CompilerName" -> Automatic}, {"Name" -> "Visual Studio", 
  "Compiler" -> 
   CCompilerDriver`VisualStudioCompiler`VisualStudioCompiler, 
  "CompilerInstallation" -> 
   "C:\\Program Files (x86)\\Microsoft Visual \
Studio\\2017\\BuildTools", "CompilerName" -> Automatic}}

I am very confused. It is asking me for a paclet, but this conversation says that 12.2 needs no paclet. Is there some sort of web page or blog or something that explains how to get CUDA working with 12.2???

In[8]:= CUDAQ[]

Out[8]= True

In[9]:= CUDAInformation[]

Out[9]= {1 -> {"Name" -> "GeForce GTX 1080 Ti", 
   "Clock Rate" -> 1582000, "Compute Capabilities" -> 6.1, 
   "GPU Overlap" -> 1, ...

In[10]:= CUDADriverVersion[]

Out[10]= "460.89"

In[11]:= vec = Range[1., 10];
CUDAFourier[vec]

During evaluation of In[11]:= CUDAFourier::nopaclet: CUDAResources was not found. Make sure that you are connected to the internet and Mathematica is allowed access to the internet.

Out[12]= CUDAFourier[{1., 2., 3., 4., 5., 6., 7., 8., 9., 10.}]

Same problem here as above. CUDAResourcesInformation[] gives error, while CUDAQ[] returns true. CUDA functions do not execute. CUDA Version 11.2. Mathematica version 12.2. OS Version Windows 10 (Insider Build), GPU : Nvidia 2070 RTX. Can't get to work any of the functions like CUDAFold, CUDAFourier etc.

CUDAResources and CUDAResourcesInformation are obsolete in Mathematica 12.2. What does InstallCUDA[] return? Also, please note there's a known issue with CUDAFourier in M12.2, but CUDAFold should work.

Hello. Here are the outputs for commands you asked:

In[2]:= CUDAQ[]

Out[2]= True

In[3]:= InstallCUDA[]

Out[3]= Success["CUDALinkLoaded", 
Association[
 "MessageTemplate" :> "CUDALink installation complete.", 
  "CUDAVersion" -> 11.2, "DefaultDevice" -> "GeForce RTX 2070", 
  "Toolkit" -> "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
\\v11.2", 
  "NVCC" -> "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\nvcc.exe", 
  "LibrariesLoaded" -> {
   "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2\\bin\
\\cublas64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cublasLt64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cudart64_110.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufft64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufftw64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\curand64_10.dll", 
    "C:\\Windows\\System32\\nvcuda.dll"}]]

Yes, CUDAFourier[] is now working. CUDAFold was not working before, but strangely enough it is working now, and so is CUDAFoldList. I don't know what changed.

Where I am working we have

1) a Win 10 PC with a GeForce GTX 1060 graphics card that works fine with CUDA on Mathematica 12.2

2) a Win 10 PC with two old GTX 660 cards that will only run CUDA on Mathematica 12.1 and not on Mathematica 12.2 (regardless of whether NVIDIA toolkit v10 or v11 is installed)

3) two iMacs running Win 10 with GTX 675 cards, neither of which will run CUDA with either Mathematica 12.1 or Mathematica 12.2 (earlier versions of Mathematica 11 work fine with CUDA)

Besides these kinds of inconsistencies, several of the CUDA functions don't work as advertised. For example, CUDAFinancialDerivative with the AsianArithmetic options, one of the examples in the documentation, just returns infinite values.

It's an unreliable mess.

What is the output or error messages you're seeing in your setups that aren't working?

On the Win 10 PC with two GTX 660 cards and Mathematica 12.2:

Needs["CUDALink`"]
CUDAInformation[]
{1 -> {"Name" -> "GeForce GTX 660", "Clock Rate" -> 888500, 
   "Compute Capabilities" -> 3., "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 6, "Core Count" -> 1152, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, 
   "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, 
   "Texture2D Array Width" -> 16384, 
   "Texture2D Array Height" -> 16384, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 1610612736}, 
 2 -> {"Name" -> "GeForce GTX 660", "Clock Rate" -> 888500, 
   "Compute Capabilities" -> 3., "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 6, "Core Count" -> 1152, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, 
   "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, 
   "Texture2D Array Width" -> 16384, 
   "Texture2D Array Height" -> 16384, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 1610612736}}

Then:

CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm

CUDADot::allocf: A CUDALink memory allocation failed.

Also:

numberOfOptions = 32;
spotPrices = RandomReal[{25.0, 35.0}, numberOfOptions];
strikePrices = RandomReal[{20.0, 40.0}, numberOfOptions];
expiration = RandomReal[{0.1, 10.0}, numberOfOptions];
interest = 0.08;
volatility = RandomReal[{0.10, 0.50}, numberOfOptions];
dividend = RandomReal[{0.2, 0.06}, numberOfOptions];

CUDAFinancialDerivative[{"American", 
  "Call"}, {"StrikePrice" -> strikePrices, 
  "Expiration" -> expiration}, {"CurrentPrice" -> spotPrices, 
  "InterestRate" -> interest, "Volatility" -> volatility, 
  "Dividend" -> dividend}]

{0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., \ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.}

What does InstallCUDA[] return?

Mathematica reports "Success" (CUDA v 11.1 installed). But the results are exactly as before.

Could you try this?

Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter = 1;
CUDAQ[]
CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]

and post the result?

It returns:

True

and:

CUDADot::allocf: A CUDALink memory allocation failed.

That command should have printed a bunch of debug information as well, I'm mostly interested in that. For example, on my Windows machine I get:

Needs["CUDALink`"]
GPUTools`Utilities`VerboseLogPrinter=1;
CUDAQ[]
CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]
LOG:   ==== Loading Library Files ==== 
LOG:  Loading CUDA Library Files:  C:\WINDOWS\System32\nvcuda.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudart64_110.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cufft64_10.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cufftw64_10.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublasLt64_11.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublas64_11.dll, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\curand64_10.dll
LOG:  NVIDIA driver library is located in  C:\WINDOWS\System32\nvapi64.dll
LOG:  NVIDIA Driver Library is   Valid
LOG:  CUDA Library is   Valid
LOG:   ==== Loading Library Functions ==== 
LOG:   ==== Initializing  CUDA  ==== 
True

What do you see on your machine?

Nope - nothing. Only what I wrote before.

I tested on my Dell Precision 7750 with RTX 3000 + Win 10 + WL 12.2. Your code works for me.


CUDA works on some Mathematica 12.2 installations and not on others. On some it works with Mathematica 12.1, but not Mathematica 12.2. On other Win 10 machines it works on neither 12.1 nor 12.2. And several of the CUDA functions no longer work as shown in the documentation on installations where CUDA is successfully installed.

As I said previously, it's an inconsistent mess.

Could you please contact Wolfram Support (https://www.wolfram.com/support/contact/) with the details?

I figured it out. CUDA works on Mathematica 12.2 with NVIDIA Toolkit ver 11.1, but not with ver 11.2 (which is what I had installed originally).

I spoke too soon!

Needs["CUDALink`"]

CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm

( 837977408   1   1   1   1   1   1   1   1   1
          2   2   2   2   2   2   2   2   2   2
          3   3   3   3   3   3   3   3   3   3
          4   4   4   4   4   4   4   4   4   4
          5   5   5   5   5   5   5   5   5   5
          6   6   6   6   6   6   6   6   6   6
          7   7   7   7   7   7   7   7   7   7
          8   8   8   8   8   8   8   8   8   8
          9   9   9   9   9   9   9   9   9   9
         10  10  10  10  10  10  10  10  10  10 )
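
A result like this can be checked against the CPU. The sketch below compares CUDADot with the built-in Dot on the same matrix. For this particular input, every entry in row i of the correct product is 55 i, because each entry is the sum of i*k for k from 1 to 10.

```mathematica
Needs["CUDALink`"]

m = Table[i, {i, 10}, {j, 10}];

(* CPU reference: row i of m . m is 55 i in every column *)
cpu = m . m;

(* GPU result from CUDALink *)
gpu = CUDADot[m, m];

(* TrueQ yields False if the values differ, or if CUDADot
   returned unevaluated after an error message *)
TrueQ[gpu == cpu]
```

If this gives False while CUDAQ[] is True, the libraries load but compute incorrectly, which is worth including in a support report.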

Could you send this to technical support? A notebook with the information returned by InstallCUDA[], SystemInformation[], CUDADriverVersion[] and CUDAInformation[] as well as the Needs["CUDALink`"] GPUTools`Utilities`VerboseLogPrinter=1; CUDAQ[] check (which needs to be evaluated right after a kernel restart via Quit[]) would be extremely useful in determining what's going on.
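
A sketch of how the requested information could be gathered in one place. It assumes Quit[] has just been evaluated in its own cell, so that the verbose log covers the first CUDAQ[] call; the variable name report is arbitrary. SystemInformation[] is left out because it is easier to evaluate in a separate cell of the notebook.

```mathematica
(* Evaluate Quit[] in a separate cell first, then run this cell *)
Needs["CUDALink`"]

(* Turn on the debug log before the first CUDA call *)
GPUTools`Utilities`VerboseLogPrinter = 1;
cudaAvailable = CUDAQ[];

(* Collect the values requested above into one association *)
report = <|
  "Version" -> $Version,
  "SystemID" -> $SystemID,
  "CUDAQ" -> cudaAvailable,
  "InstallCUDA" -> InstallCUDA[],
  "CUDADriverVersion" -> CUDADriverVersion[],
  "CUDAInformation" -> CUDAInformation[]
|>
```

The LOG lines printed during CUDAQ[] appear in the notebook rather than in report, so the notebook itself is what should be sent.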

Update : As of today, neural network operations (such as NetTrain) with TargetDevice->"GPU" should now work with Ampere-generation cards from Nvidia, e.g. RTX 3070, 3080 or 3090.

As of today, my previously working setup (Mathematica 12.2.0.0 + GeForce GTX 1060 6GB on Windows 10) has stopped working after an update from the Wolfram servers replaced the MXNetLink and MXNetResources paclets I had before.

CUDA seems to work fine (CUDAQ[] gives True, InstallCUDA[] is successful, CUDADot[] example works but CUDAFourier doesn't, probably due to the unrelated issue), but I can't run any neural network functions at this point.

Any suggestions?

What happens for you with NetTrain with TargetDevice->"GPU", are there any error messages shown? I have a similar setup (Win 10, GTX 1060, Mathematica 12.2.0) and it appears to work. Are your graphics drivers up-to-date?

It all depends somewhat on the order in which things are done, but here's some more information:

Running

<< CUDALink`
CUDAQ[]

gives True, then CUDAInformation[] recognizes the GPU

{1 -> {"Name" -> "GeForce GTX 1060 6GB", "Clock Rate" -> 1708500,
"Compute Capabilities" -> 6.1, "GPU Overlap" -> 1,
"Maximum Block Dimensions" -> {1024, 1024, 64},
"Maximum Grid Dimensions" -> {2147483647, 65535, 65535},
"Maximum Threads Per Block" -> 1024,
"Maximum Shared Memory Per Block" -> 49152,
"Total Constant Memory" -> 65536, "Warp Size" -> 32,
"Maximum Pitch" -> 2147483647,
"Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
"Multiprocessor Count" -> 10, "Core Count" -> 1280,
"Execution Timeout" -> 1, "Integrated" -> False,
"Can Map Host Memory" -> True, "Compute Mode" -> "Default",
"Texture1D Width" -> 131072, "Texture2D Width" -> 131072,
"Texture2D Height" -> 65536, "Texture3D Width" -> 16384,
"Texture3D Height" -> 16384, "Texture3D Depth" -> 16384,
"Texture2D Array Width" -> 32768,
"Texture2D Array Height" -> 32768,
"Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512,
"Concurrent Kernels" -> True, "ECC Enabled" -> False,
"TCC Enabled" -> False, "Total Memory" -> 6442450944}}

CUDADriverVersion[] identifies my drivers as version 465.21, which is correct (note that these are beta drivers needed to run Docker with GPU support on WSL 2, as explained here https://www.docker.com/blog/wsl-2-gpu-support-is-here/ ; they were working just fine with Mathematica up until yesterday).

After doing all of this, NetTrain trains fine on CPU but fails on GPU with the error

NetTrain::badtrgdevgpu: TargetDevice -> GPU could not be used. Please ensure that you have a compatible NVIDIA graphics card and have installed the latest drivers from http://www.nvidia.com/Download/index.aspx.

Strangely enough, if I now try CUDAQ[] I get a pop-up error box saying (note the DLL file is certainly there):

The procedure entry point cufftloadwisdom could not be located in the dynamic link library C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufftw64_10.dll.

Then afterwards, even when running neural network functions on the CPU, for example before NetTrain I get (again, the DLL file is present)

LibraryFunction::load: The library C:\Users\fidel\AppData\Roaming\Mathematica\Paclets\Repository\MXNetResources-WIN64-12.2.404\LibraryResources\Windows-x86-64\cublas64_11.dll cannot be loaded.

and also a pop-up error box saying

The procedure entry point cublasLtZZZMatmulAlgoGetHeuristic could not be located in the dynamic link library C:\Users\fidel\AppData\Roaming\Mathematica\Paclets\Repository\MXNetResources-WIN64-12.2.404\LibraryResources\Windows-x86-64\cublas64_11.dll

The update we pushed (MXNetLink and MXNetResources) updated the NetTrain GPU implementation to CUDA Toolkit 11.2, which was necessary to add support for the latest-generation Nvidia cards. However, it appears that CUDA Toolkit 11.2 is incompatible with the beta driver you're using, 465.21.

It appears Nvidia released an update yesterday that should help you: https://forums.developer.nvidia.com/t/new-cuda-on-wsl2-wip-driver-465-42-is-now-available-for-download/167166

We are pleased to inform you that WSL2 WIP driver 465.42 is now available for download. CUDA 11.2 toolkit will be functional for WSL v2 with this release. The soul of this driver release are some performance improvements we have made. Please let us know below what you think!

Thanks for the driver update suggestion! Mathematica is now back to working with the GPU, after I:

1) Uninstalled CUDA Toolkit v11.0 and installed v11.2
2) Updated the driver to v465.42
3) Uninstalled and reinstalled the latest Mathematica paclets for MXNetLink and MXNetResources (not sure how necessary this was; I did it only because I had earlier been trying to roll back the updates)

Note that skipping step (1) above and keeping CUDA Toolkit 11.0 resulted in a working set-up, but I still got an error when running neural net computations after loading CUDALink (or the other way around). I'm guessing this was due to CUDALink and MXNetLink being confused about toolkit versions, so I just updated everything to run on v11.2... but note that TensorFlow doesn't play nicely with v11.2, so this required extra tinkering on the side.

On the iMacs running Windows 10 and Mathematica 12.2 with CUDA Toolkit 11.2 installed, we have:

 Needs["CUDALink`"]
 CUDAInformation[]

 {1->{Name->GeForce GTX 660M,Clock Rate->950000,Compute Capabilities->3.,GPU Overlap->1,Maximum Block Dimensions->{1024,1024,64},Maximum Grid Dimensions->{2147483647,65535,65535},Maximum Threads Per Block->1024,Maximum Shared Memory Per Block->49152,Total Constant Memory->65536,Warp Size->32,Maximum Pitch->2147483647,Maximum Registers Per Block->65536,Texture Alignment->512,Multiprocessor Count->2,Core Count->384,Execution Timeout->1,Integrated->False,Can Map Host Memory->True,Compute Mode->Default,Texture1D Width->65536,Texture2D Width->65536,Texture2D Height->65536,Texture3D Width->4096,Texture3D Height->4096,Texture3D Depth->4096,Texture2D Array Width->16384,Texture2D Array Height->16384,Texture2D Array Slices->2048,Surface Alignment->512,Concurrent Kernels->True,ECC Enabled->False,TCC Enabled->False,Total Memory->536870912}}

 InstallCUDA[]
 Success[]

Then:

 CUDADot[Table[i,{i,10},{j,10}],Table[i,{i,10},{j,10}]]//MatrixForm

 CUDADot::notinit: CUDALink is not initialized.

NB: CUDA no longer works on older iMacs, as NVIDIA limits CUDA support for the GTX 660 graphics card to ver 10.1.

You would have to revert to Mathematica V11.x to run GPU functionality on one of these cards.

I just installed CUDA from Nvidia on my Windows 10 Lenovo X1 with no problem:

 In[1]:= Needs["CUDALink`"]

In[2]:= CUDAInformation[]

Out[2]= {1 -> {"Name" -> "GeForce GTX 1650 with Max-Q Design", 
   "Clock Rate" -> 1245000, "Compute Capabilities" -> 7.5, 
   "GPU Overlap" -> 1, "Maximum Block Dimensions" -> {1024, 1024, 64},
    "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 16, "Core Count" -> 1024, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 131072, "Texture2D Width" -> 131072, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 16384, 
   "Texture3D Height" -> 16384, "Texture3D Depth" -> 16384, 
   "Texture2D Array Width" -> 32768, 
   "Texture2D Array Height" -> 32768, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 4294967296}}

In[3]:= InstallCUDA[]

Out[3]= Success["CUDALinkLoaded", 
Association[
 "MessageTemplate" :> "CUDALink installation complete.", 
  "CUDAVersion" -> 11.2, 
  "DefaultDevice" -> "GeForce GTX 1650 with Max-Q Design", 
  "Toolkit" -> "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
\\v11.2", 
  "NVCC" -> "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\nvcc.exe", 
  "LibrariesLoaded" -> {
   "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2\\bin\
\\cublas64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cublasLt64_11.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cudart64_110.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufft64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\cufftw64_10.dll", 
    "C:\\Program Files\\NVIDIA GPU Computing \
Toolkit\\CUDA\\v11.2\\bin\\curand64_10.dll", 
    "C:\\Windows\\System32\\nvcuda.dll"}]]

In[4]:= CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm


Out[4]//MatrixForm=
(  55   55   55   55   55   55   55   55   55   55
  110  110  110  110  110  110  110  110  110  110
  165  165  165  165  165  165  165  165  165  165
  220  220  220  220  220  220  220  220  220  220
  275  275  275  275  275  275  275  275  275  275
  330  330  330  330  330  330  330  330  330  330
  385  385  385  385  385  385  385  385  385  385
  440  440  440  440  440  440  440  440  440  440
  495  495  495  495  495  495  495  495  495  495
  550  550  550  550  550  550  550  550  550  550 )

Posted 9 months ago

Hello, I am facing a similar problem. I am using Mathematica 12.1 (I can't move to 12.2 yet because the university's update procedure takes a while), CUDA 11.2, Windows Server 2019, and Visual Studio 2019. I ran the following code:

Needs["CUDALink`"];
cudaActiveContourFunc = 
  CUDAFunctionLoad[src, 
   "CUDAActiveContour", {{ _Real, "Input"}, { _Real, 
     "Input"}, { _Real, "Input"}, { _Real, "Input"}, { _Real, 
     "Input"}, { _Real, "Input"}, { _Real, "Input"}, {_Real, 
     "Output"}, {_Real, 
     "Output"}, _Real, _Real, _Real, _Integer, _Integer, _Integer}, 
   256, "CompilerInstallation" -> 
    "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v" <> 
     ToString[DecimalForm[CUDAversion, {2, 1}]] <> "\\"];

This gave the error: CUDAFunctionLoad::invxpth: The "XCompilerInstallation" option set to $Failed is not valid. "XCompilerInstallation" must be a string pointing to the C compiler directory.

I ran the following commands and got the following output:

CUDAQ[]
o/p: True

CUDADriverVersion[]
o/p: 461.09

CUDAInformation[]
o/p: {1 -> {"Name" -> "GeForce GTX 1080 Ti", "Clock Rate" -> 1582000, 
   "Compute Capabilities" -> 6.1, "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 28, "Core Count" -> 3584, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 131072, "Texture2D Width" -> 131072, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 16384, 
   "Texture3D Height" -> 16384, "Texture3D Depth" -> 16384, 
   "Texture2D Array Width" -> 32768, 
   "Texture2D Array Height" -> 32768, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 11811160064}, 
 2 -> {"Name" -> "GeForce GTX 1080 Ti", "Clock Rate" -> 1582000, 
   "Compute Capabilities" -> 6.1, "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 28, "Core Count" -> 3584, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 131072, "Texture2D Width" -> 131072, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 16384, 
   "Texture3D Height" -> 16384, "Texture3D Depth" -> 16384, 
   "Texture2D Array Width" -> 32768, 
   "Texture2D Array Height" -> 32768, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 11811160064}, 
 3 -> {"Name" -> "Quadro K420", "Clock Rate" -> 0, 
   "Compute Capabilities" -> 3., "GPU Overlap" -> 1, 
   "Maximum Block Dimensions" -> {1024, 1024, 64}, 
   "Maximum Grid Dimensions" -> {2147483647, 65535, 65535}, 
   "Maximum Threads Per Block" -> 1024, 
   "Maximum Shared Memory Per Block" -> 49152, 
   "Total Constant Memory" -> 65536, "Warp Size" -> 32, 
   "Maximum Pitch" -> 2147483647, 
   "Maximum Registers Per Block" -> 65536, "Texture Alignment" -> 512,
    "Multiprocessor Count" -> 1, "Core Count" -> 192, 
   "Execution Timeout" -> 1, "Integrated" -> False, 
   "Can Map Host Memory" -> True, "Compute Mode" -> "Default", 
   "Texture1D Width" -> 65536, "Texture2D Width" -> 65536, 
   "Texture2D Height" -> 65536, "Texture3D Width" -> 4096, 
   "Texture3D Height" -> 4096, "Texture3D Depth" -> 4096, 
   "Texture2D Array Width" -> 16384, 
   "Texture2D Array Height" -> 16384, 
   "Texture2D Array Slices" -> 2048, "Surface Alignment" -> 512, 
   "Concurrent Kernels" -> True, "ECC Enabled" -> False, 
   "TCC Enabled" -> False, "Total Memory" -> 2147483648}}

CCompilers[Full]
o/p: {{"Name" -> "Intel Compiler", 
  "Compiler" -> CCompilerDriver`IntelCompiler`IntelCompiler, 
  "CompilerInstallation" -> None, 
  "CompilerName" -> Automatic}, {"Name" -> "Generic C Compiler", 
  "Compiler" -> CCompilerDriver`GenericCCompiler`GenericCCompiler, 
  "CompilerInstallation" -> None, 
  "CompilerName" -> Automatic}, {"Name" -> "NVIDIA CUDA Compiler", 
  "Compiler" -> NVCCCompiler, 
  "CompilerInstallation" -> 
   "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.2\\bin\
\\", "CompilerName" -> Automatic}}

cudaFun = 
 CUDAFunctionLoad[code, 
  "addTwo", {{_Integer, _, "Input"}, {_Integer, _, 
    "Output"}, _Integer}, 256]
o/p: 
CUDAFunctionLoad::invprog: CUDALink encountered an invalid program.
CUDAFunctionLoad[code, "addTwo", {{_Integer, _, 
   "Input"}, {_Integer, _, "Output"}, _Integer}, 256]

CUDAResourcesInformation[]
o/p: {{"Name" -> "CUDAResources", "Version" -> "12.1.0", 
  "WolframVersion" -> "12.1", "Qualifier" -> "Win64", 
  "SystemID" -> {"Windows-x86-64"}, 
  "Description" -> "{ToolkitVersion -> v10.2, MinimumDriver -> 290}", 
  "Category" -> Missing["NotAvailable"], 
  "Keywords" -> Missing["NotAvailable"], 
  "UUID" -> Missing["NotAvailable"], 
  "Creator" -> Missing["NotAvailable"], 
  "URL" -> Missing["NotAvailable"], "Internal" -> False, 
  "Context" -> {}, "Loading" -> Manual, "AutoUpdating" -> False, 
  "Enabled" -> True, 
  "Location" -> 
   "C:\\Users\\P70072599\\AppData\\Roaming\\Mathematica\\Paclets\\\
Repository\\CUDAResources-Win64-12.1.0", 
  "Hash" -> "3357678c60aa79e333fe04fbb5d04dd7"}}

CUDAResourcesInstall[]
o/p: {
PacletObject[
Association[
  "Name" -> "CUDAResources", "Version" -> "12.1.0", 
   "MathematicaVersion" -> "12.1", 
   "Description" -> "{ToolkitVersion -> v10.2, MinimumDriver -> 290}",
    "SystemID" -> {"Windows-x86-64"}, "Qualifier" -> "Win64", 
   "Extensions" -> {{
     "Resource", 
      "Resources" -> {
       "CUDAToolkit", "ExampleData", "LibraryResources"}}}, 
   "Location" -> "C:\\Users\\P70072599\\AppData\\Roaming\\Mathematica\
\\Paclets\\Repository\\CUDAResources-Win64-12.1.0"]]}


vec = Range[1., 10];
CUDAFourier[vec]
o/p:{17.3925 + 0. I, -1.58114 - 4.86624 I, -1.58114 - 
  2.17625 I, -1.58114 - 1.14876 I, -1.58114 - 0.513743 I, -1.58114 + 
  0. I, -1.58114 + 0.513743 I, -1.58114 + 1.14876 I, -1.58114 + 
  2.17625 I, -1.58114 + 4.86624 I}

InstallCUDA[]
o/p: InstallCUDA[]

CUDADot[Table[i, {i, 10}, {j, 10}], 
  Table[i, {i, 10}, {j, 10}]] // MatrixForm
o/p: {
 {55, 55, 55, 55, 55, 55, 55, 55, 55, 55},
 {110, 110, 110, 110, 110, 110, 110, 110, 110, 110},
 {165, 165, 165, 165, 165, 165, 165, 165, 165, 165},
 {220, 220, 220, 220, 220, 220, 220, 220, 220, 220},
 {275, 275, 275, 275, 275, 275, 275, 275, 275, 275},
 {330, 330, 330, 330, 330, 330, 330, 330, 330, 330},
 {385, 385, 385, 385, 385, 385, 385, 385, 385, 385},
 {440, 440, 440, 440, 440, 440, 440, 440, 440, 440},
 {495, 495, 495, 495, 495, 495, 495, 495, 495, 495},
 {550, 550, 550, 550, 550, 550, 550, 550, 550, 550}
}

Any help would be much appreciated, as I have been stuck on this for quite some time. Thank you!

Your problem is separate from most of the discussion here, which is specific to changes and updates made in Mathematica 12.2.0. Please contact Wolfram Technical Support (https://www.wolfram.com/support/contact/email/?topic=technical), and include all of these details.
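
In the meantime, the invxpth message quoted above names the "XCompilerInstallation" option of CUDAFunctionLoad, so one thing that may be worth trying is passing that option explicitly. The following is only a minimal sketch under assumptions: vsDir is a placeholder, not a real path, and must be replaced with the directory holding your own Visual Studio C/C++ compiler; the kernel is a small test kernel written to match the argument layout of the addTwo call above, not the original poster's code.

```wolfram
Needs["CUDALink`"]

(* Assumption: placeholder only. Replace with the directory that holds your
   Visual Studio C/C++ compiler; this value is not a real path. *)
vsDir = "C:\\path\\to\\your\\Visual Studio\\VC";

(* Small test kernel with the same argument layout as the addTwo call above *)
code = "
__global__ void addTwo(mint *in, mint *out, mint length) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

(* Only attempt the load when the directory actually exists *)
cudaFun = If[DirectoryQ[vsDir],
   CUDAFunctionLoad[code, "addTwo",
    {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256,
    "XCompilerInstallation" -> vsDir],
   Missing["NotFound", vsDir]]
```

If the load still fails with a valid compiler directory, that result is worth including in the report to Technical Support.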

Posted 8 months ago

Stefan,

I updated to the latest driver and toolkit, so I am now on 466.11 and 11.3, with all the latest paclets from Wolfram. NetTrain works... kind of. The 3090 core count isn't being recognized, the maximum BatchSize is limited to 10000, and when I run NetTrain, GPU utilization sits at 10-15% at all times, so I'm using about a tenth of the CUDA power I have available. NetTrain on CPU correctly shows 98-99% utilization, though a batch size of 10k still seems too small even for the CPU, given the very fast RAM and PCIe 4 bus speeds.
Most curiously, I'm seeing roughly 50% CPU utilization while NetTrain is targeting the GPU. Why? Either way, 16 s of GPU time vs 19-20 s of CPU time is an absolutely TERRIBLE result, given the power of this GPU.

Posted 8 months ago

I have a PC with a similar configuration to Gregory's: an Aurora Ryzen with a GeForce RTX 3090 GPU.

I get similar results for the same test: GPU utilization maxes out at around 10-15%, with CPU utilization at 50%.

It seems that Mathematica is seriously under-utilizing the GPU (and CPU), as Gregory reported.

Other mathematical programming languages I work with have no difficulty making full use of both CPU and GPU for tasks like NetTrain.

Someone at Wolfram needs to get a grip on GPU computing functionality and work on resolving incompatibility and performance issues that are a serious impediment to conducting ML R&D in Mathematica. I realize that some of these issues are outside Wolfram's control - for example when NVIDIA discontinues support for older GPUs - but there are ways to handle the issues much more effectively, as evidenced by the very comprehensive, up-to-date documentation offered by some competitor products.

The current state of Mathematica's CUDA support for recent NVIDIA GPUs is really terrible. Compare it with the recent MATLAB R2021a, where overall CUDA performance on tasks similar to those discussed here is far better.

Wolfram Research should really treat this as a high-priority task! Especially given the new Apple M1 hardware, where NVIDIA GPUs are no longer supported at all.

Posted 7 months ago

FYI,

Further exploration revealed that some training tasks perform AMAZINGLY fast, especially computer vision tasks where ConvolutionLayer is used. I am getting 3-5 seconds on GPU vs 3-4 MINUTES on my 3960X CPU for the MNIST example in the help file. The image classification example performs similarly well once batch size and training rounds are tuned to minimize training time for a trained net of similar quality. But as soon as we go back to simple math and vector multiplications, which should have used the tensor cores, things fall apart. I suspect that's where NetTrain fails to work out how to use the GPU and falls back to the CPU, hence the atrocious speeds and high CPU utilization numbers.
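
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of timing NetTrain on each device for a dense-only network. Everything in it is an assumption made for illustration: the random data, the layer sizes, the number of rounds, and the helper name timeTraining are hypothetical and are not the code used for the results reported in this thread.

```wolfram
(* Toy regression data: hypothetical, only to exercise dense layers *)
n = 100000;
data = RandomReal[1, {n, 10}] -> RandomReal[1, {n, 1}];

(* Small fully connected net; no ConvolutionLayer involved *)
net = NetChain[{LinearLayer[64], Ramp, LinearLayer[1]}, "Input" -> 10];

(* Hypothetical helper: wall-clock seconds for a fixed number of rounds *)
timeTraining[device_String] := First @ AbsoluteTiming[
   NetTrain[net, data, TargetDevice -> device, BatchSize -> 10000,
    MaxTrainingRounds -> 5]]

(* Compare the two devices; watch GPU load in Task Manager while this runs *)
timeTraining /@ {"CPU", "GPU"}
```

Swapping the LinearLayer stack for a convolutional net on the same machine would show whether the gap really depends on the layer type, as suspected above.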

Gregory,

Do you want to post (a link to) some WL code that I can benchmark on my Ryzen 3090 machine, to confirm your findings?

Jonathan

Gregory

Assuming you are referring to the example below, I am unable to corroborate.

It's faster than on the CPU, but it still takes over 1 minute to train. Task Manager reports minimal GPU load on my machine during the evaluation.

enter image description here

Posted 7 months ago

Jonathan,

Depending on batch size, I've had the GPU crunch over 100k samples per second, but 85-90k is typical. 3-5 s of GPU time is typical; 5-6 min of CPU time is typical.

I'm running an air-cooled EVGA RTX 3090 at a stable 1995 MHz overclock. My RAM is 3600 MHz, which probably also helps. There's still about 35-37% CPU utilization showing during those 3-4 seconds of GPU training.

-Greg

You're right!

enter image description here

Here's another GPU fail, this time with AnomalyDetection. The error message may shed some light on the cause of the issue. It seems to be a missing dependency...

Are you able to replicate this Greg?

enter image description here

Posted 7 months ago

Jonathan,

Sorry for the late reply, I didn't see your comment and then got pretty busy. My PC succeeds with your code. See the attached image.

I noticed you're on CUDA 11.2. I'm on 11.3. Maybe that's it?

-Greg

Hi Greg,

No worries. Thanks for the heads-up on the 11.3 CUDA Toolkit. The above now works on my machine too. However, it still isn't using the GPU, even when you specify TargetDevice -> "GPU". It just reverts to using the CPU, as in the other cases.

Jonathan

CUDALink has not worked for me since around Mathematica 12. I recently upgraded to Windows 10 21H1 and Mathematica 12.3.1, uninstalled CUDA Toolkit 10.2, and installed CUDA Toolkit 11.4. CUDALink still did not work. I then found that uninstalling CUDA Toolkit 10.2 leaves behind the empty directory

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin

Once I deleted it, Needs["CUDALink`"] finds the

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin

directory, and CUDA functions such as CUDAToolkitCompatibilityInformation[], CUDAQ[] (returns True), SystemInformation[] (shows driver and GPU status), CUDAInformation[], and CUDADriverVersion[] work as documented. I had also created CUDA_PATH and set it to

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4

because CUDA Toolkit 11.4 creates CUDA_PATH_V11_4 instead. This did not seem to help Mathematica find the correct toolkit when Needs["CUDALink`"] is evaluated. It worked only once the empty v10.2\bin directory was removed.
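
A quick way to check for leftovers like this is to list the toolkit version directories whose bin folder exists but is empty. This is a minimal sketch, assuming the default Windows install root shown above; emptyToolkitBins is a hypothetical helper name, and the code only lists candidates and deletes nothing.

```wolfram
(* Assumed default install root for the NVIDIA CUDA Toolkit on Windows *)
cudaRoot = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA";

(* Hypothetical helper: version directories under root whose bin folder
   exists but contains no files *)
emptyToolkitBins[root_String] := Select[
  FileNames["v*", root],
  DirectoryQ[FileNameJoin[{#, "bin"}]] &&
    FileNames["*", FileNameJoin[{#, "bin"}]] === {} &]

(* Returns {} when there are no stale directories or the root does not exist *)
emptyToolkitBins[cudaRoot]
```

Any directory it reports can be inspected and removed by hand before evaluating Needs["CUDALink`"] again in a fresh kernel.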
