
Avoid kernel crash while using CUDAQ[] with CUDALink on Linux Mint?

Posted 8 years ago

Dear all,

I am having trouble installing/using CUDA on my Linux Mint machine (Linux Mint 17 Cinnamon, 64-bit). I have an NVIDIA Quadro K610M, and I have installed the NVIDIA CUDA drivers.

In Mathematica, several CUDA-related commands or checks give me the error:

General::cdir: Cannot set current directory to private. >>

and when I try the command CUDAQ[] (after Needs["CUDALink`"], of course), it crashes my kernel! (The same is true when trying SystemInformation[] after loading CUDALink.)

I post more details below - any hint as to what the problem might be is welcome!

In[2]:= Needs["CUDALink`"]

In[3]:= CUDAResourcesInformation[]

Out[3]= {{"Name" -> "CUDAResources", "Version" -> "10.2.0.3", 
  "BuildNumber" -> "", "Qualifier" -> "LIN64", 
  "WolframVersion" -> "10.*", "SystemID" -> {"Linux-x86-64"}, 
  "Description" -> "{ToolkitVersion -> 7.5, MinimumDriver -> 300.0}", 
  "Category" -> "", "Creator" -> "", "Publisher" -> "", 
  "Support" -> "", "Internal" -> False, 
  "Location" -> 
   "/home/jonckhee/.Mathematica/Paclets/Repository/CUDAResources-\
LIN64-10.2.0.3", "Context" -> {}, "Enabled" -> True, 
  "Loading" -> Manual, "Hash" -> "da2008695ab87c5e2e072c552dc11217"}}

In[4]:= CUDADriverVersion[]

During evaluation of In[4]:= General::cdir: Cannot set current directory to private. >>

Out[4]= "370.23.0"

In[8]:= GPUTools`Internal`$NVIDIADriverLibraryPath

Out[8]= "/usr/lib/nvidia-370/libnvidia-tls.so.370.23"

In[7]:= GPUTools`Internal`$CUDALibraryPath

Out[7]= "/usr/lib/x86_64-linux-gnu/libcuda.so"

(The two paths, for the NVIDIA driver library and for the CUDA library, are correct: there are indeed files with these names at these locations.)
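In case it helps anyone with the same symptoms, here is a minimal diagnostic sketch that checks the pieces CUDALink needs are at least present and readable, before risking the kernel-crashing CUDAQ[] call (the paths come from the session output above and will differ on other machines):

```mathematica
(* Sketch: verify the CUDAResources paclet directory and the driver/CUDA
   libraries exist; the path symbols are the ones printed above. *)
Needs["CUDALink`"]
pacletDir = "Location" /. First[CUDAResourcesInformation[]];
{DirectoryQ[pacletDir],
 FileExistsQ[GPUTools`Internal`$NVIDIADriverLibraryPath],
 FileExistsQ[GPUTools`Internal`$CUDALibraryPath]}
```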

Thibaut

2 Replies
Posted 8 years ago

I am a rookie with both Mathematica and CUDA, so take my experiences accordingly; they are with Mathematica 11.0 on Ubuntu 16.04 with a GTX 1070 GPU, so I also have different Linux and GPU hardware than you have.

It appears to me that there are at least three paths the Wolfram Language takes to interact with CUDA GPU operations:

1) Neural network operations. Functions like NetTrain[net, trainingData, TargetDevice -> "GPU"] appear to call precompiled MXNet deep-learning libraries (https://mxnet.readthedocs.io/en/latest/) directly, though I am not 100% sure Wolfram uses MXNet (see http://mathematica.stackexchange.com/questions/125064/netencoders-vs-mxnet-preprocessing and http://mathematica.stackexchange.com/questions/125028/memory-leak-and-cpu-issue-in-nettrain/125034#125034). In my limited testing with the NVIDIA GTX 1070, Mathematica 11.0, and Ubuntu 16.04, these learning functions are not working correctly: they run partially but do not yield results anywhere close to the same NetTrain operation on "CPU".

2) Runtime compilation. Some Wolfram CUDA functions compile code at runtime using NVIDIA's NVCC compiler. My test of a Wolfram example fails on the GTX 1070 because Mathematica uses its own bundled CUDA SDK, which does not yet support the GTX 1070. (CUDA 7.5 appears to ship with Linux Mathematica, while some posts suggest Windows Mathematica uses the CUDA 8 SDK; a paclet update could change this, which is another thing to watch out for.) As your post shows, there do appear to be ways to point Mathematica at an external CUDA SDK, similar to its ability to call an external R installation, though I have not tried modifying these options yet. See http://mathematica.stackexchange.com/questions/124818/help-with-cuda-nvcc-error-in-mandelbulb-example-mathematica-11-0-nvidia-gtx-1070 . The NVCC compiler complains that it cannot compile for the CUDA compute capability, 6.1, that it sees in my library/hardware combination.

3) Precompiled libraries. Some Wolfram CUDA functions seem to call precompiled CUDA libraries, installed either by Mathematica via its paclet or by the GPU driver or CUDA SDK. So far, the functions of this class that I have tested seem to operate correctly. I currently have the CUDA 8 SDK and GTX 1070 driver 367.44 installed with Mathematica 11.0 on Ubuntu 16.04.
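For case 2, the kind of option-passing I mean would look something like the following sketch, using CUDAFunctionLoad's "CompilerInstallation" option to point at an external CUDA Toolkit. I have not tried this myself, and the toolkit path is an assumption to adjust to your system:

```mathematica
(* Sketch only: compile a trivial kernel against an external CUDA Toolkit.
   The installation path "/usr/local/cuda-7.5" is hypothetical. *)
Needs["CUDALink`"]
code = "
  __global__ void addTwo(mint *arr, mint len) {
    int i = threadIdx.x + blockIdx.x*blockDim.x;
    if (i < len) arr[i] += 2;
  }";
addTwo = CUDAFunctionLoad[code, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256,
   "CompilerInstallation" -> "/usr/local/cuda-7.5",
   "ShellOutputFunction" -> Print];  (* prints NVCC output for debugging *)
addTwo[Range[10], 10]
```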

As to your problem of the Mathematica kernel simply exiting with no message on the first CUDA function call, even just CUDAQ[] or CUDAResourcesInformation[]: I experienced the same problem at one point in my testing. I described my experience in an email to Wolfram tech support, and the answer was that my configuration is not yet supported. Here is the core of the response (it does not exactly align with my guesses above, but those could be wrong, or my explanation to tech support could have been unclear):

'The neural network training functions do not use CUDALink in Mathematica. It will need external CUDA Toolkit to run the code. The recommended version of CUDA Toolkit is 7.5

I understand that for GTX 1070 it need an RC version of CUDA Toolkit 8.0. Unfortunately, because CUDA Toolkit 8.0 is not officially released, at this moment, we are not able to provide full support for this Toolkit.

I will forward this question to our developers asking more information regarding this case and also keep you update with any feedback from our developers.

Thank you again for bringing this issue to us and help us improve Mathematica.' [CASE:3697585]

I understand from some posts on Stack Exchange that Wolfram is in the process of purchasing a GTX 1080 for testing.

Back to your problem and my similar experience with kernel exits: unfortunately, due to my poor job of documenting my steps, or some driver, SDK, or paclet update I have missed, I am now back to a system image of Mathematica 11.0, CUDA SDK 8 RC, and GPU driver 367.44 where the kernel no longer exits on CUDA function calls, so I am not able to duplicate the kernel-exit problem you are currently experiencing. The good news is that there is an install combination that may let you advance at least a little further, for example to the three cases I list above. :-)

The CUDA environment is shifting quickly right now, with at least four or five moving (updating) components. Some you can control; one that appears more difficult to control is Mathematica updating its CUDA interfaces via the automatic and fairly opaque paclet update process. I cannot say that, during my multiple complete reinstalls from Ubuntu on down and back up, Mathematica has changed anything within the CUDA libraries it downloads, but I have not been able to find any way to accurately check deltas in this downloaded code.

Don't even get me started on what the NVIDIA drivers do behind the scenes to modify the X server environment without notice. While it appears not to affect the CUDA-related operations, the fact that this happens silently raises another channel of questions for me.

I am currently working to test MXNet and CUDA operation outside of Mathematica, to understand the basic function and stability of the GTX 1070 GPU environment. When it works, it is amazingly fast! So the direction is a worthy one based on price/performance.

Be very aware of the GPU driver version you end up with: some CUDA SDKs seem to install older drivers, sometimes documented, sometimes not. Also, uninstalling/reinstalling these drivers may leave different libraries and components around. When available, I try to use CUDA SDK install options that do not touch existing drivers, but I still check versions before and after the various installs and updates.
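One way to sketch this bookkeeping from inside Mathematica is below; the file /proc/driver/nvidia/version and the nvcc command are assumptions about a typical Linux NVIDIA install and may not exist on every system:

```mathematica
(* Sketch: snapshot driver/toolkit versions before and after installs,
   so unexpected downgrades become visible. The /proc path and the nvcc
   shell command are assumptions about a typical Linux NVIDIA setup. *)
Needs["CUDALink`"]
versionSnapshot[] := <|
  "CUDADriver" -> Quiet[CUDADriverVersion[]],
  "KernelModule" -> Quiet[Import["/proc/driver/nvidia/version", "Text"]],
  "NVCC" -> Quiet[Import["!nvcc --version 2>/dev/null", "Text"]]|>
versionSnapshot[]
```

Saving one snapshot before and one after an install, then comparing the two associations, makes a silent driver change easy to spot.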

Not sure if my babble helps any. Good hunting!

POSTED BY: David Proffer

Thanks a lot for the detailed information!

I was not expecting the problem to be simple, and your message and your experience confirm that Mathematica + Linux + CUDA is not a trivial combination. I will try to solve my problem step by step...

Thibaut
