# [✓] Training loss goes to 0 when specifying TargetDevice = GPU

 David Cardinal 4 Votes Hi -- The neural network package is really cool; I'm learning a lot from experimenting with it. But I've noticed one odd thing I can't explain. I have a fairly simple CNN that I train on some images. On my CPU it trains (although super-slowly), with a more or less reasonable loss getting smaller on each training round. But when I set TargetDevice -> "GPU", it instantly reports a loss of 0 and finishes in just a couple of seconds. One clue: this didn't happen with my previous Nvidia 970 and the latest Nvidia drivers; it started only after I upgraded to a new 1080 (partly for faster network training!). I've attached a notebook that demonstrates this behavior and includes SystemInformation[]. (It uses images that aren't included, but I don't think there is anything special about them; they're just a bunch of JPEGs.) I'm running Nvidia driver 372.70 -- if there is a different driver I should be using instead, please let me know. Thanks!
7 months ago
13 Replies
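For context, the behavior described above can be reproduced with a minimal training call along these lines. This is only a sketch: the layer sizes, class labels, and `trainingData` are illustrative placeholders, not the contents of the poster's actual notebook.

```wl
(* Hypothetical minimal reproduction: a small CNN trained on images.
   The architecture and data below are placeholders. *)
net = NetChain[{
    ConvolutionLayer[16, {3, 3}], Ramp, PoolingLayer[{2, 2}],
    FlattenLayer[], LinearLayer[2], SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {32, 32}}],
   "Output" -> NetDecoder[{"Class", {"a", "b"}}]];

(* Slow, but the reported loss decreases sensibly over training rounds: *)
trainedCPU = NetTrain[net, trainingData, TargetDevice -> "CPU"];

(* On an affected 1080, this reportedly returned almost immediately
   with a loss of 0: *)
trainedGPU = NetTrain[net, trainingData, TargetDevice -> "GPU"];
```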
 Stefan Ragnarsson 2 Votes This might be a problem on NVIDIA's end: an interaction between the 1080/1070 cards, CUDA Toolkit 7.5, and the cuDNN v5.0 library. We are investigating.
7 months ago
 Sebastian Bodenstein 2 Votes A number of TensorFlow users have reported problems using a 1080/1070 with CUDA v7.5 and cuDNN v5.0 (the versions we are currently using for 11.0). The latest releases are CUDA v8 RC and cuDNN v5.1, and the problem seems to be resolved by upgrading to them. We are in the process of acquiring our own 1080 GPUs and will verify soon that this fixes the problem. Will keep you posted on what we find.
7 months ago
 Thanks for the quick reply. I assume that there isn't any way I can update the runtime libraries for CUDA & cuDNN that Mathematica uses on my system, but that I need to wait for a patch from you guys?
7 months ago
 We build other libraries against CUDA 7.5 as well, and they would all need to be rebuilt, so we would need to push a patch.
7 months ago
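One way to see which CUDA driver and device Mathematica itself detects (useful when deciding whether a driver update alone could help) is the CUDALink package. A sketch, assuming the CUDALink paclet and its support resources are installed on the system:

```wl
Needs["CUDALink`"]

(* Report the installed NVIDIA driver version as seen by Mathematica: *)
CUDADriverVersion[]

(* Per-device details: name, compute capability, memory, etc. *)
CUDAInformation[]
```

Note that this only reports the driver and hardware; the CUDA toolkit and cuDNN runtime versions that the neural network framework was built against are bundled with Mathematica and, as the reply above explains, cannot be swapped out by the user.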
 Any estimate of when this might get patched? Training on the CPU is, of course, almost useless, so I'd love to be using my 1080. Thanks for any info!
6 months ago
 > Any estimate of when this might get patched?
 
 We were hoping to rebuild the 11.0 release backend against the CUDA 8.0 Release Candidate and provide a patch. Unfortunately, the 11.0 backend does not appear to be compatible with CUDA 8.0 RC, so we can't take that simple route. It looks like you will need to wait for 11.1, which will definitely support CUDA 8.0.
6 months ago
 Sebastian -- Thanks for the prompt reply, although "ouch" on the timing.
6 months ago
 Hi -- I experience the same problem with the Titan X GPU card from NVidia, which I bought for the deep learning toolkit of Mathematica. The Mathematica deep learning toolkit is so much easier to use than Caffe, Theano, or TensorFlow in combination with Python, but without a reasonable GPU it is too slow for solving applications. Is there a prediction of when Mathematica 11.1 will be released? Can I subscribe to a beta or RC release of 11.1? Which is the fastest graphics/GPU card on which the Mathematica toolkit will run the deep learning functions?
5 months ago
 Fred -- I doubt it is the fastest card that works, but I've fallen back to a 970 until the bug that prevents my 1080 from working is fixed. It works okay, but of course it is both slower and has less memory.
5 months ago
 David -- thanks, that helps me out for the moment. I'll try a 4GB Tesla K10 from a colleague, and if it doesn't have enough memory I'll buy the GTX 970 with 8GB.
 Great! NeuralNetworks in general and NetTrain in particular got a huge overhaul on nearly every level in 11.1, so many of the bugs and limitations in 11.0.1 should be fixed, or at least ameliorated.