
How-To-Guide: External GPU on OSX - how to use CUDA on your Mac


The neural network and machine learning framework has become one of the key features of the latest releases of the Wolfram Language. Training neural networks can be very time-consuming on a standard CPU. Luckily, the Wolfram Language offers an incredibly easy way to use a GPU to train networks, and to do lots of other cool stuff. The problem is that most current Macs do not have an NVIDIA graphics card, which is necessary to access this framework within the Wolfram Language. Wolfram Inc. had therefore decided to drop support for GPUs on Macs. There is, however, a way to use GPUs on Macs: for example, you can use an external GPU like the one offered by Bizon.

enter image description here

Apart from the BizonBox there are a couple of cables and a power supply. You can buy or configure different versions of the BizonBox: a range of different graphics cards is available, and you can buy either the BizonBox 2s, which connects via Thunderbolt, or the BizonBox 3, which connects via USB-C.

Luckily, Wolfram have decided to reintroduce support for GPUs in Mathematica 11.1.1 - see the discussion here.

I have a variety of these BizonBoxes (both 2s and 3) and a range of Macs. I thought it would be a good idea to post a how-to. The essence of what I will be describing in this post should work for most Macs. I ran Sierra on all of them. Here is the recipe to get the thing to work:

Installation of the BizonBox, the required drivers, and compilers

  1. I will assume that you have Sierra installed and that Xcode is running. One of the really important steps if you want to use compilers is to downgrade the Command Line Tools to version 7.3. You will have to log into your Apple Developer account and download the Command Line Tools version 7.3. Install the tools and run the following terminal command (not in Mathematica!):

    sudo xcode-select  --switch /Library/Developer/CommandLineTools
    
  2. Reboot your Mac into recovery mode, i.e. hold CMD+R while rebooting.

  3. Open a terminal (under item Utilities at the top of the screen).

  4. Enter

    csrutil disable 
    
  5. Shut the computer down.

  6. Connect your BizonBox to the mains and to either the Thunderbolt or the USB-C port of your Mac.

  7. Restart your Mac.

  8. Click on the Apple symbol in the top left. Then "About this Mac" and "System Report". In the Thunderbolt section you should see something like this:

enter image description here

  9. In the documentation of the BizonBox you will find a link to a program called bizonboxmac.zip. Download that file and unzip it.

  10. Open the folder and click on "bizonbox.prefPane" to install. (If prompted to, do update!)

  11. You should see this window:

enter image description here

  12. Click on Activate. Type in your password if required to do so. It should give something like this:

enter image description here

Then restart.

  13. Install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads. You'll have to click through some questions for the download.

enter image description here

What you download should be something like cuda8.0.61mac.dmg, and it should be about 1.44 GB in size.

  14. Install the toolkit with all its elements.

enter image description here

  15. Restart your computer.

First tests

Now you should be good to go. Open Mathematica 11.1.1. Execute

Needs["CUDALink`"]
Needs["CCompilerDriver`"]
CUDAResourcesInstall[]

Then try:

CUDAResourcesInformation[]

which should look somewhat like this:

enter image description here

Then you should check

SystemInformation[]

Head to Links and then CUDA. This should look similar to this:

enter image description here
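The same information can also be queried programmatically, which is handy for a quick check after a reboot or an OS update. This is only a minimal sketch: it assumes CUDALink loads and that the eGPU is the first CUDA device (the index 1 is an assumption and may differ on a Mac with a second NVIDIA card).

```wolfram
Needs["CUDALink`"]

(* Number of CUDA devices that CUDALink can see *)
$CUDADeviceCount

(* Name of the first device; the index 1 is an assumption *)
CUDAInformation[1, "Name"]

(* Full property list for all detected devices *)
CUDAInformation[]
```

If the device count is 0, the box is probably not activated or not connected, and the later steps are unlikely to work.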

So far so good. Next is the really crucial thing:

CUDAQ[]

should give True. If that's what you see, you are good to go. Be more daring and try

CUDAImageConvolve[ExampleData[{"TestImage","Lena"}], N[BoxMatrix[1]/9]] // AbsoluteTiming

enter image description here

You might notice that the non-GPU version of this command runs faster:

ImageConvolve[ExampleData[{"TestImage","Lena"}], N[BoxMatrix[1]/9]] // AbsoluteTiming

runs in something like 0.0824 seconds, but that's ok.
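One plausible explanation, offered as an assumption rather than something measured in this post, is that for a small image and a 3x3 kernel the cost of moving data to and from the GPU outweighs the computation itself. A small sketch to compare both versions side by side on the same input (the variable names are only illustrative):

```wolfram
Needs["CUDALink`"]

img = ExampleData[{"TestImage", "Lena"}];
kernel = N[BoxMatrix[1]/9];

(* Run each version once; the first CUDA call may include one-off setup time *)
{tGPU, outGPU} = AbsoluteTiming[CUDAImageConvolve[img, kernel]];
{tCPU, outCPU} = AbsoluteTiming[ImageConvolve[img, kernel]];

(* Timings in seconds, followed by the GPU/CPU ratio *)
{tGPU, tCPU, tGPU/tCPU}
```

Running the cell a second time gives a fairer picture, since any one-off initialisation is then out of the way.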

Benchmarking (training neural networks)

Let's do some benchmarking. Download some example data:

obj = ResourceObject["CIFAR-10"];
trainingData = ResourceData[obj, "TrainingData"];

You can check whether it worked:

RandomSample[trainingData, 5]

should give something like this:

enter image description here

These are the classes of the 50000 images:

classes = Union@Values[trainingData] 

enter image description here

Let's build a network:

module = NetChain[{ConvolutionLayer[100, {3, 3}], 
   BatchNormalizationLayer[], ElementwiseLayer[Ramp], 
   PoolingLayer[{3, 3}, "PaddingSize" -> 1]}]

net = NetChain[{module, module, module, module, FlattenLayer[], 500, 
   Ramp, 10, SoftmaxLayer[]}, 
  "Input" -> NetEncoder[{"Image", {32, 32}}], 
  "Output" -> NetDecoder[{"Class", classes}]]

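Before starting a long training run, it can be worth checking that the network accepts the data at all. A minimal sketch, assuming trainingData is the list of image -> label rules loaded above and net is the chain just defined. The randomly initialised network returns a meaningless class; the point is only that the evaluation succeeds.

```wolfram
(* Give the untrained network random weights so that it can be evaluated *)
untrained = NetInitialize[net];

(* Take the image from the first training example (left-hand side of the rule) *)
sampleImage = First[First[trainingData]];

(* Should return one of the ten class labels *)
untrained[sampleImage]
```
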
When you train the network:

{time, trained} = AbsoluteTiming@NetTrain[net, trainingData, Automatic, "TargetDevice" -> "GPU"];

you should see something like this:

enter image description here

So the thing started 45 secs ago and is supposed to finish in 2m54s. In fact, it finished after 3m30s. If we run the same on the CPU we get:

enter image description here

The estimate kept changing a bit, but it settled down at about 18h20m. That is slower by a factor of about 314, which is quite substantial.
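For reference, the factor follows directly from the two times quoted above. Note that the CPU figure is the estimate shown by NetTrain and not a completed run, so the ratio is only indicative.

```wolfram
gpuSeconds = 3*60 + 30;          (* 3m30s, measured GPU run *)
cpuSeconds = 18*3600 + 20*60;    (* 18h20m, estimated CPU time *)

N[cpuSeconds/gpuSeconds]         (* about 314.3 *)
```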

Use of compiler

Up to now we have not needed the actual compiler. Let's try this, too. Let's grow a Mandelbulb:

width = 4*640;
height = 4*480;
iconfig = {width, height, 1, 0, 1, 6};
config = {0.001, 0.0, 0.0, 0.0, 8.0, 15.0, 10.0, 5.0};
camera = {{2.0, 2.0, 2.0}, {0.0, 0.0, 0.0}};
AppendTo[camera, Normalize[camera[[2]] - camera[[1]]]];
AppendTo[camera, 
  0.75*Normalize[Cross[camera[[3]], {0.0, 1.0, 0.0}]]];
AppendTo[camera, 0.75*Normalize[Cross[camera[[4]], camera[[3]]]]];
config = Join[{config, Flatten[camera]}];

pixelsMem = CUDAMemoryAllocate["Float", {height, width, 3}]

srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "mandelbulb.cu"}]

Now this should work:

mandelbulb = 
 CUDAFunctionLoad[File[srcf], "MandelbulbGPU", 
  {{"Float", _, "Output"}, {"Float", _, "Input"}, {"Integer32", _, "Input"}, 
   "Integer32", "Float", "Float"}, {16}, 
  "UnmangleCode" -> False, 
  "CompileOptions" -> "--Wno-deprecated-gpu-targets ", 
  "ShellOutputFunction" -> Print]

Under certain circumstances you might want to specify the location of the compiler like so:

mandelbulb = 
 CUDAFunctionLoad[File[srcf], "MandelbulbGPU", 
  {{"Float", _, "Output"}, {"Float", _, "Input"}, {"Integer32", _, "Input"}, 
   "Integer32", "Float", "Float"}, {16}, 
  "UnmangleCode" -> False, 
  "CompileOptions" -> "--Wno-deprecated-gpu-targets ", 
  "ShellOutputFunction" -> Print, 
  "CompilerInstallation" -> "/Developer/NVIDIA/CUDA-8.0/bin/"]

This should give:

enter image description here

Now

mandelbulb[pixelsMem, Flatten[config], iconfig, 0, 0.0, 0.0, {width*height*3}];
pixels = CUDAMemoryGet[pixelsMem];
Image[pixels]

gives

enter image description here

So it appears that all is working fine.
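The pixel buffer allocated with CUDAMemoryAllocate stays registered with CUDALink until it is released. A short sketch of how one might inspect and free it once the image has been retrieved, assuming pixelsMem from above is still defined:

```wolfram
(* Inspect the buffer registered with CUDALink's memory manager *)
CUDAMemoryInformation[pixelsMem]

(* Release the buffer once the pixels have been copied back with CUDAMemoryGet *)
CUDAMemoryUnload[pixelsMem]
```

After unloading, pixelsMem can no longer be passed to mandelbulb, so allocate a new buffer before rendering again.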

Problems

I did come across some problems though. There are quite a number of CUDA functions:

Names["CUDALink`*"]

enter image description here

Many work just fine.

res = RandomReal[1, 5000];
ListLinePlot[res]

enter image description here

ListLinePlot[First@CUDAImageConvolve[{res}, {GaussianMatrix[{{10}, 10}]}]]

enter image description here

The thing is that some don't and I am not sure why (I have a hypothesis though). Here are some functions that do not appear to work:

CUDAColorNegate, CUDAClamp, CUDAFold, CUDAVolumetricRender, CUDAFluidDynamics

and some more. I would be very grateful if someone could check these on OSX (and perhaps Windows?). I am not sure if this is due to some particularity of my systems or something that could be flagged up to Wolfram Inc. for checking.
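For anyone willing to check, a rough way to test several functions in one go is sketched below. The argument forms are assumptions chosen for illustration, not the documentation examples, and the list covers only the simpler functions. The sketch records whether a call issued any message. It cannot catch a kernel crash, and a call that silently returns unevaluated would still be reported as "ok".

```wolfram
Needs["CUDALink`"]

(* Held test calls; the argument forms are assumptions, not taken from the post *)
candidates = {
   Hold[CUDAColorNegate[{0.1, 0.5, 0.9}]],
   Hold[CUDAClamp[Range[10], 3, 7]],
   Hold[CUDAFold[Plus, 0, Range[10]]]
   };

(* Evaluate each call and record whether any message was issued *)
report = Map[
   Function[held, held -> Quiet[Check[ReleaseHold[held]; "ok", "failed"]]],
   candidates
   ];

Column[report]
```

Posting the resulting column together with the OS and GPU model would make it easier to see whether the failures are system-specific.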

When I wanted to try that systematically I wanted to use the function

WolframLanguageData

to look for the first example in the documentation of the CUDA functions, but it appears that no CUDA function is in the WolframLanguageData. I think it would be great to have them there, too, and I am not sure why they wouldn't be there.

In spite of these problems I hope that this post will help some Mac users to get CUDA going. It is a great framework and simple to use in the Wolfram Language. With the BizonBox and Mathematica 11.1.1 Mac users are no longer excluded from accessing this feature.

Cheers,

Marco

PS: Note that there is anecdotal evidence that one can even use the BizonBox under Windows running in a virtual box under OSX. I don't have Windows, but I'd like to hear if anyone gets this running.

POSTED BY: Marco Thiel
7 months ago

That looks really neat! I had no idea that there was such a large speed-up! Which GPU do you have inside your Bizon box? Never mind, I see it in the screenshot. I'm thinking about buying one...

POSTED BY: Sander Huisman
7 months ago

Hi Sander,

yes, I've got the TitanX. I do not have comparative benchmarks with the other ones though.

For me it was definitely worth buying the boxes - and I am lucky that Wolfram reintroduced the support for them. I wouldn't say that I am particularly good at CUDA (quite the opposite), but I could make some code run substantially faster, which was really important for a project I have.

Note that you can also buy the BizonBox without the GPU, so if you have a spare one lying around you can (most likely) use that one.

Cheers,

Marco

POSTED BY: Marco Thiel
7 months ago

Marco,

Awesome post! I was just looking into doing this.

What is the reason for downgrading the command line tools? If you do not downgrade, can you still run the built-in neural net functions (without using the compiler)?

Thanks

POSTED BY: Neil Singer
7 months ago

Dear Neil,

the downgrading is strictly speaking not necessary if you only want the Wolfram Language's Machine Learning and functions that do not require compilation.

If you have the latest version, you see something like this,

enter image description here

but with "The Version ('80300')" or so. It is a warning that the compilation failed. It is not a Mathematica/WolframLanguage problem. If you followed the instructions in the OP you would have generated a folder

/Developer/NVIDIA/CUDA-8.0/samples/2_Graphics/Mandelbrot/

You could try to use "make" there to compile, and that will fail unless you have downgraded the command line tools. See also this discussion here.

The process needs the command line C compilers and there is an incompatibility, I think.

Best wishes,

Marco

POSTED BY: Marco Thiel
7 months ago

Dear Marco, Thank you for a very informative post. You had responded to a question about downgrading the command line tools with:

"the downgrading is strictly speaking not necessary if you only want the Wolfram Language's Machine Learning and functions that do not require compilation."

I was wondering if you had tried NetTrain without the downgrade of command line tools to 7.3. Thanks..Jan

POSTED BY: Jan Segert
5 months ago

Besides the Bizon Box (which comes with support), there are also a couple of other, cheaper DIY options available, which have been reviewed on https://egpu.io/news/. For more eGPU benchmarks (not Mathematica) see http://barefeats.com.

POSTED BY: Arno Bosse
7 months ago

Congratulations! This post is now a Staff Pick! Thank you for your wonderful contributions. Please, keep them coming!

POSTED BY: Moderation Team
7 months ago

Dear Wolfram Team,

I am very glad and thankful that you reacted so quickly to the comments about GPU access on Macs. Having access to this framework opens up many possibilities in research and teaching. I appreciate it that you sorted this out so swiftly and efficiently.

Thank you,

Marco

POSTED BY: Marco Thiel
7 months ago

Has anyone set up an eGPU with Windows?

POSTED BY: Diego Zviovich
7 months ago

A good source of current information on eGPU's: http://barefeats.com/

POSTED BY: David Proffer
7 months ago

Thank you Marco! I used your instructions to get a BizonBox 2S successfully working on my MacBook Pro.

Some glitches I ran into that I'll mention in case they come up for others:

  1. I had trouble getting the Bizon to activate, and then to have the Nvidia control panel find the Bizon. Sometimes actions would work and other times not. After experiments and consultation with support, I replaced the long Thunderbolt cable supplied with the Bizon with a shorter one (three foot). That solved many of the problems.

  2. I still had difficulty getting the mac to see the box. Support said that the Mac should be off when the box is connected or disconnected. That helped.

  3. Finally, I couldn't get CUDA recognized as installed by the Nvidia app or Mathematica. The last thing I did before it worked was plug a display into the Bizon, and then everything started working. The display didn't stay plugged in, and I don't need it now, but it seemed that it needed to be there to initialize something.

All of this involved a lot of on/off, rebooting, and trying different things, so I'm not sure all of the above is necessary, but in the end it got mine working following your instructions.

Thanks again Marco - I'm not sure I would have stuck it out without knowing there was light at the end of the tunnel.

Mike

POSTED BY: Updating Name
5 months ago

Dear Mike,

thank you for your nice words. I am glad if some of what I wrote helped.

You are right that sometimes it takes a bit of rerunning bits several times, and some rebooting to make it work. On a "clean" Mac the instructions appeared to work, but after trying this on many Macs now, there is often some rebooting required. Also, when you run an update of the OS you might have to perform some of the instructions again.

On the bright side, I got the GPU to work on all Macs we have tried so far. The script on this page: https://github.com/goalque/automate-eGPU sometimes seemed to make a difference, particularly after an OS update. Also OSX regularly wants to update the downgraded Command Line Tools, which is a bit annoying.

Best wishes,

Marco

POSTED BY: Marco Thiel
5 months ago

Marco, thank you for this post. I assume the instructions will get a lot simpler once everyone switches to macOS High Sierra, which natively supports eGPUs:

https://9to5mac.com/2017/06/07/hands-on-macos-high-sierra-native-egpu-support-shows-promise-video/

POSTED BY: Eric Smith
5 months ago

Hi Eric,

yes, it sounds as if this might get easier. I suppose the problem with the Command Line Tools would persist though. As soon as I can get a final version of High Sierra, I will try it out and report back to this Community.

Cheers,

Marco

POSTED BY: Marco Thiel
5 months ago

Marco, thank you for all of this very usable information. Following these instructions I was easily able to get a similar setup working; the main difference is that my GPU is an NVIDIA GTX 1080 Ti. I'm posting some benchmarks for GPU comparison (and some details about the setup at the end).

The network training task came in at about 2 minutes 17 seconds with the 1080 Ti:

NetTrain image

The ImageConvolve was still slower with CUDA, but not a lot slower:

ImageConvolve

I was not successful in growing a Mandelbulb, but I didn't put any real effort into trying to troubleshoot this.

Thanks again, I never would have attempted this had you not documented your setup... best wishes... Jan

PS: On a related note, Apple appears to be moving towards supporting external GPU's in the upcoming High Sierra OS release, but apparently only with computers that support Thunderbolt 3.

PPS: Some technical details:

Computer: MacBook Pro (Model Identifier: MacBookPro11,4; Processor: Intel Core i7, 2.2 GHz)

eGPU: I have a BizonBox 2S connected by Thunderbolt 2. I first downgraded the Command Line Tools. Then I followed the BizonBox instructions (including having an external monitor plugged into the GPU via HDMI). Then I followed Marco's instructions for the Wolfram setup. I only experienced one minor setback, which was that the CUDAResourcesInstall[] crashed the Wolfram kernel the first time I tried it, but worked fine after launching a new kernel.

POSTED BY: Jan Segert
5 months ago

Is it necessary to get CUDALink working and let CUDAResourcesInstall[] run if I only need to use TargetDevice -> "GPU" in NetTrain, but never any functions from the CUDALink package? Does NetTrain depend on CUDALink or are they separate?

POSTED BY: Szabolcs Horvát
3 months ago
