
How-To-Guide: External GPU on OSX - how to use CUDA on your Mac

Posted 3 years ago
26234 Views | 31 Replies | 80 Total Likes

The neural network and machine learning framework has become one of the key features of the latest releases of the Wolfram Language. Training neural networks can be very time-consuming on a standard CPU. Luckily, the Wolfram Language offers an incredibly easy way to use a GPU to train networks - and to do lots of other cool stuff. The problem was/is that most current Macs do not have an NVIDIA graphics card, which is necessary to access this framework within the Wolfram Language. Because of this, Wolfram Research had decided to drop support for GPUs on Macs. There is, however, a way to use GPUs on Macs: an external GPU such as the one offered by Bizon.

[image: the BizonBox external GPU]

Apart from the BizonBox itself, there are a couple of cables and a power supply. You can buy/configure different versions of the BizonBox: a range of different graphics cards is available, and there are two models - the BizonBox 2S, which connects via Thunderbolt, and the BizonBox 3, which connects via USB-C.

Luckily, Wolfram has decided to reintroduce support for GPUs in Mathematica 11.1.1 - see the discussion here.

I have a variety of these BizonBoxes (both 2S and 3) and a range of Macs, so I thought it would be a good idea to post a how-to. The essence of what I describe in this post should work for most Macs; I ran Sierra on all of them. Here is the recipe to get things working:

Installation of the BizonBox, the required drivers, and compilers

  1. I will assume that you have Sierra installed and that Xcode is installed and working. One of the really important steps, if you want to use compilers, is to downgrade the command line tools to version 7.3. You will have to log into your Apple Developer account and download the Command Line Tools version 7.3. Install the tools and run this terminal command (not in Mathematica!):

    sudo xcode-select  --switch /Library/Developer/CommandLineTools
    
  2. Reboot your Mac into recovery mode, i.e. hold CMD+R while rebooting.

  3. Open a terminal (under item Utilities at the top of the screen).

  4. Enter

    csrutil disable 
    
  5. Shut the computer down.

  6. Connect your BizonBox to the mains and to either the thunderbolt or USB-C port of your Mac.

  7. Restart your Mac.

  8. Click on the Apple symbol in the top left. Then "About this Mac" and "System Report". In the Thunderbolt section you should see something like this:

[screenshot: Thunderbolt section of the System Report]

  9. In the documentation of the BizonBox you will find a link to a program called bizonboxmac.zip. Download that file and unzip it.

  10. Open the folder and click on "bizonbox.prefPane" to install. (If prompted, do update!)

  11. You should see this window:

[screenshot: the bizonbox preference pane]

  12. Click on Activate. Type in your password if required. It should give something like this:

[screenshot: activation status]

Then restart.

  13. Install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads. You'll have to click through some questions for the download.

[screenshot: CUDA download page]

What you download should be something like cuda_8.0.61_mac.dmg, and it should be roughly 1.44 GB.

  14. Install the toolkit with all its elements.

[screenshot: CUDA installer]

  15. Restart your computer.
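Before starting Mathematica it is worth sanity-checking the toolchain. Here is a quick check from within the Wolfram Language (the paths assume a default CUDA 8.0 installation; adjust the version if yours differs):

Import["!/usr/local/cuda/bin/nvcc --version", "Text"]
Import["!xcode-select -p", "Text"]

The first call should report something like "Cuda compilation tools, release 8.0", and the second should print /Library/Developer/CommandLineTools if the downgrade in step 1 took effect.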

First tests

Now you should be good to go. Open Mathematica 11.1.1. Execute

Needs["CUDALink`"]
Needs["CCompilerDriver`"]
CUDAResourcesInstall[]

Then try:

CUDAResourcesInformation[]

which should look somewhat like this:

[screenshot: CUDAResourcesInformation output]

Then you should check

SystemInformation[]

Head to Links and then CUDA. This should look similar to this:

[screenshot: the CUDA entry under SystemInformation Links]

So far so good. Next is the really crucial thing:

CUDAQ[]

should give True. If that's what you see, you are good to go. Be more daring and try

CUDAImageConvolve[ExampleData[{"TestImage","Lena"}], N[BoxMatrix[1]/9]] // AbsoluteTiming

[screenshot: CUDAImageConvolve result with timing]

You might notice that the non-GPU version of this command runs faster:

ImageConvolve[ExampleData[{"TestImage","Lena"}], N[BoxMatrix[1]/9]] // AbsoluteTiming

runs in something like 0.0824 seconds, but that's ok.
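The test image is small, so the time spent shipping data to and from the GPU dominates the actual convolution. As a rough sketch (I have not benchmarked this particular example, and timings will depend on your hardware), you can make the problem large enough for the GPU to pull ahead:

img = ImageResize[ExampleData[{"TestImage", "Lena"}], 4096];
kern = N[BoxMatrix[5]/121];  (* an 11x11 box kernel *)
First@AbsoluteTiming[ImageConvolve[img, kern]]
First@AbsoluteTiming[CUDAImageConvolve[img, kern]]

At this size the CUDA version should come out ahead, although the exact crossover point depends on the GPU.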

Benchmarking (training neural networks)

Let's do some Benchmarking. Download some example data:

obj = ResourceObject["CIFAR-10"]; 
trainingData = ResourceData[obj, "TrainingData"];

You can check whether it worked:

RandomSample[trainingData, 5]

should give something like this:

[screenshot: five random training examples]

These are the classes of the 50000 images:

classes = Union@Values[trainingData] 

[screenshot: the ten class labels]

Let's build a network

module = NetChain[{ConvolutionLayer[100, {3, 3}], 
   BatchNormalizationLayer[], ElementwiseLayer[Ramp], 
   PoolingLayer[{3, 3}, "PaddingSize" -> 1]}]

net = NetChain[{module, module, module, module, FlattenLayer[], 500, 
   Ramp, 10, SoftmaxLayer[]}, 
  "Input" -> NetEncoder[{"Image", {32, 32}}], 
  "Output" -> NetDecoder[{"Class", classes}]]

When you train the network:

{time, trained} = AbsoluteTiming@NetTrain[net, trainingData, Automatic, "TargetDevice" -> "GPU"];

you should see something like this:

[screenshot: NetTrain progress panel on the GPU]

So the training started 45 seconds ago and was supposed to finish in 2m54s. In fact, it finished after 3m30s. If we run the same on the CPU we get:

[screenshot: NetTrain progress panel on the CPU]

The estimate kept changing a bit, but it settled down at about 18h20m. That is slower by a factor of about 315, which is quite substantial.
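Speed aside, you can verify that the GPU-trained network has actually learned something by scoring it on the held-out test set. This sketch assumes obj and trained from above, and that the CIFAR-10 resource also provides "TestData":

testData = ResourceData[obj, "TestData"];
sample = RandomSample[testData, 1000];
N@Mean@Boole@Table[trained[First[ex]] === Last[ex], {ex, sample}]

Anything well above the 10% chance baseline confirms that the training worked.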

Use of compiler

Up to now we have not needed the actual compiler. Let's try this, too. Let's grow a Mandelbulb:

width = 4*640;
height = 4*480;
iconfig = {width, height, 1, 0, 1, 6};
config = {0.001, 0.0, 0.0, 0.0, 8.0, 15.0, 10.0, 5.0};
camera = {{2.0, 2.0, 2.0}, {0.0, 0.0, 0.0}};
AppendTo[camera, Normalize[camera[[2]] - camera[[1]]]];
AppendTo[camera, 
  0.75*Normalize[Cross[camera[[3]], {0.0, 1.0, 0.0}]]];
AppendTo[camera, 0.75*Normalize[Cross[camera[[4]], camera[[3]]]]];
config = Join[{config, Flatten[camera]}];

pixelsMem = CUDAMemoryAllocate["Float", {height, width, 3}]

srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "mandelbulb.cu"}]

Now this should work:

mandelbulb = 
 CUDAFunctionLoad[File[srcf], "MandelbulbGPU", {{"Float", _, "Output"}, {"Float", _, "Input"}, {"Integer32", _, "Input"}, "Integer32", "Float", 
  "Float"}, {16}, "UnmangleCode" -> False, "CompileOptions" -> "--Wno-deprecated-gpu-targets ", "ShellOutputFunction" -> Print]

Under certain circumstances you might want to specify the location of the compiler like so:

mandelbulb = 
 CUDAFunctionLoad[File[srcf], "MandelbulbGPU", {{"Float", _, "Output"}, {"Float", _, "Input"}, {"Integer32", _, "Input"}, "Integer32", "Float", 
"Float"}, {16}, "UnmangleCode" -> False, "CompileOptions" -> "--Wno-deprecated-gpu-targets ", "ShellOutputFunction" -> Print, 
"CompilerInstallation" -> "/Developer/NVIDIA/CUDA-8.0/bin/"]

This should give:

[screenshot: the loaded CUDAFunction]

Now

mandelbulb[pixelsMem, Flatten[config], iconfig, 0, 0.0, 0.0, {width*height*3}];
pixels = CUDAMemoryGet[pixelsMem];
Image[pixels]

gives

[image: the rendered Mandelbulb]

So it appears that all is working fine.
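One caveat: as far as I can tell, memory obtained with CUDAMemoryAllocate stays allocated on the GPU until it is explicitly unloaded (or the CUDALink session ends), so once you have retrieved the pixels you can release the buffer:

CUDAMemoryUnload[pixelsMem]  (* frees the buffer on the GPU *)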

Problems

I did come up with some problems though. There is quite a number of CUDA functions:

Names["CUDALink`*"]

[screenshot: list of CUDALink functions]

Many work just fine.

res = RandomReal[1, 5000];
ListLinePlot[res]

[plot: the random data]

ListLinePlot[First@CUDAImageConvolve[{res}, {GaussianMatrix[{{10}, 10}]}]]

[plot: the convolved data]

The thing is that some don't, and I am not sure why (I have a hypothesis though). Here are some functions that do not appear to work:

CUDAColorNegate, CUDAClamp, CUDAFold, CUDAVolumetricRender, CUDAFluidDynamics

and some more. I would be very grateful if someone could check these on OSX (and perhaps Windows?). I am not sure whether this is due to some particularity of my systems or something that could be flagged up to Wolfram Research for checking.

To check this systematically, I wanted to use the function

WolframLanguageData

to look up the first documentation example of each CUDA function, but it appears that no CUDA function is in WolframLanguageData. I think it would be great to have them there, too, and I am not sure why they wouldn't be.
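In the meantime, a crude way to test functions semi-systematically is to wrap a sample call to each one in Check. This is only a sketch - the arguments below are guesses for illustration, and you would have to supply sensible arguments per function:

testImg = ExampleData[{"TestImage", "Lena"}];
SetAttributes[tryCUDA, HoldFirst];
tryCUDA[expr_] := Quiet@Check[expr; "ok", "failed"];
{tryCUDA[CUDAColorNegate[testImg]], tryCUDA[CUDAClamp[testImg, 0.2, 0.8]], tryCUDA[CUDAErosion[testImg, 2]]}

A result of "failed" then flags that function for closer inspection.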

In spite of these problems I hope that this post will help some Mac users to get CUDA going. It is a great framework and simple to use in the Wolfram Language. With the BizonBox and Mathematica 11.1.1 Mac users are no longer excluded from accessing this feature.

Cheers,

Marco

PS: Note that there is anecdotal evidence that one can even use the BizonBox under Windows running in a virtual machine under OSX. I don't have Windows, but I'd like to hear if anyone gets this running.

31 Replies

That looks really neat! I had no idea that there was such a large speed-up! Which GPU do you have inside your Bizon box? Never mind - I see it in the screenshot. I'm thinking about buying one...

Hi Sander,

Yes, I've got the TitanX. I do not have comparative benchmarks with the other ones though.

For me it was definitely worth buying the boxes - and I am lucky that Wolfram reintroduced the support for them. I wouldn't say that I am particularly good at CUDA (quite the opposite), but I could make some code run substantially faster, which was really important for a project I have.

Note that you can also buy the BizonBox without the GPU, so if you have a spare one lying around you can (most likely) use that one.

Cheers,

Marco

Marco,

Awesome post! I was just looking into doing this.

What is the reason for downgrading the command line tools? If you do not downgrade can you still run the built in Neural net functions (without using the compiler)?

Thanks

Dear Neil,

The downgrading is strictly speaking not necessary if you only want the Wolfram Language's Machine Learning and functions that do not require compilation.

If you have the latest version, you see something like this,

[screenshot: compilation failure warning]

but with "The Version ('80300')" or so. It is a warning that the compilation failed. It is not a Mathematica/Wolfram Language problem. If you followed the instructions in the OP you will have generated a folder

/Developer/NVIDIA/CUDA-8.0/samples/2_Graphics/Mandelbrot/

You could try to use "make" to compile it, and that will fail unless you have downgraded the command line tools. See also this discussion here.

The process needs the command line c-compilers and there is an incompatibility, I think.
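You can see which C compilers the Wolfram Language finds, and whether a trivial compilation succeeds, independently of CUDA:

Needs["CCompilerDriver`"]
CCompilers[]  (* lists the compiler installations that were detected *)
exe = CreateExecutable["#include <stdio.h>\nint main(){ printf(\"compiler ok\"); return 0; }", "compilercheck"]

A successful CreateExecutable returns the path of the compiled binary; $Failed (together with the version warning above) points to the Command Line Tools mismatch.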

Best wishes,

Marco

Dear Marco, Thank you for a very informative post. You had responded to a question about downgrading the command line tools with:

"the downgrading is strictly speaking not necessary if you only want the Wolfram Language's Machine Learning and functions that do not require compilation."

I was wondering if you had tried NetTrain without the downgrade of the command line tools to 7.3. Thanks... Jan

Besides the Bizon Box (which comes with support), there are also a couple of other, cheaper DIY options available which have been reviewed on https://egpu.io/news/ For more eGPU benchmarks (not Mathematica) see http://barefeats.com.

Congratulations! This post is now a Staff Pick! Thank you for your wonderful contributions. Please, keep them coming!

Dear Wolfram Team,

I am very glad and thankful that you reacted so quickly to the comments about GPU access on Macs. Having access to this framework opens up many possibilities in research and teaching. I appreciate it that you sorted this out so swiftly and efficiently.

Thank you,

Marco

Posted 3 years ago

Has anyone set up an eGPU with Windows?

Posted 3 years ago

A good source of current information on eGPUs: http://barefeats.com/

Posted 3 years ago

Thank you Marco! I used your instructions to get a BizonBox 2S successfully working on my MacBook Pro.

Some glitches I ran into that I'll mention in case they come up for others:

  1. I had trouble getting the Bizon to activate, and then getting the Nvidia control panel to find the Bizon. Sometimes actions would work and other times not. After experiments and consultation with support, I replaced the long Thunderbolt cable supplied with the Bizon with a shorter (three-foot) one. That solved many of the problems.

  2. I still had difficulty getting the Mac to see the box. Support said that the Mac should be off when the box is connected or disconnected. That helped.

  3. Finally, I couldn't get CUDA recognized as installed by the Nvidia app or Mathematica. The last thing I did before it worked was plug a display into the Bizon - then everything started working. The display didn't stay plugged in, and I don't need it now, but it seemed that it needed to be there to initialize something.

All of this was with a lot of on/off, rebooting, trying different things so I'm not sure the above is necessary, but in the end it got mine working following your instructions.

Thanks again Marco - I'm not sure I would have stuck it out without knowing there was light at the end of the tunnel.

Mike

Dear Mike,

Thank you for your nice words. I am glad that some of what I wrote helped.

You are right that it sometimes takes rerunning some steps, and some rebooting, to make it work. On a "clean" Mac the instructions appeared to work, but after trying this on many Macs now, there is often some rebooting required. Also, when you run an update of the OS you might have to perform some of the instructions again.

On the bright side, I got the GPU to work on all Macs we have tried so far. The script on this page: https://github.com/goalque/automate-eGPU sometimes seemed to make a difference, particularly after an OS update. Also OSX regularly wants to update the downgraded Command Line Tools, which is a bit annoying.

Best wishes,

Marco

Marco, thank you for this post. I assume the instructions will get a lot simpler once everyone switches to macOS High Sierra, which natively supports eGPUs:

https://9to5mac.com/2017/06/07/hands-on-macos-high-sierra-native-egpu-support-shows-promise-video/

Hi Eric,

yes, it sounds as if this might get easier. I suppose the problem with the Command Line Tools would persist though. As soon as I can get a final version of High Sierra, I will try it out and report back to this Community.

Cheers,

Marco

Marco, thank you for all of this very usable information. Following these instructions I was easily able to get a similar setup working, the main difference is that my GPU is an NVIDIA GTX 1080 Ti. I'm posting some benchmarks for GPU comparison (and some details about the setup the end).

The network training task came in at about 2 minutes 17 seconds with the 1080 Ti:

[screenshot: NetTrain progress]

The ImageConvolve was still slower with CUDA, but not a lot slower:

[screenshot: ImageConvolve timing]

I was not successful in growing a Mandelbulb, but I didn't put any real effort into trying to troubleshoot this.

Thanks again, I never would have attempted this had you not documented your setup... best wishes... Jan

PS: On a related note, Apple appears to be moving towards supporting external GPUs in the upcoming High Sierra OS release, but apparently only on computers that support Thunderbolt 3.

PPS: Some technical details:

Computer: MacBook Pro (MacBookPro11,4), 2.2 GHz Intel Core i7

eGPU: I have a BizonBox 2S connected by Thunderbolt 2. I first downgraded the Command Line Tools. Then I followed the BizonBox instructions (including having an external monitor plugged into the GPU via HDMI). Then I followed Marco's instructions for the Wolfram setup. I only experienced one minor setback, which was that the CUDAResourcesInstall[] crashed the Wolfram kernel the first time I tried it, but worked fine after launching a new kernel.

Is it necessary to get CUDALink working and let CUDAResourcesInstall[] run if I only need to use TargetDevice -> "GPU" in NetTrain, but never any functions from the CUDALink package? Does NetTrain depend on CUDALink or are they separate?

Posted 1 year ago

Set the environment variables LD_LIBRARY_PATH, LIBRARY_PATH, and CPATH to the directory extracted from the download.

If needed, separate multiple directories with : as in the PATH environment variable.

export CUDNN_ROOT=/home/YourUserName/libs/cudnn
export LD_LIBRARY_PATH=$CUDNN_ROOT/lib64:$LD_LIBRARY_PATH
export CPATH=$CUDNN_ROOT/include:$CPATH
export LIBRARY_PATH=$CUDNN_ROOT/lib64:$LIBRARY_PATH

To uninstall CUDA, run:

/usr/local/cuda/bin/uninstallxxx

Since Nvidia drivers are no longer supported in macOS 10.15, I assume this no longer works, or am I mistaken?

Dear Mike,

It is true that the newer versions of OSX do not support Nvidia drivers. I run GPUs only under older versions of OSX. I have had to downgrade one computer to make it work, which is a pain because of a new chip that makes downgrading difficult.

I also use GPUs on OSX extensively for teaching.

In the end it works, as long as you can work on an older system (or dual boot, etc.).

Best wishes,

Marco


Posted 4 months ago

Given the Mac Pro and the latest powerful MacBook Pros, Wolfram should implement Metal support so that we can get GPU acceleration even on modern hardware without NVIDIA GPUs... Any word on this from Wolfram? They also advertise OpenCL compatibility; that framework was an open-source effort from Apple to abstract away CUDA so that it would also work on non-NVIDIA GPUs... I guess it's Metal now. The documentation at https://reference.wolfram.com/language/OpenCLLink/tutorial/Setup.html#10897850 looks outdated. Given that Apple will likely add/switch to ARM processors even for Macs in a year or two, I hope Wolfram gets up to speed on adopting Metal ASAP.

I agree 100%.

Apple, which has a lot more resources than Wolfram, has already done the heavy lifting in abstracting the hardware for the GPUs. Further, I doubt that the open-source framework that Wolfram is using for GPU acceleration of neural networks will ever escape the thumb of NVIDIA.

I realize that macOS users are a minor component of Mathematica users, but they are a significant minority. If you add in iOS users (I hope that a native Mathematica for iOS will eventually be released), the number of people who could benefit by fully integrating Metal into Mathematica would be significant.

Wolfram has already made a good start by using Metal for graphics. It is time to take the next step.

Mathematica 12.1 does support Metal. From John Fultz's WTC 2019 presentation:

Metal for macOS (upgraded from OpenGL 4.1)

Right. But this is Metal for graphics rendering, not Metal for Machine Learning, etc. Apple expanded the domain for their 'Metal' technology. So, although graphics rendering is much improved, Mathematica does not make use of any of the neural networks stuff.

Posted 4 months ago

Not only that but Wolfram could use Metal (so GPUs) for a lot of parallel processing functions as well (like Parallelize, ParallelEvaluate, etc.)... Right now it looks like Mathematica's parallelism is only about its kernel running on CPU cores.

Posted 4 months ago

Indeed and the iPad Pro today is already more powerful than quite a few notebooks... I don't think Mac users are a minority for Wolfram, it's a substantial percentage of the userbase. Even Mathematica 1 was released for the Mac first.

Currently Tensorflow and other popular machine learning frameworks do not support Metal (aside from Apple's CoreML). This means that in Jupyter notebooks for example you are forced to use the cloud to compute using a GPU. Given that Wolfram wants to increase the usage of its language in data science, Metal support should be a no brainer so I'm surprised Wolfram hasn't prioritized this.

I've read all the threads in this lengthy post, and more replies keep coming in. This topic is very important, so I wanted to summarize it with a conclusion for all my fellow machine learners (and devs please correct me if I'm missing anything).

Here's the bottom line: do not buy an e-GPU if you own a modern (2019+) mac.

If you want to use NetTrain[..., TargetDevice -> "GPU"], then you have only 3 options:

  1. Buy a Linux box with an Nvidia GPU (pricey)

  2. Email WRI and ask them to add the ROCm version of mxnet as an alternative NetTrain backend, which currently runs very nicely on non-Nvidia GPUs! This shouldn't be too hard since the MXNetLink` APIs are pretty clean. (unlikely)

  3. Email WRI about upgrading the frontend to support remote (preemptive) MathLinks - this would allow NetTrain to use remote GPUs in locally running notebooks. Currently, running a local notebook with dynamics listening to a remote kernel is possible, but extremely unstable! (more unlikely)

Notes:

  • Yes, options 2 and 3 are unlikely, but it will help if you do them anyway :)

  • You could, of course, upload a "wolfram-script" file and data to a (p2/p3-class) ec2 machine and run it remotely in the command line.... but really, don't do this. You would be giving up all the reasons (dynamic ergonomic notebook interface) for which you use Mathematica in the first place! At this point just do yourself a favor and learn TensorFlow instead.

  • Don't even think about trying to use remote desktops. Running VNC over AWS (unless you are a tortoise or sloth) would drive you mad with all the lagging super-low frame rates.

Thank you, Marco and Mike, this is extremely useful.

I fully agree that a desirable solution would be for Mathematica to support Apple GPU installations out of the box - one of the reasons that I love Mathematica is that I do not have to install a thousand libraries (or even compile them from scratch) but that things work straight away.

However, I would be quite happy if there was just decent support for cloud-based GPUs. By this, I don't mean support for Wolfram Script (at least a step in the right direction) but support for full interactive Mathematica. I would love for Mathematica online to offer the option to choose a GPU backend. I'd happily pay for it as long as the charges are reasonable (i.e. comparable to EC2 or Google).

Ultimately, if I had to choose between cloud GPU support and desktop GPU support, I would probably opt for the cloud. After all, I can't run a compute job that takes several days on my laptop!

Cloud support would be of primary use for large jobs. However, the GPU is, or could be used for so many 'everyday' operations beyond simple graphics that you wouldn't want to have to call the cloud for each instance. There are more than 50 functions that can use a GPU if an acceptable one is found, and I for one would like to be able to use those out of the box on my Mac. And, as someone pointed out, it should be possible to implement parallel computing on the GPU as well as the various cores of the CPU.

As I pointed out, Apple has already done most of the heavy lifting with their various Metal frameworks, rendering the code hardware independent for both macOS and iOS (iPadOS) for the long term. It is simply a matter of prioritization at Wolfram whether Mathematica will be able to use the technology on these platforms.

I have heard or read several times recently that Wolfram is seeking feedback to help set priorities. The more people who ask for this feature, the more likely we are to see it implemented. Unfortunately, my main venue for schmoozing is the WTC, which will be online only this year.

Posted 3 months ago

As I mentioned, the best option would be for Wolfram to support Metal to accelerate general-purpose computing on GPUs (instead of just graphics for the notebook). This way Mathematica/Wolfram Language would become the best option to run GPU-accelerated neural nets on recent Macs. Even Tensorflow doesn't support that, so you're stuck using the cloud.

P.S. this is a mandatory step anyway when they eventually have to build an ARM-based Mac version. Same goes for a full iPad app.

"P.S. this is a mandatory step anyway when they eventually have to build an ARM-based Mac version. Same goes for a full iPad app."

This is my concern as well. Wolfram was there at the WWDC keynote when the switch to Intel Macs was announced. (There are videos with Theo and Rob showing how the 'little switch' worked.) Since then, support for new Apple technologies has lagged behind -- they just made the deadline for 64-bit apps with the front end. Although ARM-based Macs are presently just rumors, I think it is almost certain that this will happen.

I understand the desire to have a common code base for different platforms. In my opinion, leveraging the work that Apple has already done to make code hardware independent -- specifically Metal in all its implementations -- outweighs this, and will provide a superior program for Mac users, certainly, and possibly also for Windows users.

One more thing: Wolfram Research, and Stephen personally, have done a lot towards the democratization of computation. That is, WL has made the tools needed to do computation available for a much wider audience.

Fully supporting Apple hardware should be a critical step in this process. Fully supporting Metal would result in a two or three orders of magnitude increase in processing speed for many operations. While a MacBook Air is always going to be slower than a Mac Pro, it is likely that a MacBook Air with full GPU support would be faster than a Mac Pro without.

The key idea is that anyone with a low end Apple product (including the iPad) could do significant work using Mathematica.

I'm not pretending that Apple is creating the GPU APIs for scientific computing. Games have been the driver for GPU development and use for some time now.

Wolfram has already gone far in making advanced computing and data science accessible to pretty much anyone who can afford a computer (and broadband). Fully supporting Metal (etc.) on these devices would lower the threshold to exploration and use for AI type computations, as well as any other computations that can benefit from the GPU.

I am not an expert on Windows or Linux hardware, but my guess is that NVIDIA GPUs are restricted to high-end machines, and the hardware vendors have essentially written off all but the most dedicated people (with deep pockets), so there is no incentive, for example, for the group making the open-source tools for GPUs to support anything but a narrow range of hardware.

As I have written previously, Apple has already done the heavy lifting in providing a set of APIs that work across their entire product range, and which will continue to work for the foreseeable future -- including across the rumored switch to ARM. I realize that there may be some issues for a cross-platform program to provide platform-specific code. However, Wolfram Research is already doing that, more so in recent years as the difficulties in maintaining a common code base have increased.

While the benefits of Machine learning have been badly hyped (again), we have seen that in a lot of specialized domains, it is very useful, so the hardware acceleration provided by using the GPU will benefit almost all Mathematica users, not just those specializing in Neural Nets. Being able to do a computation in seconds rather than hours will let a large number of people to try stuff that they might not attempt, otherwise.

Providing full GPU support in WL for the entire Apple range would benefit a far greater number of current and potential users of Mathematica than practically any other software initiative.
