Several points:
I do plan to show NN programming in my data science track, but I don't know yet to what extent. The same goes for reinforcement learning, with and without NNs -- most people don't know this, but reinforcement learning has many extremely useful applications that don't involve NNs at all. My next sessions in the data science track will be about parallelism features, followed directly by CUDA programming, which in my opinion is the logical extension of parallel kernels and takes parallelism to the next level. That will also lead me to LibraryLink (because running the most powerful CUDA applications from M naturally brings you to LibraryLink), and then I'll have sessions about the other link products (J/Link, NETLink, RLink, etc.). That means NNs are not on my near-term plan to begin with (I consider parallelism and the link products "infrastructure", and I want to cover infrastructure first).

Next, I'll start a parallel track about financial options theory after the next data science session (probably on the 23rd). The sessions will then alternate, roughly weekly, between data science and financial options theory. I was discussing a few tracks with the PR group: financial options, differential equations, combinatorial optimization, and NNs / ML (or perhaps, more generally, AI, which would include reinforcement learning). I decided to go with financial options first, because I can bring that to a close in about half a year; the others would take me a year or more (I don't like to skimp on important things and prefer to dig deep).

Also, WRI has plans to have some of their employees develop such programs for the interactive classes. I probably shouldn't disclose any details about that (extent, length, people, covered content) -- I'm just letting you know that this is or will be in the works. I want to see that first, and those programs won't start until the fall. And as my financial options sessions with PR will take me until some time in the summer, I'll make my decision about which track to do after the financial options track ends, so no sooner than this summer. It may or may not be NNs (or, more comprehensively, AI); I'll decide that then together with the PR group.

With that said, NNs / ML is indispensable for the modern-day data scientist, so I'll have to cover it -- but only after the sessions about parallelism, CUDA, and the link products, and not in as much detail as I'd use if I were to conduct a dedicated NN / ML / AI track starting this summer. I can't make my data science track only about AI; better to keep that as a separate track.
With that said, though, I strongly recommend getting a local NVidia GPU as well. Simple cards don't cost much; you can get relatively new RTX cards (2060s, 2070s) for less than a kilobuck. Also, the 1080/1070/1060 cards keep getting cheaper, as these Pascal generation cards did not see the crypto-mining breakthrough many GPU miners were hoping for (NVidia is sitting on stockpiles of surplus Pascal cards -- the Volta generation that followed them was, as you may know, all about data centers). I normally wouldn't recommend those cards anymore, long since superseded by the Turing generation, but they are now VERY cheap, a few hundred bucks, and that's good enough for a start -- though not good enough for real power computing. If you only have a schlepptop without an NVidia card, you've run into a snag. I've seen people use external GPUs in an external PCIe dock, but I don't know if that's really powerful, and I don't like an external cable-and-power-supply mess. In short, if you only have a schlepptop, get a real box. You get more bang for the buck -- or, from the opposite perspective, you pay less for the same bang.
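If you're not sure whether M actually sees a local NVidia card, here's a minimal check using CUDALink (what the second call reports depends on your driver and card):

    (* load CUDALink and check whether a usable CUDA device is present *)
    Needs["CUDALink`"]
    CUDAQ[]            (* True if CUDA is available on this machine *)
    CUDAInformation[]  (* device name, compute capability, memory, etc. *)

If CUDAQ[] comes back False on a machine that does have an NVidia card, it's usually a driver issue, not a hardware one.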
Yes, you can retrieve a trained network file, computed in the cloud, and then use it on your local machine for inference (sometimes called prediction). But even then you probably want a local GPU. I'll give you another reason: the GPUs in AWS are getting beefier and beefier all the time. I believe they're even phasing out the older ones, like the K80s; they're just not as powerful as the modern cards, and they're from the old Kepler generation, compute capability 3.7. That's pre-pyramids by today's standards. But for the newer generations that means you pay significantly more per hour -- some GPU instances run 10 to 20 bucks per hour. If you do that a lot, it will cost you more than a new workstation with a decent CUDA card. Also, if you "fill" these big cards with heavy jobs, the resulting networks will probably be so big and complex that you'd want a local GPU for inference/prediction anyway; without one, you can only handle small/simple networks at reasonable speed. But those you can train on your own hardware -- no need to pay for the beefy GPU instances on AWS (you pay for instance time, not problem size, so a toy problem on a big instance costs just as much per hour as a massive job). I hope that explains my view on this from both sides.
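To make the cloud-train / local-inference workflow concrete, here's a rough sketch in M -- names like net, trainingData, input, and "mynet.wlnet" are placeholders, not anything specific:

    (* on the cloud instance: train on the GPU and save the trained net *)
    trained = NetTrain[net, trainingData, TargetDevice -> "GPU"];
    Export["mynet.wlnet", trained];

    (* on your local machine: load the net and run inference *)
    localNet = Import["mynet.wlnet"];
    localNet[input, TargetDevice -> "GPU"]  (* use "CPU" if you have no local NVidia card *)

Inference on the CPU works, but for the big networks you'd train on those beefy instances it gets slow, which is exactly why I'd still want the local card.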