Will you include a demo running a local LLM during this course? Is there anything special about accessing an LLM locally, or does that look the same as accessing remotely?
I believe that 2026 will be the first year that local LLM usage becomes widespread. With their A19 Pro and M5 processors, Apple has straightened out a major kink in their architecture. When Apple announced the iPhone 17 models at their September 9 press event, they briefly discussed the A19 Pro's new GPU neural accelerators:
We have been at the forefront of AI acceleration since we first
introduced the neural engine eight years ago. We later brought
machine-learning accelerators to our CPUs. And while our GPU has
always been great at AI compute, we’re now taking a big step forward,
building neural accelerators into each GPU core, delivering up to
three times the peak GPU compute of A18 Pro. This is MacBook Pro
levels of compute in an iPhone, perfect for GPU-intensive AI
workloads.
That was a bit vague. Fortunately, the "Petar Tech" YouTube channel explained more in the video "Demystifying Apple's AMX AI accelerator: when and why it outperforms the GPU". In a nutshell, matrix multiplication, a cornerstone of LLM training and token generation, is performed as a single hardware operation in this chip. Historically, the elements of the two matrices being multiplied were fetched from memory multiple times during the computation. Petar explains in his video commentary why that matters: "Because nowadays, the energy needed to transfer data significantly exceeds the energy needed to perform computations on the data." Apparently, NVIDIA has been doing this for several generations with their Tensor Cores; Apple is just catching up on this particular processor feature.
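To make the memory-traffic point concrete, here is a minimal Python sketch that counts operand fetches for a naive matrix multiply versus a tiled one that loads small blocks once and reuses them. This is a toy model of data reuse in general, not a description of how Apple's neural accelerators or NVIDIA's Tensor Cores are actually built. The function names, the 8x8 matrix size, and the tile size of 4 are my own illustrative choices.

```python
def matmul_naive(a, b):
    """Multiply square matrices, fetching both operands for every multiply."""
    n = len(a)
    loads = 0  # number of matrix elements read from "main memory"
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                loads += 2  # one element of a, one element of b
            c[i][j] = acc
    return c, loads


def matmul_tiled(a, b, tile):
    """Multiply square matrices block by block, reusing each loaded block."""
    n = len(a)
    assert n % tile == 0, "this sketch assumes the tile size divides n"
    loads = 0
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Copy one block of each operand into "local storage" once.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                loads += 2 * tile * tile
                # All multiplies below read only the local copies.
                for i in range(tile):
                    for j in range(tile):
                        acc = 0
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[i0 + i][j0 + j] += acc
    return c, loads


n = 8
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]

c_naive, loads_naive = matmul_naive(a, b)
c_tiled, loads_tiled = matmul_tiled(a, b, tile=4)

print("same result:", c_naive == c_tiled)
print("naive loads:", loads_naive)  # 2 * n**3 = 1024
print("tiled loads:", loads_tiled)  # 2 * n**3 / tile = 256
```

Both versions do the same 512 multiplies, but the tiled version reads from memory a quarter as often. Under this model the saving grows with the block size that fits in local storage, which is the kind of reuse a dedicated matrix unit makes possible.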
Apple has incorporated the same per-GPU-core accelerators in its M5 processors, which were released this month in the low-end MacBook Pro and iPad Pro models. In one early test, "Time to First Token" was measured at about 3.6x faster than on an M4 processor. The baseline M5 processor is limited to 32GB of unified memory, but computers with the "M5 Pro" and "M5 Max" chips (both available in early 2026) should have up to 128GB of unified memory available. Now that Apple has had time to work this change through the years of chip layout and fab, we should be able to run some general-purpose LLMs locally soon. All of Apple's M5-based computers should be far more attractive than earlier models for any sort of AI development or usage.
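As a rough way to see why the jump from 32GB to 128GB of unified memory matters, here is a back-of-the-envelope Python sketch that estimates how much memory a model's weights need. The model sizes, the quantization levels, and the 75% "usable fraction" are hypothetical numbers I picked for illustration. The estimate also ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, memory_gb, usable_fraction=0.75):
    """True if the weights fit in the assumed usable share of unified memory.

    usable_fraction is an assumption: the OS and other apps need memory too.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight)
    return needed <= memory_gb * usable_fraction


# Hypothetical model sizes, chosen only to illustrate the arithmetic.
examples = [
    ("8B parameters, 16-bit weights", 8, 16),
    ("70B parameters, 4-bit weights", 70, 4),
]

for label, params, bits in examples:
    size = weight_footprint_gb(params, bits)
    print(f"{label}: about {size:.0f} GB of weights, "
          f"fits in 32GB: {fits(params, bits, 32)}, "
          f"fits in 128GB: {fits(params, bits, 128)}")
```

Under these assumptions, a 70-billion-parameter model quantized to 4 bits needs about 35GB for its weights alone, which rules out a 32GB machine but leaves plenty of room on a 128GB one.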