I agree that more should be done with NLP and code completion, as this is a path towards widespread end-user programmability and language adoption.
Programming languages fundamentally represent an abstract computer that developers know how to talk to, one that is ideally easier to talk to than the underlying hardware. Language models have demonstrated an ability to... well... model language, and thereby produce code that appears correct in the context of an input prompt. But I think we are still far from a system that can generate programs solving real-world problems given natural-language input from inexperienced end users (and honestly, Wolfram|Alpha is probably closer to that than Copilot for many problems). Even Copilot's suggestions are only accepted about 25% of the time according to GitHub's Copilot FAQ, and I would bet over 95% of accepted suggestions require further modification by an experienced developer to actually solve the problem at hand.
I believe the value of tools like Copilot or Wolfram|Alpha-mode notebooks lies largely in their ability to teach unfamiliar developers elements of the syntax and style of a new programming language. This is likely quite similar to how artists use reference images, or how developers in the distant, pre-AI past used Stack Overflow, tutorials, and documentation examples.
Beyond that, the difficult part of programming is not typing the syntax. It is figuring out what problem you want to solve and how to integrate that solution into the larger system in a coherent way. I believe doing this will always take substantial effort, though I am hopeful that machines might eventually do more of the heavy lifting. Better starting points generated by language models could improve developer productivity, just as an artist who starts by tracing an image will finish much faster than one starting from a blank canvas. However, I do worry that over-reliance on that starting point will lead to an increase in needless complexity (and associated errors) and a stagnation in style.
Anyway, I think Wolfram|Alpha notebooks could use some serious machine-learning NLP upgrades. Copilot-style code generation and output filtering could greatly increase the range of inputs that can be turned into valid code. And while Wolfram Language does not have the largest training corpus around, it certainly has one of the highest-quality and cleanest: look at the documentation and the Mathematica Stack Exchange answers!