I've been testing the usefulness of the new Notebook Assistant in Mathematica 14.2, and I have some thoughts. I've been a Mathematica user since version 1.1, and have personally witnessed some of the boom and bust cycles of artificial intelligence. In the 1990s, I developed and implemented a highly successful expert system which, although primitive compared to current AI efforts, was state-of-the-art at the time.
All of the demos of the Notebook Assistant are impressive, and it is remarkable that it works as well as it does. Once I got to the hands-on stage, though, I realized that this is really a 0.6 release rather than a solid 1.0.
When the system works, it is quite impressive. However, the stochastic nature of the underlying LLM and its tendency to 'hallucinate' (the current descriptive term, I believe, for 'making things up') limit the real-world usefulness of the tool.
That is not to say that it is not useful at all. It can suggest things that an experienced user may then employ. However, usefulness is a relative term. I recall that John Cage used imperfections in manuscript paper as a useful source of ideas. This does not mean that paper imperfections are a real compositional technique -- the technique and artistry were in Cage's mind.
As for the Notebook Assistant, its tendency to hallucinate limits its utility in practice. I had expected that code output by the assistant would be verified for correct syntax before being presented to the user. After all, WL checks my syntax, and I get instant feedback from all the red and orange highlighting when I have a misplaced bracket or comma. It should be easy to verify at least the syntax and, if the check fails, have the AI try again without showing me the bad code.
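A minimal sketch of the kind of pre-check I have in mind, using the built-in SyntaxQ and SyntaxLength functions (the candidate string is a made-up example):

    candidate = "Table[i^2, {i, 1, 10]";  (* mismatched bracket *)
    SyntaxQ[candidate]        (* False: not valid WL input *)
    SyntaxLength[candidate]   (* how many characters parse cleanly *)

If SyntaxQ returned False, the assistant could quietly regenerate instead of showing me the broken code.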
When asked, the assistant can verify code, and it is often successful, but not often enough for practical purposes. Even when errors are pointed out, it frequently cannot fix them.
In many cases, problems can be resolved interactively, but it would have been significantly faster for me to simply write the code in the first place without the assistant.
It is my belief that these issues are inherent in the design of LLMs, and that no tweaking of the way an LLM works (pre- or post-processing, for example) will resolve the problem.
There is ample literature on the shortcomings of LLMs, starting with the GIGO issue -- with source material that is sexist, racist, and of questionable overall quality, it is no surprise that the LLMs' output is sexist, racist, and at best mediocre. The basis for the language model is an extreme formalist approach. This may make some sense in mathematics, where we can (it is hoped) provide an exact definition for each term and operation, but it is patently untrue for natural language and, I submit, even for constructed languages.
I would suggest the book Hermeneutics: A Very Short Introduction by Jens Zimmermann, or the talk I gave at a recent WTC on the topic of hermeneutics.
One thing I noticed in my explorations: several times, after the assistant failed to suggest working code, I typed a correct solution into my notebook. Once I did this, the assistant would invariably use that solution, even after I had restarted Mathematica. This suggests to me that it is not regenerating responses from scratch with the LLM, but is 'remembering' my code in some fashion. If this is real (I think it is), it would make testing the assistant more difficult, since it would appear to work better than it really does.
I can see that LLMs are not the only possible model. It will take some research, but I believe that a language model that embraces metaphor and deep context can be constructed, and perhaps will not require the brute-force methods that current models do. In addition, Wolfram already has a natural language processing engine (Wolfram|Alpha, etc.) which, while it does not have the range of the assistant, could perhaps be expanded to handle the code-generating aspects of the process.
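That pipeline is already exposed programmatically; for instance, SemanticInterpretation translates free-form English into a Wolfram Language expression (the query below is only an illustration, and the exact expression returned can vary):

    SemanticInterpretation["integrate x^2 from 0 to 1"]
    (* typically something like Integrate[x^2, {x, 0, 1}] *)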
I am intrigued by the promise of the Notebook Assistant.
I would welcome a way to discover all the functionality in the core Wolfram Language, and especially in the repositories. (In the early versions, each release came with a hardcover book, which I read cover to cover. Current versions have so many functions that Stephen himself has stated that he is not aware of all of them.)
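The scale of the problem is easy to see from a rough count of built-in symbols (the exact number depends on the version):

    Length[Names["System`*"]]   (* several thousand in recent releases *)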
I would welcome a second pair of eyes when I am trying to debug code. The current assistant is simply not competent enough to be relied on.
I would welcome an assistant that would handle the boilerplate of taking my code and preparing it for the Function Repository, for example.
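For concreteness, the manual workflow today starts from a template definition notebook or a local definition; that is exactly the boilerplate an assistant could fill in. A sketch, where myHalve is a made-up placeholder function:

    (* open the definition-notebook template, if I recall the name correctly *)
    CreateNotebook["FunctionResource"];

    (* or wrap a working definition as a local resource function *)
    myHalve[x_?NumericQ] := x/2;
    rf = DefineResourceFunction[myHalve, "MyHalve"];
    rf[10]   (* 5 *)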
I have seen demos where all of these tasks were successfully performed, so I know that these should be achievable goals. For a practicing WL user, the current iteration is more a proof of concept than a reliable tool.
I really want this idea to work. I have a license for a year, and will be evaluating the assistant from time to time.
Bottom line: the current iteration of the Notebook Assistant is a pricey toy rather than a practical tool for most WL users.