
Notebook Assistant: a review

I've been testing the usefulness of the new Notebook Assistant with Mathematica 14.2, and I have some thoughts. I've been a user of Mathematica since version 1.1, and have personally witnessed some of the boom and bust cycles of artificial intelligence. In the 1990s, I developed and implemented a highly successful expert system, which, although primitive compared to current AI efforts, was state-of-the-art at the time.

All of the demos of the notebook assistant are impressive, and it is remarkable that it works as well as it does. Once I got to the hands-on stage, I realized that this is really a 0.6 release, rather than a solid 1.0 release.

When the system works, it is quite impressive. However, the stochastic nature of the underlying LLM and its tendency to 'hallucinate' (I believe that this is the current descriptive term for "making things up") limit the real-world usefulness of the tool.

That is not to say that it is not useful at all. It can suggest things that an experienced user may employ. However, usefulness is a relative term. I recall that John Cage used imperfections in manuscript paper as a useful source of ideas. This does not mean that paper imperfections are a real compositional technique -- the technique and artistry were in Cage's mind.

As for the Notebook Assistant, its tendency to hallucinate limits its utility in operation. I had expected that the code output by the assistant would be verified for correct syntax before it was presented to the user. After all, WL checks my syntax, and I get instant feedback from all the red and orange blocks when I have a misplaced bracket or comma. It should be easy to verify syntax at least, and if verification fails, have the AI try again without showing me the bad code.
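
To illustrate the kind of pre-check I have in mind, here is a minimal sketch in the Wolfram Language. The wrapper name checkGeneratedCode is my own invention and is not part of the Notebook Assistant; the built-in SyntaxQ is what does the actual validation of a code string.

    (* Hypothetical pre-check: reject assistant output that is not even
       syntactically valid before showing it to the user *)
    checkGeneratedCode[code_String] :=
      If[SyntaxQ[code],
        code,                      (* syntactically valid: pass it through *)
        Missing["InvalidSyntax"]   (* invalid: flag it so the assistant can retry *)
      ]

    checkGeneratedCode["Total[{1, 2, 3]"]   (* -> Missing["InvalidSyntax"] *)
    checkGeneratedCode["Total[{1, 2, 3}]"]  (* -> "Total[{1, 2, 3}]" *)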

When asked, the assistant can verify code, and it is often successful, but not often enough for practical purposes. Even when errors are pointed out, it often cannot fix problems.

In many cases, problems can be resolved interactively, but it would have been significantly faster for me to simply write the code in the first place without the assistant.

It is my belief that these issues are inherent in the design of the LLMs, and no tweaking of the way the LLM works (pre- or post-processing, for example) will resolve the problem.

There is ample literature on the shortcomings of LLMs, starting with the GIGO issue -- the source material being sexist, racist, and of overall questionable quality, it is no surprise that the LLMs' output is sexist, racist, and at best mediocre. The basis for the language model is an extreme formalist approach. This may make some sense in mathematics, where we can provide (it is hoped) an exact definition for each term and operation, but this is patently untrue for natural language and, I submit, even for constructed languages.

I would suggest the book Hermeneutics: A Very Short Introduction by Jens Zimmermann, or the talk I gave at a recent WTC on the topic of hermeneutics.

One thing I noticed in my explorations: several times, after a failure to suggest working code, I would type in a correct solution in my notebook. Once I did this, the assistant would invariably use this solution, even after I had restarted Mathematica. This indicates to me that it is not regenerating responses using the LLM technique, but is 'remembering' my code in some fashion. If this is real (I think it is), it would make testing the assistant more difficult, since it would appear to work better than it really does.

I can see that LLMs are not the only possible model. It will take some research, but I believe that a language model that embraces metaphor and deep context can be constructed, and perhaps will not require the brute-force methods that current models do. In addition, Wolfram already has a natural language processing engine (Wolfram|Alpha, etc.) which, while it does not have the range of the assistant, could possibly be expanded to handle the code-generating aspects of the process.

I am intrigued by the promise of the Notebook Assistant.

I would welcome a way to discover all the functionality in the core Wolfram Language, and especially the repositories. (In early versions, each release had a hard cover book, which I read cover to cover. Current versions have so many functions that Stephen himself has stated that he is not aware of all of them.)
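
As a rough illustration of the scale of the discovery problem, one can count the symbols in the System` context from within a session; this covers only built-ins, not the repositories, and on recent versions it runs to several thousand names.

    (* Count the built-in symbols in the System` context *)
    Length[Names["System`*"]]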

I would welcome a second pair of eyes when I am trying to debug code. The current assistant is simply not competent enough to be relied on.

I would welcome an assistant that would handle the boilerplate of taking my code and preparing it for the Function Repository, for example.
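
By 'boilerplate' I mean the submission scaffolding. Here is a rough sketch of the programmatic route, assuming the usual ResourceObject / ResourceSubmit workflow; the function and metadata below are placeholders of my own, and a real submission would also need documentation, examples, and author details.

    (* Placeholder function and minimal metadata *)
    myDigitReverse[n_Integer] := FromDigits[Reverse[IntegerDigits[n]]];

    ro = ResourceObject[<|
       "Name" -> "MyDigitReverse",
       "ResourceType" -> "Function",
       "Function" -> myDigitReverse,
       "Description" -> "Reverse the decimal digits of an integer."
     |>];

    (* ResourceSubmit[ro] would send it in for review *)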

I have seen demos where all of these tasks were successfully performed, so I know that these should be achievable goals. For a practicing WL user, the current iteration is more a proof of concept than a reliable tool.

I really want this idea to work. I have a license for a year, and will be evaluating the assistant from time to time.

Bottom line: the current iteration of Notebook Assistant is a pricey toy rather than a practical tool for most WL users.

2 Replies

I did get some entertainment value trying to get the notebook assistant to do the right thing, but mostly it highlighted the limitations of the LLM model.

It reminds me of the old Eliza program from the late 1960s. I programmed a very slightly enhanced version of the program in BASIC in the 1970s. Surprisingly, some grad students (in physical biochemistry) were convinced of its intelligence. The simulation was convincing, as long as you stayed within a very narrow domain of discourse. LLMs are immensely more complex and have a much wider domain, but there are limits.

Like any model, it is wrong, but it can be useful. The Ptolemaic model of the solar system was quite useful if all you wanted was a prediction of where the planets would be, within a certain error. It is entirely useless for navigating from Earth to Mars.

LLMs seem to be useful, as long as their tendency to hallucinate does not matter. Being able to write legal code seems to be beyond their domain of use, though.

For myself, I have given the people running the Wolfram Issue Tracker something to work on, but I have other things to do. I look at this release as a massive beta test, and I look forward to people with more patience (and spare time) providing the feedback to turn this into a product worthy of the main product.

As I indicated, on philosophical grounds, I do not think that LLMs will ever be free of the tendency to hallucinate. The field at present seems to be dominated by strict positivists, though, so it will take people with a different view of language to come up with a better model.

George and others: of course, the so-called hallucinations are problems that must be brought to an absolute minimum. Unlike some people, I have a lot of patience to put up with obvious errors and find a way of getting at the truth. How do I know the difference? I don't always, but fortunately hallucinations don't seem to correlate with difficulty! Sometimes I can ask a competing AI in a different context. Sometimes I ask another for a proof while passing it off as my own naive idea. Sometimes I do a little research. Sometimes the hallucination just makes no sense!

As far as "reasoning limitations go," like proving something clearly provable, but I've never seen written out and am too lazy to think of on my own, giving the Notebook assistant hints from my best reasoning as I go and asking for a step or two at a time seems to work well so far.

Here is an example of where we both just had to hold each other's hand to scratch out a proof that was sufficient for me:

