[WSG25] Daily Study Group: Wolfram Language and LLMs: Ideal Complements

Posted 7 months ago
POSTED BY: Arben Kalziqi
45 Replies
POSTED BY: Carl Hahn
Posted 6 months ago

Hi, Carl. Here's what Grok suggested. "Flush caches" was the only thing I saw that you probably haven't already tried. The response is repeated because I prompted it to "think harder". HTH. https://x.com/i/grok?conversation=1990170094241214480

POSTED BY: Phil Earnhardt

So you are not experiencing the same issue?

POSTED BY: Carl Hahn
Posted 6 months ago

I have experienced the same thing on two different computers. I think the service is down and has been for several days.

Now that Wolfram is a serious "services" company, it needs status pages that are updated 24/7 with the status of all key cloud services and repositories.

I've wasted several hours on this, including filing a report with Wolfram.

POSTED BY: Paul Nielan
POSTED BY: Carl Hahn
POSTED BY: Arben Kalziqi

OK, last LLM question of the evening. I promise!

Arben... Is there a way to hook into the (actual raw text) request-response conversation between Mathematica and the LLM service provider? Or, is there a way to get a log of this?

POSTED BY: Stephen Turner
POSTED BY: Arben Kalziqi

Another rant on the subject of LLM models...

Please don't default to using GPT-4o. It's bloody expensive!

The accepted way to calculate a combined input-output price for tool-based applications (e.g., Mathematica) is (3 * input-price + 1 * output-price) / 4, where both prices are per million tokens.

GPT-4o's price per million tokens is (3 * 2.50 + 1 * 10.00) / 4 or $4.375.

GPT 4.1 Mini is (3 * 0.40 + 1 * 1.60) / 4 or $0.70 (yes, less than one-sixth) - and this is just as good as 4o for coding tasks and better at instruction following.
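
As a quick check of the arithmetic above, here is a minimal Wolfram Language sketch. The function name blendedPrice is made up for illustration, and the 3:1 weighting is simply the rule of thumb quoted in this post, not an official formula. The prices are the per-million-token figures given above.

```wolfram
(* Blended price per million tokens, assuming three input tokens
   for every output token (the rule of thumb from this post) *)
blendedPrice[inputPrice_, outputPrice_] := (3 inputPrice + outputPrice)/4

gpt4o = blendedPrice[2.50, 10.00]      (* GPT-4o prices quoted above *)
gpt41mini = blendedPrice[0.40, 1.60]   (* GPT-4.1 Mini prices quoted above *)

(* Relative cost of GPT-4.1 Mini compared with GPT-4o *)
gpt41mini/gpt4o
```

The first two results should reproduce the $4.375 and $0.70 figures, and the ratio comes out to about 0.16, a bit under one-sixth.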

Once Wolfram fixes the bugs in GPT-5 latency (see my other post), GPT-5 Mini (soon GPT-5.1 Mini) is much better and 62% of the cost of GPT-4.1 Mini. I would argue that GPT-5 Nano (at a very tiny fraction of the price of the aforementioned models) will probably do the job for a lot of tasks.

https://platform.openai.com/docs/pricing

POSTED BY: Stephen Turner

Minutes after writing the above post, OpenAI released the GPT-5.1 model series APIs.

For Mathematica (or any coding application), the premier OpenAI model is unquestionably GPT-5.1 Codex. It's designed for exceptional instruction following and code generation. While the full model is intended for high-end planning, architecture, design, refactoring, etc, the Mini model excels at coding.

And the Mini version is cheap. Using the above formula, the price per million tokens is (3 * 0.25 + 1 * 2.00) / 4 or $0.6875.

Unfortunately, while it can now be selected in Mathematica Preferences / AI Settings and used with the LLM APIs (Model name "gpt-5.1-codex-mini"), it will immediately fail (see attached image).

It seems Wolfram is still using the now-deprecated OpenAI "completions" API rather than the modern "responses" API.

Crap!

Attachment

POSTED BY: Stephen Turner

Indeed, 4o is more expensive and I wouldn't necessarily recommend it for general use. However, when I use an external model for the sake of examples in these notebooks, I'm more concerned with performance than cost (or performance per unit cost), despite the fact that (say) the mini models do almost as well for a fraction of the price.

This is a relevant consideration for users at home, however, so thank you for pointing it out!

POSTED BY: Arben Kalziqi

Thank you, Arben!

I would also like to point out that this does not just apply to us poor home users.

Any research project which involves repetitive operations on data (imagine sentiment analysis on a million samples) would have its budget blown out of the water by a more expensive model.

Once one has a viable prototype solution, try it with a Nano model. These are dirt cheap in comparison to even the Minis.

GPT-4.1 Nano is presently the one to test against. Once Wolfram has fixed their GPT-5 and GPT-5.1 interface code, these Nano versions are even cheaper.
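
To put rough numbers on the "million samples" scenario, here is a back-of-the-envelope sketch. The helper jobCost and the token counts (200 input tokens and 10 output tokens per sample) are assumptions invented for illustration; only the per-million-token prices come from the posts above.

```wolfram
(* Estimated job cost in dollars; prices are per million tokens *)
jobCost[samples_, inTokens_, outTokens_, inPrice_, outPrice_] :=
  samples (inTokens inPrice + outTokens outPrice)/10^6

(* Hypothetical workload: one million samples, 200 tokens in, 10 tokens out *)
jobCost[10^6, 200, 10, 2.50, 10.00]   (* GPT-4o prices *)
jobCost[10^6, 200, 10, 0.40, 1.60]    (* GPT-4.1 Mini prices *)
```

Under those assumed token counts, the totals work out to roughly $600 versus $96. Real workloads will differ, so it is worth measuring actual token usage on a small sample first.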

POSTED BY: Stephen Turner

Re: getting a list of models, here's how you can do that:

services = Keys[(LLMServices`LLMServiceInformation[])[[1]]]

You can then grab the models associated with a given service with:

Wolfram`Chatbook`Common`getServiceModelList["OpenAI"]

And just the model names explicitly with:

#Name & /@ Wolfram`Chatbook`Common`getServiceModelList["OpenAI"]
POSTED BY: Arben Kalziqi

Arben... Thanks for this. Really helpful.

Something I wish to point out. Perhaps this should be a support request but, for LLM stuff, they never know what I'm talking about.

When I do a fresh start of Mathematica (new 14.3 download and clean install with paclet update on Windows 11) and then execute your code, I get the error shown in "getServiceModelList.png". It then gives me the LLM model association. This will only happen once. I have to restart Mathematica or quit the kernel to repeat the error.

The same error occurs the first time I open Preferences / AI Settings in a kernel session, as shown in "AI Settings.png".

Attachment

Attachment

POSTED BY: Stephen Turner
POSTED BY: Stephen Turner

It looks like "Reasoning" is an available setting for LLMEvaluator, but there seems to be a bug in it right now. I believe that it's fixed in our next major release and they're working on backporting it into a paclet update that will work in earlier versions. (And once 5.1 is working, that defaults to setting "Reasoning" to "None" anyway, which is not an option on the gpt-5 models [which default to "Medium"].)

This is my current understanding, at least, but I can't claim to be fully authoritative on this.

POSTED BY: Arben Kalziqi

Further comment: when I mentioned this timing discrepancy to one of our devs, they said that they ran a pure API call to gpt-5 with default settings and the query "Tell me a funny joke no one has ever heard before" took 39 seconds to return a result. If your chat environments and API calls are not taking that long, then they're changing a default setting somewhere that we are not changing.
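
For anyone who wants to time this themselves, a minimal sketch along these lines should work. It assumes an OpenAI API key is already registered and that the "gpt-5" model name is accepted by your installed paclet version; given the bugs discussed in this thread, the call may well fail on some setups.

```wolfram
(* Time a single direct call; requires network access and a registered API key *)
config = LLMConfiguration[<|"Model" -> {"OpenAI", "gpt-5"}|>];

{seconds, reply} = AbsoluteTiming[
   LLMSynthesize[
    "Tell me a funny joke no one has ever heard before",
    LLMEvaluator -> config
    ]
   ];

seconds  (* elapsed wall-clock time in seconds *)
```

A single run is noisy, so repeating the call a few times gives a fairer picture than one measurement.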

POSTED BY: Arben Kalziqi

"https://www.perplexity.ai/search/tell-me-a-funny-joke-no-one-ha-cKpP47u6TNageZo465bXCQ#0"

One-second response with GPT-5.1. I tried it multiple times.

This is on the first day of the 5.1 API release, and it will only get better.

POSTED BY: Stephen Turner
POSTED BY: Arben Kalziqi
Posted 6 months ago
POSTED BY: Phil Earnhardt

I do wonder about the grading on the benchmark—for some reason I expect that it's not being done through the grader used for EIWL, but I couldn't exactly justify that expectation to you. The short of it is that we have several methods and technologies for judging code correctness, but different ones are used in different places... like any company, there are a lot of teams, a lot of different moving parts, and a lot of historical artifacts.

The DeCSS stuff is interesting; I hadn't heard about it before. I do remember a similar incident with some type of Sony DRM being cracked and people saying "the key exists somewhere in the digits of pi, so how could it be illegal to distribute?" and so on :)

POSTED BY: Arben Kalziqi
POSTED BY: Carl Hahn

Hi Carl—thank you! This is actually more of a social issue than a technical one, which is not a thing I can say too frequently when asked about error messages :).

Basically, Lena, the image in question, has long been used for image processing examples. However, the image is cropped from a Playboy centerfold, and thus many people have thought that different images should be used for testing. You can read about the image and its history at this Wikipedia link.

A few updates back, Wolfram decided that we would probably remove Lena as a test image some time in the future, so while the image is currently still available we pop up an "error" message that's really more of a warning: any code you write that involves the image might not work in the future, as the image will eventually not be accessible through ExampleData["TestImage","Lena"].

I think that this context is probably not baked into the models used for Notebook Assistant, and in trying to fix an "error" that wasn't actually an error it started meandering and getting things wrong. For example, it tried to pull the "Iris" dataset for a machine learning example... but in Wolfram Language, we call that "FisherIris", not just "Iris", so it didn't work. If you run ExampleData["MachineLearning"], you'll see it in there.
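
To make the naming point concrete, here is a short sketch of how the dataset can be looked up and loaded. The variable name iris is just for illustration.

```wolfram
(* List the available machine-learning example datasets;
   {"MachineLearning", "FisherIris"} should be among them *)
ExampleData["MachineLearning"]

(* Load the Fisher iris data as a list of features -> class rules *)
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];

(* Peek at a few random entries *)
RandomSample[iris, 3]
```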

POSTED BY: Arben Kalziqi
Posted 6 months ago

Will you include a demo running a local LLM during this course? Is there anything special about accessing an LLM locally, or does that look the same as accessing remotely?

I believe that 2026 will be the first year that local LLM usage becomes widespread. With their A19 Pro and M5 processors, Apple has straightened out a major kink in their architecture. When Apple announced the iPhone 17 models at their September 9 press event, they included a brief discussion of the A19 Pro's new GPU accelerator in the event:

We have been at the forefront of AI acceleration since we first introduced the neural engine eight years ago. We later brought machine-learning accelerators to our CPUs. And while our GPU has always been great at AI compute, we’re now taking a big step forward, building neural accelerators into each GPU core, delivering up to three times the peak GPU compute of A18 Pro. This is MacBook Pro levels of compute in an iPhone, perfect for GPU-intensive AI workloads.

That was a bit vague. Fortunately, the "Petar Tech" YT channel explained more in the video Demystifying Apple's AMX AI accelerator: when and why it outperforms the GPU. In a nutshell, matrix multiplication -- a cornerstone of LLM training and token generation -- happens atomically in this chip. Historically, the elements of the two matrices being multiplied are fetched from memory multiple times. Petar notes in his video commentary: Because nowadays, the energy needed to transfer data significantly exceeds the energy needed to perform computations on the data. Apparently, NVIDIA has been doing this for several generations on their Tensor Cores; Apple is just catching up on this particular processor feature.

Apple has incorporated the same per-GPU accelerators in its M5 processors that were released for the low-end MacBook Pro and iPad Pro models this month. In one early test, "Time to First Token" was measured about 3.6x faster than an M4 processor. That baseline M5 processor is limited to 32GB of unified memory, but the "M5 Pro" and "M5 Max" chip computers (both available early 2026) should have up to 128GB of unified memory available. Now that Apple has had time to percolate this change through the years of chip layout and fab, we should be able to run some general-purpose LLMs locally starting soon. All of Apple's M5 processor computers should be far more attractive than earlier processors for any sort of AI development or usage.

POSTED BY: Phil Earnhardt

I have registered an OpenAI API key in Preferences / AI Settings / Services. I then select "GPT 4.1 Mini" as the model. My natural language prompt works fine.

But, if I change the model to "GPT-5 Mini", the same prompt fails with...

The service returned the following error message: Unsupported value: 'temperature' does not support 0.7 with this model. Only the default (1) value is supported.

I can fix this problem at Preferences / AI Settings / Services by setting Temperature to 1 and trying again. Now it fails with...

The service returned the following error message: Unsupported parameter: 'presence_penalty' is not supported with this model.

What to do? All the GPT-5 models fail in the same way.

I reported this two months ago, and nothing has been done.
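
One way to narrow this down is to try the same model from code rather than from the chat preferences. The sketch below is only a diagnostic idea: it assumes the "gpt-5-mini" model name and an already-registered OpenAI key, and I can't confirm that it avoids the presence_penalty error on any given paclet version.

```wolfram
(* Explicit configuration with the only temperature value the service accepts *)
config = LLMConfiguration[<|
    "Model" -> {"OpenAI", "gpt-5-mini"},
    "Temperature" -> 1
    |>];

(* A direct call that bypasses the chat-cell settings *)
LLMSynthesize["Say hello in one short sentence.", LLMEvaluator -> config]
```

If this works while chat cells still fail, that would suggest the extra parameters are being added by the chat settings rather than by the service connection itself.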

POSTED BY: Stephen Turner

Hey Stephen, could you run PacletInstall["Wolfram/Chatbook", UpdatePacletSites -> True], then fully quit and reopen the app, and see if it starts to work? It looks like the bug you're reporting was known about in an older version of the Chatbook paclet that powers all of the Chat cell functionality but was fixed a little while back.

(I'm actually going to just recommend everybody run this at tomorrow's session to head off any issues!)
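
For reference, a small sketch of the update step together with a check of which version ended up installed. Which version number actually contains the fix is not stated in this thread, so the second line is only useful for reporting back.

```wolfram
(* Refresh the paclet site data and install the latest Chatbook paclet *)
PacletInstall["Wolfram/Chatbook", UpdatePacletSites -> True]

(* Report the version of the Chatbook paclet that is now installed *)
PacletObject["Wolfram/Chatbook"]["Version"]
```

After running this, fully quit and reopen the app so the updated paclet is the one that gets loaded.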

POSTED BY: Arben Kalziqi

I am using Mathematica 14.2 and this issue with GPT-5 still persists. I tried your fix of running PacletInstall, and it worked only once. The second time I tried, it gave the same "presence_penalty" error. I checked, and it only occurs with GPT-5; all other GPT models run fine.

POSTED BY: Quan Le Thien

Hi Quan—could you post a screenshot or copy and paste the output here? I can talk to our team and/or submit a bug if I have a bit more information. Thanks!

POSTED BY: Arben Kalziqi

Hey Phil! I also watch WWDC ;). I won't be running anything locally for this series, but maybe in the future. (Though I did just get a new laptop from our IT department earlier this year and won't be due for an upgrade for a while, so I won't be benefitting from the M5 improvements! But I really can't complain.)

As far as whether it's any different to run things locally, functionally it isn't. It's more secure, obviously, and relatedly it's more private, but the actual functioning of the thing is not really particularly different. The time it takes to send and receive information to/from LLMs is really not that big a component in the total runtime because it's generally all just text. The extra time taken by network communication may in fact be canceled out by the fact that large companies are already running their setups on specialized hardware, but even if this isn't quite correct it's probably close enough as makes no difference. (It's possible that there are delays based on queuing of their massive numbers of users, though!)

POSTED BY: Arben Kalziqi
Posted 6 months ago

I saw an article today on Apple allowing linking of multiple Mac Studios via Thunderbolt 5 in the next version of macOS.

https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-into-an-ai-supercomputer-in-macos-tahoe-262-191500778.html

POSTED BY: Paul Nielan
Posted 6 months ago

"Neural accelerators" -- optimally efficient matrix multiplication -- have just been added to the GPUs in Apple's M5 processors. The M5 Pro, M5 Max, and M5 Ultra chips should be mightily impressive for high-octane AI researchers and users. At least machines with M5 Pro and M5 Max should be available in the first part of 2026. It sounds like Apple will suggest multiple Mac Studio boxes with a Thunderbolt 5 "bus" for its highest-end customers to distribute computational tasks to multiple machines. The Wolfram Language core numeric computation engine could be seriously enhanced over the next software update or two.

A question for @Arben: it seems as if the Wolfram AI Course Assistant would be ideal for any questions in a Wolfram course -- including double-checking answers to Wolfram quiz questions. As a member of the faculty, are you encouraged or discouraged by the way students could use these marvelous tools?

POSTED BY: Phil Earnhardt
POSTED BY: Arben Kalziqi
Posted 6 months ago

Dark mode, chat cells, and the forward slash have some problems with readability.

POSTED BY: Paul Nielan

Thanks for letting me know—I'll get this filed with the appropriate team, in case they're not already aware.

POSTED BY: Arben Kalziqi

This is being looked into :)

POSTED BY: Arben Kalziqi

Actually, Paul, could you run PacletInstall["Wolfram/Chatbook"] and report back? We can't reproduce it in 14.3 or in pre-release versions, so it might be that your paclet is out of date and hasn't auto-updated yet.

POSTED BY: Arben Kalziqi
POSTED BY: George Wolfe

I fixed this problem

POSTED BY: George Wolfe

Phew, glad to hear it!

POSTED BY: Arben Kalziqi
Posted 6 months ago

In Day 1 of the class (Monday), the use of the forward slash key within a chat is discussed. Its purpose is to enter Wolfram Language in the chat. On my Mac, running macOS 15.7.2 and Wolfram 14.3, when I push the forward slash key (above Return), it is not entered, and so nothing happens. Any ideas? I have TextExpander turned off. I can enter a forward slash in other applications.

POSTED BY: Paul Nielan

You know, I was flustered this morning by the account issue and called that forward slash when it's explicitly backslash :). But—as you mention—it is the key above Enter/Return on most keyboards.

My only thought as to why it might not work is an obscure/arcane stylesheet setting. If you open a fresh notebook with cmd+N, immediately open a Chat cell, and type \ in there, does it work? If not, you may want to contact support@wolfram.com because I'm not sure why that would be...

POSTED BY: Arben Kalziqi
Posted 6 months ago

So it does work in a new notebook. I didn't even restart Wolfram Mathematica. But it won't work in an existing notebook that I just created yesterday. Very strange.

POSTED BY: Paul Nielan
Posted 6 months ago

So I think I found the problem. If you have dynamic content in the notebook and don't "approve it" in the banner showing at the top of the window, then the forward slash does not work in a chat cell.

POSTED BY: Paul Nielan

Ah, yes! Many interactive elements in notebooks subtly depend on dynamic updating's being enabled.

POSTED BY: Arben Kalziqi
Posted 6 months ago

Working as of November 10, 2025, at 09:03:51 AM.

POSTED BY: Paul Nielan