100 Math Problems: ChatGPT with Wolfram Plugin versus the Code Interpreter Model

Posted 1 year ago

POSTED BY: Michael Trott
Extending the computational range of ChatGPT-4: optimizing prompts for enhanced performance

POSTED BY: Michael Trott

Dear Michael:

I tried asking ChatGPT this query: run this code to solve Hilbert's thirteenth problem, the septic equation. Please see the attached file for reference.

Given: ax^7+bx^6+cx^5+dx^4+ex^3+fx^2+gx+h = 0 and x^2+px+r = 0

Find the values of "p" and "r" in terms of "a, b, c, d, e, f, g, and h."


(* Coefficients stay symbolic here; the second line is a placeholder where
   numeric values can be substituted on the right-hand side. *)
ClearAll[a, b, c, d, e, f, g, h];
{a, b, c, d, e, f, g, h} = {a, b, c, d, e, f, g, h};

(* Substitute both roots x = (-p +/- Sqrt[p^2 - 4 r])/2 of x^2 + p x + r == 0
   into the septic and require both expressions to vanish. *)
Solve[
 h + 1/2 g (-p + Sqrt[p^2 - 4 r]) + 1/4 f (-p + Sqrt[p^2 - 4 r])^2 + 
    1/8 e (-p + Sqrt[p^2 - 4 r])^3 + 1/16 d (-p + Sqrt[p^2 - 4 r])^4 + 
    1/32 c (-p + Sqrt[p^2 - 4 r])^5 + 1/64 b (-p + Sqrt[p^2 - 4 r])^6 + 
    1/128 a (-p + Sqrt[p^2 - 4 r])^7 == 0 && 
  h + 1/2 g (-p - Sqrt[p^2 - 4 r]) + 1/4 f (-p - Sqrt[p^2 - 4 r])^2 + 
    1/8 e (-p - Sqrt[p^2 - 4 r])^3 + 1/16 d (-p - Sqrt[p^2 - 4 r])^4 + 
    1/32 c (-p - Sqrt[p^2 - 4 r])^5 + 1/64 b (-p - Sqrt[p^2 - 4 r])^6 + 
    1/128 a (-p - Sqrt[p^2 - 4 r])^7 == 0, {p, r}]
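
The same two conditions can also be posed without square roots. What follows is only a hedged sketch of an alternative formulation, not the method used in the code above: when the two roots of x^2+px+r are distinct, both are roots of the septic exactly when the quadratic divides the septic, i.e. when the remainder of the polynomial division vanishes. The names septic, quadratic, rem and conditions are chosen here for illustration, and nothing is claimed about whether Solve can return a closed form for the resulting equations with symbolic coefficients.

```wolfram
(* Sketch: pose "the quadratic divides the septic" as a vanishing remainder. *)
ClearAll[a, b, c, d, e, f, g, h, p, r, x];
septic = a x^7 + b x^6 + c x^5 + d x^4 + e x^3 + f x^2 + g x + h;
quadratic = x^2 + p x + r;

(* The remainder of the division by a quadratic has degree below 2 in x. *)
rem = PolynomialRemainder[septic, quadratic, x];

(* Both coefficients of the remainder must vanish:
   two polynomial equations in p and r, free of Sqrt. *)
conditions = Thread[CoefficientList[rem, x] == 0]
```

The list conditions can then be passed to Solve or Reduce for {p, r}, ideally after substituting numeric coefficients.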
POSTED BY: Juan Dela Cruz

Hi Michael,

Just scanning the list, and then, almost randomly, but perhaps subconsciously, clicking the first item to not even really jump out at me, I found my way to intersecting circles.

Intersecting circles is a bit more difficult than intersecting lines, so perhaps the outlook is good for saving time and money on hacker memos by just getting ChatGPT to do it. Not trying to duel or hijack here, just being curious about the future.

In the present work about line intersections, there are issues with canonicalization of vertices, and then DeleteDuplicates broke, leading to a new bug report. Do you have an opinion on how much more development needs to be done before ChatGPT can rigorously compute all terms of A187781?

POSTED BY: Brad Klee

I responded to your comment here:

POSTED BY: Michael Trott
Posted 1 year ago

If one submits code for mathematical evaluation and the output contains neither a result nor an error, is ChatGPT with the Wolfram Plugin able to
(1) explain why the output came back that way, e.g. because of incorrect input syntax, and
(2) give the correct input syntax and the correct output?

POSTED BY: Nomsa Ledwaba

This question raises an important point: the answer is definitively yes.

One of the key strengths of ChatGPT lies in its ability to inspect the results of a call to the plugin and determine the subsequent actions to take. However, as a Large Language Model, ChatGPT produces output that is not deterministic, and it can make mistakes. Nevertheless, ChatGPT is often able to rectify these errors. To illustrate this, let's examine a concrete example.

I use this prompt:

"In the following every time I input a task/query/problem, I want you 1) write down the relevant mathematical formulas 2) develop and display well-documented Wolfram Language code that solves the problem and displays the code in a terminal-like way (with proper line length) 2) evaluate the code on the Wolfram Language API (make sure to remember the formatting rules for this API) 3) display the result returned from the Wolfram Language API"

and this task:

"Let's consider Faulhaber's formula (the one with the Bernoulli numbers). Please implement it as a Wolfram Language function and then test it by comparing its result with the straightforwardly computed sum of the first twenty integers cubed."
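
For orientation, here is a minimal sketch of what a correct answer to this task can look like. It is not the output of any ChatGPT session, and the function name faulhaberSum is an assumption chosen for illustration. The Wolfram Language function BernoulliB uses the signed convention, BernoulliB[1] == -1/2, so the sum carries the factor (-1)^j; with the unsigned convention (B_1 = +1/2) that factor would be dropped.

```wolfram
(* Faulhaber's formula for Sum[k^p, {k, 1, n}], written for signed
   Bernoulli numbers (BernoulliB[1] == -1/2), hence the factor (-1)^j. *)
ClearAll[faulhaberSum];
faulhaberSum[p_Integer?NonNegative, n_] :=
  1/(p + 1) Sum[(-1)^j Binomial[p + 1, j] BernoulliB[j] n^(p + 1 - j), {j, 0, p}]

(* Test: compare with the directly computed sum of the first twenty cubes. *)
direct = Sum[k^3, {k, 1, 20}];
faulhaberSum[3, 20] == direct
```

For p = 3 the formula reduces to n^2 (n + 1)^2/4, so both sides equal 44100 for n = 20 and the comparison returns True.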

In most cases ChatGPT will get things right the first time. Here is such a session:

But sometimes ChatGPT makes a mistake (e.g. confusing signed with unsigned Bernoulli numbers) and then takes a guess at what went wrong. Sometimes even this guess is wrong. After trying the above prompt and task a dozen times, I got a session with a mistake, which ChatGPT then tried to fix. First it assumed a numerical precision issue, but it detected that this wasn't the cause. Then it went back, revisited the original formula, and got it right. Here is the session:

In conclusion, the ability of ChatGPT to inspect results, make corrections, and enhance its performance is indeed a significant strength. While it is not infallible and can occasionally make errors, its error correction mechanisms and the capacity to refine code based on test cases contribute to its overall efficacy.

POSTED BY: Michael Trott

Why was the output redacted?

The output of this tool, "Wolfram.getWolframCloudResults" was redacted.

POSTED BY: Peter Burbery

It is the OpenAI plugin structure that 'redacted the output', not the Wolfram plugin and not me personally. I do not know why they do this; I have asked OpenAI about it but haven't heard back.

POSTED BY: Michael Trott

