Adding Claude to Wolfram LLM leaderboard?

Posted 5 days ago

This benchmark is great:

Creating more competition there will push AI companies to focus on Wolfram Language. All popular benchmarks have been overfitted to, which means that they give a biased estimate of general intelligence, but the act of fitting to a benchmark does improve performance on that particular task.

I recently told a former OpenAI colleague that o1 gives only a 2% improvement according to the Wolfram LLM benchmark, and he raised the issue internally. This means that the value of the benchmark as an unbiased proxy for intelligence could go down, but it would force them to make performance on Wolfram coding go up.

I've started paying for Claude Pro since it recently became my most commonly used LLM tool. It would be useful to add Claude to the leaderboard to encourage healthy competition.

In a past life I worked with an Anthropic co-founder, so I could nudge this issue from my side.

POSTED BY: Yaroslav Bulatov
