
GROUPS:

External Programs and Systems

Posted 11 months ago

This benchmark is great: https://www.wolfram.com/llm-benchmarking-project/

Creating more competition there will push AI companies to focus on the Wolfram Language. All popular benchmarks have been overfitted, which means they give a biased estimate of general intelligence. However, the act of fitting to a benchmark still improves performance on that particular task.

I recently told a former OpenAI colleague that o1 gives only a 2% improvement according to the Wolfram LLM benchmark, and he raised the issue internally. The benchmark's value as an unbiased proxy for intelligence could therefore go down, but this would force them to improve performance on Wolfram Language coding.

I recently started paying for Claude Pro, since it has become my most commonly used LLM tool. It would be useful to add it to the leaderboard to encourage healthy competition. In a past life I worked with an Anthropic co-founder, so I could nudge this issue on my side.

POSTED BY: Yaroslav Bulatov
