
Running a local LLM using llamafile and Wolfram Language

Posted 11 months ago

Attachments:
POSTED BY: Jon McLoone
6 Replies
Posted 7 months ago

This is great news. I would love to "play" with Llama 3 locally using the built-in LLM functions I've already used for some of my use cases.

POSTED BY: Jacob Evans
Posted 10 months ago

Interesting, that's good to know! To be honest, calling the server was interesting in itself, since many systems now expose their API through a local server (like Ollama on the Mac).

I guess I'll have to give my bucks to OpenAI for a while then!!

POSTED BY: Ettore Mariotti

I understand that there is a project to do this. It will also call the library directly rather than via the server as I did here, for better efficiency. I don't know when to expect that to be available though, so be patient for now!
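For anyone who wants to experiment before then, here is a minimal sketch of the kind of server call involved (not the exact code from the attached notebook; it assumes the llamafile server is listening on its default port 8080 and exposes the llama.cpp-style /completion endpoint):

```
(* Minimal sketch of calling a local llamafile server from WL.
   Assumes the default port 8080 and the llama.cpp-style /completion
   endpoint; adjust both to match your setup. *)
resp = URLRead[
   HTTPRequest["http://localhost:8080/completion",
    <|"Method" -> "POST",
      "ContentType" -> "application/json",
      "Body" -> ExportString[
        <|"prompt" -> "Explain what a llamafile is in one sentence.",
          "n_predict" -> 128|>, "JSON"]|>]];

ImportString[resp["Body"], "RawJSON"]["content"]
```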

POSTED BY: Jon McLoone
Posted 10 months ago

Hi Jon! This is very interesting.

I was wondering if it's possible to configure a local language model as the default large language model (LLM) that the system uses for features like LLMFunction, etc.

It would be great if it were possible to leverage all the Wolfram Language technology already developed for LLMs, such as the prompt repository, and so on.

Does anyone have any idea about the feasibility of this?
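In the meantime, here is a rough workaround I have been sketching. It is not the built-in LLMFunction machinery and does not give access to the prompt repository; the function name is made up, and it assumes the llamafile server's llama.cpp-style /completion endpoint on the default port 8080:

```
(* Hypothetical workaround, not the built-in LLMFunction machinery:
   a plain WL function that fills a string template and sends the
   resulting prompt to a local llamafile server on port 8080. *)
localLLMFunction[template_String] := Function[arg,
  ImportString[
    URLRead[
      HTTPRequest["http://localhost:8080/completion",
       <|"Method" -> "POST",
         "ContentType" -> "application/json",
         "Body" -> ExportString[
           <|"prompt" -> TemplateApply[StringTemplate[template], {arg}],
             "n_predict" -> 256|>, "JSON"]|>]]["Body"],
    "RawJSON"]["content"]]

(* e.g. localLLMFunction["Summarize in one sentence: ``"]["<some text>"] *)
```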

POSTED BY: Ettore Mariotti
Posted 11 months ago

Thank you Jon for bringing this to our attention.

I was looking for alternative ways to run an LLM on my machine as a substitute for endless subscriptions.

Justine's repository no longer works; Mozilla has integrated the llamafiles into their ecosystem. See this post: https://hacks.mozilla.org/2023/11/introducing-llamafile/

You can find the llamafiles on Mozilla's GitHub page: https://github.com/Mozilla-Ocho/llamafile.

My initial findings from running a llamafile on my machine:

It runs acceptably in the browser chat out of the box on my PC (Intel i7 laptop with an NVIDIA card, Win 10 64-bit) without additional flags, but it needs plenty of memory (I suspect you need at least 16 GB of RAM).

Here is a performance comparison (tokens/sec) of running the llamafile in the browser chat with and without the GPU flag (a launch sketch follows the list):

  • CPU only: 2.85;
  • With GPU flag: 4.96.
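For reference, a rough sketch of launching the server with GPU offloading from within WL (the model filename is just a placeholder; -ngl is the GPU-layer offloading flag from the llamafile README):

```
(* Rough sketch: launch the llamafile server from WL with GPU offloading.
   The filename is a placeholder; -ngl sets the number of layers offloaded
   to the GPU (per the llamafile README). On Windows the file must first
   be renamed so that it ends in ".exe". *)
proc = StartProcess[{"C:\\llm\\model.llamafile.exe", "-ngl", "999"}];

(* ...chat via the browser UI or the HTTP endpoint on port 8080... *)

KillProcess[proc] (* stop the server when finished *)
```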

I will play with it in WL and see what I find.

POSTED BY: Dave Middleton

You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks (http://wolfr.am/StaffPicks), and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD