Thank you, Jon, for bringing this to our attention.
I was looking for alternative ways to run an LLM on my machine as a substitute for endless subscriptions.
Justine's repository no longer works; Mozilla has integrated llamafile into its ecosystem. See this post: https://hacks.mozilla.org/2023/11/introducing-llamafile/
You can find the llamafiles on Mozilla's GitHub page: https://github.com/Mozilla-Ocho/llamafile.
My initial findings from running a llamafile on my machine:
It runs acceptably out of the box in the browser chat on my PC (Intel i7 laptop with an NVIDIA card, Windows 10 64-bit) without additional flags, but it needs plenty of memory (I suspect you need at least 16 GB of RAM).
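Besides the browser chat, the llamafile server also exposes an OpenAI-compatible API on the same local port, so you can script against it. Here is a minimal Python sketch, assuming the server is already running on its default port 8080; the model name and prompt below are just illustrative placeholders:

```python
import json
import urllib.request

# Query the local llamafile server (assumes it is already running
# and listening on the default port 8080; adjust the URL otherwise).
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "local",  # placeholder; the local server serves one model
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```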
Here is a performance comparison (tokens/sec) of running the llamafile in the browser chat with and without the GPU flag (a rough way to measure this yourself is sketched after the list):
- CPU only: 2.85;
- With GPU flag: 4.96.
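If you want to reproduce a tokens/sec figure outside the browser chat, one crude approach is to time a completion request and divide the completion token count by the elapsed wall-clock time. This is only a rough estimate (it includes prompt processing, so it will undercount pure generation speed), and it assumes the local server is on port 8080 and returns an OpenAI-style usage block:

```python
import json
import time
import urllib.request

# Rough tokens/sec estimate against the local llamafile server.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "Write a short paragraph about llamas."}],
    "max_tokens": 128,  # cap the reply so runs are comparable
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
elapsed = time.perf_counter() - start
tokens = reply["usage"]["completion_tokens"]
print(f"{tokens} completion tokens in {elapsed:.1f} s -> {tokens / elapsed:.2f} tokens/sec")
```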
I will play with it in the Wolfram Language and see what I find.