@ironicbadger What kind of models are you looking at? Any fun use cases popping up? I've toyed around with Ollama, but haven't come up with any exciting use cases yet. Though I'm wondering if I can tie it in with Obsidian for certain summarizing tasks. What I really want is to consume local content and be able to query it, but I haven't had time to dig into how to accomplish that.
I've been playing around with locally hosted #LLMs using the #Ollama #CLI tool. I've mostly been using models like mistral and dolphin-coder for assistance with textual ideas and issues. More recently I've been using the llava visual model via some simple #Bash #scripting, looping through images and creating description files. I can then grep those files for keywords and note the associated filenames. Powerful stuff!
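The loop itself is only a few lines of Bash. Roughly this (paths and prompt wording are simplified placeholders; it assumes llava has already been pulled, and relies on the Ollama CLI picking the image path out of the prompt for multimodal models):
#!/usr/bin/env bash
# Describe every image with llava, writing one .txt file per image.
for img in ~/Pictures/*.jpg; do
    ollama run llava "Describe this image in detail: $img" > "${img%.jpg}.txt"
done
# Later: which images mention a keyword?
grep -il "bicycle" ~/Pictures/*.txt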
@my_actual_brain Dual 30xx cards are cheap, but the last time I ran dual cards they were finicky to keep working. 40xx cards are very expensive; they can run small models very fast, but large models not at all. A Mac Studio with maximum memory, or a MacBook Pro with maximum memory, can run large models at "medium" performance. IMHO, for LLM work quality matters more than just speed; dumb 7b models have far more limited applications.
@my_actual_brain If I were writing a hobby text adventure game (or even a commercial game), I'd use 7b models for the NPC dialog. The worst LLM is a billion times better than a static conversation tree. For asking a bot to write unit tests, find bugs in code, etc., quality is so important that it doesn't currently make sense to use any local model when Claude or GPT-4 exist. If I have to double-check the bot's work anyway, running a dumb model is a waste of my reading time.
Has anyone here worked much with generators in #emacs?
I am looking for a good solution for streaming outputs in my ollama-elisp-sdk project. I think there's a good angle using generators to make a workflow fairly similar to e.g. the OpenAI API. Not sure yet though.
@holgerschurig Thanks! I'm not really going for a REPL; there are a few different implementations out there already. I'm more just trying to write an elisp layer over the whole ollama API, so you can more easily use ollama via elisp.
I think the main issue I'm running into is how callbacks work in the url library. I don't yet see how to use url for streaming (and I don't technically have to; I could use curl), because the callback isn't triggered until the full (streamed) response is complete.
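For reference, the raw streaming behavior is easy to see with curl: with "stream": true, ollama emits one JSON object per line as tokens arrive, which is exactly what I'd want to hand to a generator (mistral here is just an example model):
curl -N http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": true}'
# -N disables curl's output buffering; the result is newline-delimited JSON like
# {"model":"mistral","response":"The","done":false}
# ...ending with a final object where "done" is true.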
This past month, I was talking about how I spent $528 on a machine with enough guts to run more demanding AI models in Ollama. That is good and all, but if you are not on that machine (or at least on the same network), it has limited utility. So how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices, and it creates an encrypted VPN connection between them. Each device on your “tailnet” has four addresses you can use to reference it:
Machine name: my-machine
FQDN: my-machine.tailnet.ts.net
IPv4: 100.X.Y.Z
IPv6: fd7a:115c:a1e0::53
If you remember Hamachi from back in the day, Tailscale is kind of a spiritual successor to that.
There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.
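As a rough sketch of how that looks with Ollama (using the placeholder tailnet name from above; OLLAMA_HOST is Ollama's own environment variable, and the systemd bits assume a Linux server install):
# On the server, make Ollama listen on all interfaces instead of just localhost:
# add Environment="OLLAMA_HOST=0.0.0.0" in a systemd override, then restart.
sudo systemctl edit ollama
sudo systemctl restart ollama

# On any device in the tailnet, point the client at the server's tailnet name:
export OLLAMA_HOST=my-machine.tailnet.ts.net:11434
ollama run mistral

# Or hit the REST API directly:
curl http://my-machine.tailnet.ts.net:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'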
#Ollama is the easiest way to run local #AI I've tried so far. In 5 minutes you can have a chatbot running on a local model. Dozens of models and UIs to choose from.
The only downside is the speed, but what can I expect on an Intel-only laptop?
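For anyone curious, the whole five-minute setup is roughly this (Linux install script from ollama.com; mistral is just an example model):
curl -fsSL https://ollama.com/install.sh | sh   # install the server and CLI
ollama run mistral   # pulls the model on first run, then drops into a chat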
@rvr I think GPT4All is probably a bit easier because it already comes as a chat app, but Ollama is more flexible. It runs as a service, and you can have a CLI, a web UI, a desktop app, or whatever you program against its API connected to it. The number of available models is also higher, and it's easy to switch between them.
Completely forgot I had made this #fountainpen database a while ago when I was bored: https://codeberg.org/bmp/flock. It is written in Go and was generated with #ollama, if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.
Back in December, I paid $1,425 to replace my MacBook Pro to make my LLM research possible at all. That machine had an M1 Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare-minimum spec for running a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact, and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.
The Xeon W-2125 has 4 cores and 8 threads, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would load into memory and the GPU would do all of the processing; the CPU would only be needed to serve the results back to the user. Instead, it looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation, which is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.
I am using the proprietary Nvidia driver for the GPU, but maybe I’m missing something?
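Before going further, there are a few quick ways to see what Ollama is actually doing (the log command assumes a systemd-managed install, and ollama ps only exists in newer builds):
nvidia-smi --loop=2   # watch GPU memory/utilization while a prompt is running
journalctl -u ollama | grep -iE 'cuda|gpu'   # did the server detect CUDA at startup?
ollama ps   # newer builds: the PROCESSOR column shows e.g. "100% GPU" vs "100% CPU"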
I was recently playing around with Stability AI’s Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.
Have any questions, comments, or concerns? Please feel free to drop a comment below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.
A long, long time ago, @arfy made a Lua script for Dolphin screen readers that allowed you to type in a plus-or-minus number of days and get the date. I just asked Dolphin Mixtral to do the same as an AppleScript using #Ollama running locally, and it actually did it. It runs and works just as I wanted. Madness.
-- Ask for the offset in days; coerce the dialog's text to a number
set numDays to (text returned of (display dialog "Enter the number of days:" default answer "")) as number
-- Start from today's date
set targetDate to current date
-- 'days' is AppleScript's built-in constant for the number of seconds in one day
set newDate to targetDate + numDays * days
display dialog "The future date will be: " & (newDate as string)
@FreakyFwoof I think I need to try this and hope it doesn't explode my poor 24GB Air. What are you using to run these models? Is there some tool that gives us a worthwhile GUI, or are you just doing it in the terminal? I haven't really played with any of this stuff, but I feel like I need to start, and unfortunately the Mac is probably my best hope of running AI stuff.