chikim,
@chikim@mastodon.social avatar

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instructions on the release page to install Ollama with Docker.
Mac users: install Ollama using the instructions on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

FreakyFwoof,

@chikim Can't use Llava:13B for this one then? I should also install zephyr?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof You can use Llava, but Llava is designed more for processing images with language. It's not going to be as good as regular LLMs specifically designed for chat, like openhermes, solar, neural-chat, zephyr, etc.

FreakyFwoof,

@chikim I'm using Zephyr and it's reminiscent of GPT 3.5, but more fun because it's local.
Do you happen to know how I can get it to be accessible from another machine? It's only responding to localhost queries at the moment and I'd like to use it from any machine on the network. I'm not running it in docker on Mac, just the old way mentioned before via the app, if that makes sense.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof If you want to expose Ollama to other machines on the network, type this in Terminal, then quit Ollama from the menu extras and open it again.
launchctl setenv OLLAMA_HOST "0.0.0.0"

FreakyFwoof,

@chikim Thanks. Does this persist, or will I have to do that every time?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof I believe it'll persist, but I'm not 100% sure.
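For what it's worth, `launchctl setenv` on macOS generally does not survive a reboot. One common workaround (an assumption on my part, not something from this thread or official Ollama docs) is a login LaunchAgent that reapplies the variable:

```shell
# Hypothetical LaunchAgent that re-runs `launchctl setenv OLLAMA_HOST 0.0.0.0`
# at every login. The label and filename are illustrative, not an official
# Ollama file.
PLIST="$HOME/Library/LaunchAgents/ollama.host.plist"
mkdir -p "$(dirname "$PLIST")"
cat > "$PLIST" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>ollama.host</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/launchctl</string>
    <string>setenv</string>
    <string>OLLAMA_HOST</string>
    <string>0.0.0.0</string>
  </array>
  <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
# Load it once; afterwards it runs at each login:
#   launchctl load "$PLIST"
echo "wrote $PLIST"
```

After loading it (or after the next login), quit and reopen Ollama from the menu bar so it picks up the variable.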

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also, different models produce different things, so you might want to try a few. Some are even specifically designed not to shy away from NSFW chat. lol

FreakyFwoof,

@chikim Well, it seems Zephyr will engage in very interesting things if you want it to, unlike GPT 3.5. Sex, snogging, drugs: it doesn't seem to care. lol

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof That's interesting. I didn't know Zephyr was not censored. lol

FreakyFwoof,

@chikim Really liking this over-the-network stuff. I'm remoted into NVDA, and the Mac is downstairs but I can still chat locally without having to use online resources. What if alpha 2 could receive image files and use llava or some such? Just a thought... Hint hint haha

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Yeah, I thought about it, so it might happen.

vick21,
@vick21@mastodon.social avatar

@chikim @FreakyFwoof I think Zephyr is relatively censored. For uncensored stuff, try llama2-uncensored. I find uncensored models more creative even for non-evil purposes. Just more creative, period.
Also, there are other uncensored models on Ollama.ai.

FreakyFwoof,

@vick21 @chikim I'm not sure the combo box for changing models always does something. It seems if I ask the same question to multiple models, I get almost the same response, and very quickly, like it's cached somehow.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof @vick21 Close VOLlama and copy a model: ollama cp zephyr andre. Open VOLlama and talk to andre first, then talk to something else. With VOLlama still open, delete andre: ollama rm andre. Now if you try to talk to andre in VOLlama, you should get an error.

FreakyFwoof,

@chikim @vick21 I deleted a model last night without restarting the app, and it did fail then, yeah. But if I switch between, let's say, llama2 and uncensored, it seems to hold the response and give me back something similar to the previous one. I can just quit and restart Ollama, but it's still strange.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof @vick21 Maybe Ollama, or the llama.cpp that Ollama uses, has caching. Unfortunately I don't have control over what's going on behind the scenes.

FreakyFwoof,

@chikim @vick21 No worries, I wasn't meaning to complain, just observing. It got stuck in a strange rut today for example. I asked:
'How many Thursdays in 2024?'
It said 52.
I then switched models and it said the same thing, and talked about 2024 not being a leap year.
I said 'yes it is.'
It insisted that yes it was, but that still meant there were 52 Thursdays and that 2024 was not a leap year.
Very amusing, confusing and round and round.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Re caching: did you clear the previous messages before asking the same question to another model? If not, the newly selected model receives all the messages, including responses from the model you used before the switch. To the newly selected model, it will seem as though it has already answered the question and you are asking the exact same question again.
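A minimal sketch of the behavior chikim describes: a chat client resends the entire message list with every request, so switching models carries the whole conversation over unless the history is explicitly cleared. The class and method names below are illustrative, not VOLlama's actual internals.

```python
# Sketch: why a "new" model seems to remember the old one's answers.
class ChatSession:
    def __init__(self, model):
        self.model = model
        self.messages = []  # the full history, sent with every request

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        # A real client would POST self.messages to the server here;
        # we just record a placeholder reply.
        reply = f"[{self.model} saw {len(self.messages)} messages]"
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def switch_model(self, model):
        # Switching models does NOT clear history: the new model
        # receives everything the previous model said.
        self.model = model

    def new_chat(self):
        self.messages.clear()

session = ChatSession("zephyr")
session.ask("How many Thursdays in 2024?")
session.switch_model("openhermes")
print(len(session.messages))  # 2: the old Q&A rides along to the new model
session.new_chat()
print(len(session.messages))  # 0: only clearing the history resets context
```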

FreakyFwoof,

@chikim I did, yeah. I did CTRL+N to clear it; maybe Windows isn't fully clearing things, not sure. Sometimes I can do 'New message' and then even just press the space bar, and it comes back with something related to the previous conversation.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Ah, you used new chat. I discovered that bug while implementing history saving for alpha.2. lol

FreakyFwoof,

@chikim I didn't realise until just this morning you'd posted Alpha 2. It seems that new chat now works better.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof I think you'd have a lot of fun with alpha.2. You can set a system message like 'You're the funniest comedian. Make every response as funny as possible.' lol Also, you can copy a model and name it Alex. You can also play with adjusting a bunch of other parameters in the modelfile, like temperature, which makes the model more or less creative/wild. https://github.com/ollama/ollama/blob/main/docs/modelfile.md
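As a sketch of the modelfile flow described above, following the linked docs (the name "alex", the system prompt, and the temperature value are just illustrative examples):

```shell
# Write a Modelfile that bases a new model on zephyr, gives it a
# system message, and raises temperature for wilder output.
cat > Modelfile <<'EOF'
FROM zephyr
SYSTEM """You are the funniest comedian. Make every response as funny as possible."""
PARAMETER temperature 1.2
EOF
# Then build the named copy (requires Ollama to be installed):
#   ollama create alex -f Modelfile
```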

FreakyFwoof,

@chikim I don't know what copying a model is good for, but I saw it listed there. I'm just slowly learning about all this stuff.

FreakyFwoof,

@chikim I handed my wife a Mac keyboard, turned VO off and let her chat with it. It's surprisingly responsive with llama2-uncensored. Usually 5-10 seconds to come back with fairly detailed responses most of the time. I'm impressed that a laptop can run this comfortably.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Tell your wife my apologies for the ugly UI. I have no idea how the interface looks visually. I have to ask my wife and fix things. lol

FreakyFwoof,

@chikim Maybe I can make a quick PHP script where you can just visit a page, type in the host to connect to, and it does the same as your app.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof There are many web-based clients out there, with a variety of accessibility issues. lol

FreakyFwoof,

@chikim I wish I could see the size of a model before downloading... I have a slow connection here, and I'm trying dolphin-mixtral and it's 24 GB. lol

FreakyFwoof,

@chikim That's gonna take a looong time to finish.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof If you go to their library, click a model, and click tags, it'll show you the size.

FreakyFwoof,

@chikim Aah usually I ignore tags on sites as they're just blue, orange, red, fish, #Sofa, so I didn't think to try that. Thanks lol

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also you'll see different quantized models with different sizes. More bits of quantization means more accurate but bigger. However, 7B q8 is less accurate than 13B q4; parameter count, like 7B vs 13B, matters more. Also, I wouldn't go below q4 unless it's absolutely necessary.
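A back-of-envelope way to see why 13B at q4 ends up about the same size as 7B at q8 (a rough estimate only; real quantized files also carry metadata and per-block scale factors, so they run a bit larger):

```python
# Rough model file size: parameter_count * bits_per_weight / 8 bytes.
def approx_size_gb(params_billions, bits):
    return params_billions * 1e9 * bits / 8 / 1e9

print(f"7B  @ q8 ~ {approx_size_gb(7, 8):.1f} GB")   # ~7.0 GB
print(f"13B @ q4 ~ {approx_size_gb(13, 4):.1f} GB")  # ~6.5 GB
```

So for roughly the same memory budget, you can run the 13B model at 4-bit instead of the 7B at 8-bit, which is why parameter count usually wins.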

FreakyFwoof,

@chikim Do you still recommend Llava:13B for image recognition where VOCR is concerned?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof They have 34B. Slower but more accurate. You have to decide how much you're willing to tolerate the slower speed.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also that reminds me... they updated their model recently to v1.6, so I would update it with ollama pull llava:13b.

FreakyFwoof,

@chikim Aah thanks. Good to know.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Welcome to the dark side of local models! The community isn't nearly as big as ChatGPT's user base, but it's still pretty huge!

FreakyFwoof,

@chikim Yeah but somehow it's more fun, and you can ask it questions you might not ask something online. Of course it could be sending all your queries to some random spy server outside your network but hey...

pixelate,
@pixelate@tweesecake.social avatar

@chikim @FreakyFwoof Yeah, I only have 32 GB RAM on this Windows box and an AMD GPU, but running 13B models isn't so bad, though they can hallucinate a lot. I wouldn't rely on this for much other than creative stuff or maybe summarization. But they're getting better all the time, and some models, like OpenChat, are better than others at things like simple coding questions. I've not gotten into editing modelfiles though. Oh, and I'm doing this on WSL2, so I don't quite have the full 32 GB of RAM I could have if things weren't so Linux/Mac-centric.

FreakyFwoof,

@pixelate @chikim I've just installed Dolphin Mixtral, and it seems to be running fine on my M1 Max over here.

FreakyFwoof,

@chikim Hahah she's quite fine with it.

FreakyFwoof,

@chikim I just closed the app entirely and then loaded it up and it had forgotten, so I think CTRL+N isn't clearing properly. Could be at the other end I don't know, but I don't know enough about such things.

FreakyFwoof,

@chikim This one's very sassy and refuses to be corrected, no matter what.

Zephyr: I'm sorry for the confusion, but based on my research, the statement "Techmoan's real name is Mat Taylor" is not accurate. The correct name of the popular YouTuber and content creator known as Techmoan is Stuart Rankin. I suggest double-checking your sources before making such claims in the future.

However, I did come across a different content creator named Matt Taylor who goes by the online handle "The8BitBoy" and creates retro gaming content on YouTube. While he is not the same person as Techmoan (Stuart Rankin), there seems to be some confusion online due to their similar usernames.

I apologize for any misunderstanding caused by this confusion, and I'm happy to clarify any further questions you may have.

FreakyFwoof,

@chikim Also found some very interesting things. Feeding it just one character has unusual results. A single space ' ' for example, made it resend what it had previously sent, even if I start a new chat, despite the fact it says it has no memory.
I just sent it Slash '/' and it's been churning away for a few minutes now, I have no idea what it's going to return but I know it's going to be loooooooong...

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof You can read while it's streaming; you don't have to wait for it to finish. Just shift+tab and read. The only problem is the cursor gets reset every time a new word arrives. You can also command+A, command+C, and paste it somewhere. lol

FreakyFwoof,

@chikim It got into a loop about how slash is used in Notepad++. lol
