chikim,
@chikim@mastodon.social avatar

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instructions on the release page to install Ollama with Docker.
Mac users: install Ollama using the instructions on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

FreakyFwoof,

@chikim Can't use Llava:13B for this one then? I should also install zephyr?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof You can use Llava, but Llava is designed more for processing images with language. It's not going to be as good as regular LLMs specifically designed for chat, like openhermes, solar, neural-chat, zephyr, etc.

FreakyFwoof,

@chikim I'm using Zephyr and it's reminiscent of GPT 3.5, but more fun because it's local.
Do you happen to know how I can get it to be accessible from another machine? It's only responding to localhost queries at the moment and I'd like to use it from any machine on the network. I'm not running it in docker on Mac, just the old way mentioned before via the app, if that makes sense.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof If you want to expose Ollama to other machines on the network, type this in Terminal, then quit Ollama from the menu extras and open it again.
launchctl setenv OLLAMA_HOST "0.0.0.0"

FreakyFwoof,

@chikim Thanks. Does this persist, or will I have to do that every time?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof I believe it'll persist, but I'm not 100% sure.
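For what it's worth, `launchctl setenv` on macOS generally does not survive a reboot. One common workaround (an assumption on my part, not something from this thread or official Ollama docs) is a login LaunchAgent that reapplies the variable:

```shell
# Hypothetical LaunchAgent that re-runs `launchctl setenv OLLAMA_HOST 0.0.0.0`
# at every login. The label and filename are illustrative, not an official
# Ollama file.
PLIST="$HOME/Library/LaunchAgents/ollama.host.plist"
mkdir -p "$(dirname "$PLIST")"
cat > "$PLIST" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>ollama.host</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/launchctl</string>
    <string>setenv</string>
    <string>OLLAMA_HOST</string>
    <string>0.0.0.0</string>
  </array>
  <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
# Load it once; afterwards it runs at each login:
#   launchctl load "$PLIST"
echo "wrote $PLIST"
```

After loading it (or after the next login), quit and reopen Ollama from the menu bar so it picks up the variable.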

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also, different models produce different things, so you might want to try a few. Some are even specifically designed not to shy away from NSFW chat. lol

FreakyFwoof,

@chikim Well, it seems Zephyr will engage in very interesting things if you want it to, unlike GPT 3.5. Sex, snogging, drugs: it doesn't seem to care. lol

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof That's interesting. I didn't know Zephyr was not censored. lol

FreakyFwoof,

@chikim Really liking this over-the-network stuff. I'm remoted into NVDA, and the Mac is downstairs but I can still chat locally without having to use online resources. What if alpha 2 could receive image files and use llava or some such? Just a thought... Hint hint haha

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Yeah, I thought about it, so it might happen.

vick21,
@vick21@mastodon.social avatar

@chikim @FreakyFwoof I think Zephyr is relatively censored. For uncensored stuff, try llama2-uncensored. I find uncensored models more creative even for non-evil purposes. Just more creative, period.
Also, there are other uncensored models on Ollama.ai.

FreakyFwoof,

@vick21 @chikim I'm not sure the combo box for changing models always does something. It seems if I ask the same question to multiple models, I get almost the same response, and very quickly, like it's cached somehow.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof @vick21 Close VOLlama and copy a model: ollama cp zephyr andre. Open VOLlama and talk to andre first, then talk to something else. With VOLlama still open, delete andre: ollama rm andre. Now if you try to talk to andre in VOLlama, you should get an error.

FreakyFwoof,

@chikim @vick21 I deleted a model last night without restarting the app, and it did fail then, yeah. But if I switch between, let's say, llama2 and uncensored, it seems to hold the response and give me back something similar to the previous one. I can just quit and restart Ollama, but it's still strange.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof @vick21 Maybe Ollama, or the llama.cpp that Ollama uses, has caching. Unfortunately I don't have control over what's going on behind the scenes.

FreakyFwoof,

@chikim @vick21 No worries, I wasn't meaning to complain, just observing. It got stuck in a strange rut today for example. I asked:
'How many Thursdays in 2024?'
It said 52.
I then switched models and it said the same thing, and talked about 2024 not being a leap year.
I said 'yes it is.'
It insisted that yes it was, but that still meant there were 52 Thursdays and that 2024 was not a leap year.
Very amusing, confusing and round and round.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Re caching: did you clear the previous messages before asking the same question to another model? If not, the newly selected model receives all the messages, including responses from the model you used before the switch. To the newly selected model, it will seem as though it has already answered the question and you are asking the exact same question again.
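A minimal sketch of the behavior chikim describes: a chat client resends the entire message list with every request, so switching models carries the whole conversation over unless the history is explicitly cleared. The class and method names below are illustrative, not VOLlama's actual internals.

```python
# Sketch: why a "new" model seems to remember the old one's answers.
class ChatSession:
    def __init__(self, model):
        self.model = model
        self.messages = []  # the full history, sent with every request

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        # A real client would POST self.messages to the server here;
        # we just record a placeholder reply.
        reply = f"[{self.model} saw {len(self.messages)} messages]"
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def switch_model(self, model):
        # Switching models does NOT clear history: the new model
        # receives everything the previous model said.
        self.model = model

    def new_chat(self):
        self.messages.clear()

session = ChatSession("zephyr")
session.ask("How many Thursdays in 2024?")
session.switch_model("openhermes")
print(len(session.messages))  # 2: the old Q&A rides along to the new model
session.new_chat()
print(len(session.messages))  # 0: only clearing the history resets context
```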

FreakyFwoof,

@chikim I did, yeah. I did CTRL+N to clear it; maybe Windows isn't fully clearing things, not sure. Sometimes I can do 'New message' and then even just press the space bar, and it comes back with something related to the previous conversation.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Ah, you used new chat. I discovered that bug while implementing history saving for alpha.2. lol

FreakyFwoof,

@chikim I didn't realise until just this morning you'd posted Alpha 2. It seems that new chat now works better.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof I think you'd have a lot of fun with alpha.2. You can set a system message like 'You're the funniest comedian. Make every response as funny as possible.' lol Also, you can copy a model and name it Alex. You can also play with adjusting a bunch of other parameters in the modelfile, like temperature, which makes the model more or less creative/wild. https://github.com/ollama/ollama/blob/main/docs/modelfile.md
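As a sketch of the modelfile flow described above, following the linked docs (the name "alex", the system prompt, and the temperature value are just illustrative examples):

```shell
# Write a Modelfile that bases a new model on zephyr, gives it a
# system message, and raises temperature for wilder output.
cat > Modelfile <<'EOF'
FROM zephyr
SYSTEM """You are the funniest comedian. Make every response as funny as possible."""
PARAMETER temperature 1.2
EOF
# Then build the named copy (requires Ollama to be installed):
#   ollama create alex -f Modelfile
```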

FreakyFwoof,

@chikim I don't know what copying a model is good for, but I saw it listed there. I'm just slowly learning about all this stuff.

FreakyFwoof,

@chikim I handed my wife a Mac keyboard, turned VO off and let her chat with it. It's surprisingly responsive with llama2-uncensored. Usually 5-10 seconds to come back with fairly detailed responses most of the time. I'm impressed that a laptop can run this comfortably.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Tell your wife my apologies for the ugly UI. I have no idea how the interface looks visually. I have to ask my wife and fix things. lol

FreakyFwoof,

@chikim Maybe I can make a quick PHP script where you can just visit a page, type in the host to connect to, and it does the same as your app.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof There are many web-based clients out there, with a variety of accessibility issues. lol

FreakyFwoof,

@chikim I wish I could see the size of a model before downloading... I have a slow connection here, and I'm trying dolphin-mixtral and it's 24 GB. lol

FreakyFwoof,

@chikim That's gonna take a looong time to finish.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof If you go to their library, click a model, and click tags, it'll show you the size.

FreakyFwoof,

@chikim Aah usually I ignore tags on sites as they're just blue, orange, red, fish, #Sofa, so I didn't think to try that. Thanks lol

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also you'll see different quantized models with different sizes. More bits of quantization means more accurate but bigger. However, 7B q8 is less accurate than 13B q4; parameter count, like 7B vs 13B, matters more. Also, I wouldn't go below q4 unless it's absolutely necessary.
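A back-of-envelope way to see why 13B at q4 ends up about the same size as 7B at q8 (a rough estimate only; real quantized files also carry metadata and per-block scale factors, so they run a bit larger):

```python
# Rough model file size: parameter_count * bits_per_weight / 8 bytes.
def approx_size_gb(params_billions, bits):
    return params_billions * 1e9 * bits / 8 / 1e9

print(f"7B  @ q8 ~ {approx_size_gb(7, 8):.1f} GB")   # ~7.0 GB
print(f"13B @ q4 ~ {approx_size_gb(13, 4):.1f} GB")  # ~6.5 GB
```

So for roughly the same memory budget, you can run the 13B model at 4-bit instead of the 7B at 8-bit, which is why parameter count usually wins.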

FreakyFwoof,

@chikim Do you still recommend Llava:13B for image recognition where VOCR is concerned?

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof They have 34B. Slower but more accurate. You have to decide how much you're willing to tolerate the slower speed.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Also that reminds me... they updated their model recently to v1.6, so I would update it with ollama pull llava:13b.

FreakyFwoof,

@chikim Aah thanks. Good to know.

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof Welcome to the dark side of local models! The community isn't nearly as big as ChatGPT's user base, but it's still pretty huge!

FreakyFwoof,

@chikim Yeah but somehow it's more fun, and you can ask it questions you might not ask something online. Of course it could be sending all your queries to some random spy server outside your network but hey...

pixelate,
@pixelate@tweesecake.social avatar

@chikim @FreakyFwoof Yeah, I only have 32 GB RAM on this Windows box and an AMD GPU, but running 13B models isn't so bad, though they can hallucinate a lot. I wouldn't rely on this for much other than creative stuff or maybe summarization. But they're getting better all the time, and some models, like OpenChat, are better than others at things like simple coding questions. I've not gotten into editing modelfiles though. Oh, and I'm doing this on WSL2, so I don't quite have the full 32 GB of RAM I could have if things weren't so Linux/Mac-centric.

FreakyFwoof,

@pixelate @chikim I've just installed Dolphin Mixtral, and it seems to be running fine on my M1 Max over here.

FreakyFwoof,

@chikim Hahah she's quite fine with it.

FreakyFwoof,

@chikim I just closed the app entirely and then loaded it up and it had forgotten, so I think CTRL+N isn't clearing properly. Could be at the other end I don't know, but I don't know enough about such things.

FreakyFwoof,

@chikim This one's very sassy and refuses to be corrected, no matter what.

Zephyr: I'm sorry for the confusion, but based on my research, the statement "Techmoan's real name is Mat Taylor" is not accurate. The correct name of the popular YouTuber and content creator known as Techmoan is Stuart Rankin. I suggest double-checking your sources before making such claims in the future.

However, I did come across a different content creator named Matt Taylor who goes by the online handle "The8BitBoy" and creates retro gaming content on YouTube. While he is not the same person as Techmoan (Stuart Rankin), there seems to be some confusion online due to their similar usernames.

I apologize for any misunderstanding caused by this confusion, and I'm happy to clarify any further questions you may have.

FreakyFwoof,

@chikim Also found some very interesting things. Feeding it just one character has unusual results. A single space ' ' for example, made it resend what it had previously sent, even if I start a new chat, despite the fact it says it has no memory.
I just sent it Slash '/' and it's been churning away for a few minutes now, I have no idea what it's going to return but I know it's going to be loooooooong...

chikim,
@chikim@mastodon.social avatar

@FreakyFwoof You can read while it's streaming; you don't have to wait for it to finish. Just shift+tab and read. The only problem is the cursor gets reset every time a new word arrives. You can also command+A, command+C, and paste it somewhere. lol

FreakyFwoof,

@chikim It got into a loop about how slash is used in Notepad++. lol
