@ironicbadger What kind of models are you looking at? Any fun use cases popping up? I've toyed around with Ollama, but haven't come up with any exciting use cases yet. Though I'm wondering if I can tie it in with Obsidian for certain summarizing tasks. What I really want is to consume local content and be able to query it, but I haven't had time to dig into how to accomplish that.
I've been playing around with locally hosted #LLMs using the #Ollama #CLI tool. I've mostly been using models like mistral and dolphin-coder for assistance with textual ideas and issues. More recently I've been using the llava visual model via some simple #Bash #scripting, looping through images and creating description files. I can then grep those files for keywords and note the associated filenames. Powerful stuff!
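The loop itself is only a few lines of Bash. Roughly this (paths and prompt wording are simplified placeholders; it assumes llava has already been pulled, and relies on the Ollama CLI picking the image path out of the prompt for multimodal models):
#!/usr/bin/env bash
# Describe every image with llava, writing one .txt file per image.
for img in ~/Pictures/*.jpg; do
    ollama run llava "Describe this image in detail: $img" > "${img%.jpg}.txt"
done
# Later: which images mention a keyword?
grep -il "bicycle" ~/Pictures/*.txt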
@my_actual_brain Dual 30xx cards are cheap, but the last time I ran dual cards they were finicky to keep working. 40xx cards are very expensive; they can run small models very fast, but large models not at all. A Mac Studio with maximum memory, or a MacBook Pro with maximum memory, can run large models at "medium" performance. IMHO, for LLM work quality matters more than just speed; dumb 7b models have far more limited applications.
@my_actual_brain If I were writing a hobby text adventure game (or even a commercial game), I'd use 7b models for the NPC dialog. The worst LLM is a billion times better than a static conversation tree. For asking a bot to write unit tests, find bugs in code, etc., quality is so important that it doesn't currently make sense to use any local model when Claude or GPT-4 exist. If I have to double-check the bot's work anyway, running a dumb model is a waste of my reading time.
Has anyone here worked much with generators in #emacs?
I am looking for a good solution for streaming outputs in my ollama-elisp-sdk project. I think there's a good angle using generators to make a workflow fairly similar to e.g. the OpenAI API. Not sure yet though.
@holgerschurig Thanks! I'm not really going for a REPL; there are a few different implementations out there already. I'm more just trying to write an elisp layer over the whole ollama API, so you can more easily use ollama via elisp.
I think the main issue I'm running into is how callbacks work in the url library. I don't yet see how to use url for streaming (and I don't technically have to; I could use curl), because the callback isn't triggered until the full (streamed) response is complete.
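For reference, the raw streaming behavior is easy to see with curl: with "stream": true, ollama emits one JSON object per line as tokens arrive, which is exactly what I'd want to hand to a generator (mistral here is just an example model):
curl -N http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": true}'
# -N disables curl's output buffering; the result is newline-delimited JSON like
# {"model":"mistral","response":"The","done":false}
# ...ending with a final object where "done" is true.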
This past month, I was talking about how I spent $528 on a machine with enough guts to run more demanding AI models in Ollama. That is good and all, but if you are not on that machine (or at least on the same network), it has limited utility. So how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices, and it creates an encrypted VPN connection between them. Each device on your “tailnet” has four addresses you can use to reference it:
Machine name: my-machine
FQDN: my-machine.tailnet.ts.net
IPv4: 100.X.Y.Z
IPv6: fd7a:115c:a1e0::53
If you remember Hamachi from back in the day, Tailscale is kind of a spiritual successor to that.
There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.
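As a rough sketch of how that looks with Ollama (using the placeholder tailnet name from above; OLLAMA_HOST is Ollama's own environment variable, and the systemd bits assume a Linux server install):
# On the server, make Ollama listen on all interfaces instead of just localhost:
# add Environment="OLLAMA_HOST=0.0.0.0" in a systemd override, then restart.
sudo systemctl edit ollama
sudo systemctl restart ollama

# On any device in the tailnet, point the client at the server's tailnet name:
export OLLAMA_HOST=my-machine.tailnet.ts.net:11434
ollama run mistral

# Or hit the REST API directly:
curl http://my-machine.tailnet.ts.net:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'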
#Ollama is the easiest way to run local #AI I've tried so far. In 5 minutes you can have a chatbot running on a local model. Dozens of models and UIs to choose from.
The only downside is the speed, but what can I expect on an Intel-only laptop?
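For anyone curious, the whole five-minute setup is roughly this (Linux install script from ollama.com; mistral is just an example model):
curl -fsSL https://ollama.com/install.sh | sh   # install the server and CLI
ollama run mistral   # pulls the model on first run, then drops into a chat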
@rvr I think GPT4All is probably a bit easier because it already comes as a chat app, but Ollama is more flexible. It runs as a service, and you can have a CLI, a web UI, a desktop app, or whatever you program against its API connected to it. The number of available models is also higher, and it's easy to switch between them.
Completely forgot I had made this #fountainpen database a while ago when I was bored: https://codeberg.org/bmp/flock. It is written in Go and was generated with #ollama, if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.
Back in December, I paid $1,425 to replace my MacBook Pro to make my LLM research possible at all. That machine had an M1 Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare-minimum spec for running a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact, and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.
The Xeon W-2125 has 4 cores and 8 threads, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would load into memory and the GPU would do all of the processing; the CPU would only be needed to serve the results back to the user. Instead, it looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation, which is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.
I am using the proprietary Nvidia driver for the GPU, but maybe I’m missing something?
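Before going further, there are a few quick ways to see what Ollama is actually doing (the log command assumes a systemd-managed install, and ollama ps only exists in newer builds):
nvidia-smi --loop=2   # watch GPU memory/utilization while a prompt is running
journalctl -u ollama | grep -iE 'cuda|gpu'   # did the server detect CUDA at startup?
ollama ps   # newer builds: the PROCESSOR column shows e.g. "100% GPU" vs "100% CPU"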
I was recently playing around with Stability AI’s Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.
Have any questions, comments, or concerns? Please feel free to drop a comment below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.
A long, long time ago, @arfy made a Lua script for Dolphin screen readers that allowed you to type in a plus-or-minus number of days and get the date. I just asked Dolphin Mixtral to do the same as an AppleScript using #Ollama running locally, and it actually did it. It runs and works just as I wanted. Madness.
-- Ask for the offset in days; coerce the dialog's text to a number
set numDays to (text returned of (display dialog "Enter the number of days:" default answer "")) as number
-- Start from today's date
set targetDate to current date
-- 'days' is AppleScript's built-in constant for the number of seconds in one day
set newDate to targetDate + numDays * days
display dialog "The future date will be: " & (newDate as string)
@FreakyFwoof I think I need to try this and hope it doesn't explode my poor 24GB Air. What are you using to run these models? Is there some tool that gives us a worthwhile GUI, or are you just doing it in the terminal? I haven't really played with any of this stuff, but I feel like I need to start, and unfortunately the Mac is probably my best hope of running AI stuff.