ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

I hooked the Enchanted LLM app up to my instance tonight. Running on the epyc box with an nvidia a4000.

I can’t notice a difference in speed between this and the real chat-gpt tbh. And I own the whole chain locally. Man this is cool!!

I even shared the ollama api endpoint with a buddy over Tailscale and now they’re whipping the llamas 🦙 api as well. Super fun times.

https://apps.apple.com/us/app/enchanted-llm/id6474268307
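
For anyone curious what "sharing the Ollama API endpoint" amounts to, here is a minimal sketch of a request against a shared instance; it assumes Ollama's default port 11434 and a hypothetical Tailscale machine name of epyc-box, and uses only the Python standard library:

import json
import urllib.request

# Hypothetical tailnet hostname; any of the device's Tailscale addresses would work here.
OLLAMA_URL = "http://epyc-box.tailnet.ts.net:11434/api/generate"

payload = {
    "model": "llama2",  # any model already pulled on the server
    "prompt": "Why is the sky blue?",
    "stream": False,    # ask for one JSON response instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])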

DavidMarzalC, to Symfony Spanish
@DavidMarzalC@mastodon.escepticos.es avatar

We are closing out March with another new episode of "Accesibilidad con Tecnologías libres" (Accessibility with Free Technologies), for our friends.

At https://accesibilidadtl.gitlab.io/05 you will find the show notes with the topics covered and the links.

If you follow the alias @6706483, you will automatically get the podcast's posts within the Fediverse.

And in case you don't have it yet, the RSS feed for your apps is:
https://accesibilidadtl.gitlab.io/feed

Taking part in episode 05, among other people, are:

We hope you find it interesting.

kohelet, to llm
@kohelet@mstdn.social avatar

So,
I downloaded Ollama,
installed a local LLM,
installed the Continue VSCode extension, configured it with my local LLM.

Now I just need something to do with it!
Like, any project at all.
huh.

#ollama #llm #continue

5am, to LLMs
@5am@fosstodon.org avatar

I've been playing around with locally hosted LLMs using the Ollama tool. I've mostly been using models like mistral and dolphin-coder for assistance with textual ideas and issues. More recently I've been using the llava visual model via some simple scripting, looping through images and creating description files. I can then grep those files for keywords and note the associated filenames. Powerful stuff!
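
A rough sketch of what that image-description loop can look like, assuming a local Ollama instance with the llava model already pulled; the photos directory and the prompt are illustrative:

import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"

for image in Path("photos").glob("*.jpg"):  # hypothetical image folder
    payload = {
        "model": "llava",
        "prompt": "Describe this image in a few sentences.",
        "images": [base64.b64encode(image.read_bytes()).decode("ascii")],
        "stream": False,
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        description = json.loads(resp.read())["response"]
    # Write a sidecar .txt file next to the image so it can be grepped later.
    image.with_suffix(".txt").write_text(description)

Grepping the resulting .txt files for a keyword then gives back the matching filenames.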

orhun, to rust
@orhun@fosstodon.org avatar

Here is how you can use ChatGPT in your terminal - with an interface! 🔥

🦾 tenere: TUI for LLMs written in Rust.

🚀 Supports ChatGPT, llama.cpp & ollama

🦀 Built with @ratatui_rs

⭐ GitHub: https://github.com/pythops/tenere


ironicbadger, to NixOS
@ironicbadger@techhub.social avatar

A fully self-hosted, totally offline, and local AI using Ollama and open-webui as a front-end.

my_actual_brain, to llm
@my_actual_brain@fosstodon.org avatar

I’m surprised at the performance an #llm can run at on my 8700-cpu.

It’s a bit slow and I don’t think it’s worth getting a gpu just to make it run at a faster speed, but maybe in time I’ll reconsider it.

If I were going to get a #gpu, what would you recommend?

#gpt #ollama #llama2

gael, (edited ) to llm French
@gael@mastodon.social avatar

Imagine an LLM running locally on a smartphone...

It could be used to fuel an offline assistant that would be able to easily add an appointment to your calendar, open an app, etc. without issues.

This has come to reality with this proof of concept using the Phi-2 2.8B transformer model running on /e/OS.

It is slow, so not very usable until we have dedicated chips on SoCs, but it works!

🙏 Stypox

👇 👇 👇

owzim,
@owzim@mastodon.social avatar

@gael I'd say before that happens, running an LLM on your local network is your best bet. Projects like https://ollama.com/ make that incredibly easy.

dliden, to emacs
@dliden@emacs.ch avatar

Has anyone here worked much with generators in #emacs ?

I am looking for a good solution for streaming outputs in my ollama-elisp-sdk project. I think there's a good angle using generators to make a workflow fairly similar to e.g. the OpenAI API. Not sure yet though.
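
Not elisp, but for reference, this is the shape of the stream such an SDK has to wrap: Ollama's /api/generate endpoint returns newline-delimited JSON chunks, so a generator only has to yield each parsed line until "done" is true. A minimal Python sketch of that pattern, assuming a local Ollama on the default port:

import json
import urllib.request

def stream_ollama(prompt, model="mistral", url="http://localhost:11434/api/generate"):
    """Yield response fragments as they arrive (one JSON object per line)."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            if not line.strip():
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["response"]

for fragment in stream_ollama("Write a haiku about parentheses."):
    print(fragment, end="", flush=True)

An elisp generator wrapping the same endpoint would follow the same loop: read a line, parse it, hand the fragment to the caller.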

#elisp #ollama #ai #emacs

joe, to random

This past month, I was talking about how I spent $528 to buy a machine with enough guts to run more demanding AI models in Ollama. That is good and all but if you are not on that machine (or at least on the same network), it has limited utility. So, how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices and it creates an encrypted VPN connection between them. Each device on your “tailnet” has 4 addresses you can use to reference it:

  • Machine name: my-machine
  • FQDN: my-machine.tailnet.ts.net
  • IPv4: 100.X.Y.Z
  • IPv6: fd7a:115c:a1e0::53

If you remember Hamachi from back in the day, it is kind of the spiritual successor to that.

https://i0.wp.com/jws.news/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-2.37.06%E2%80%AFPM.png?resize=1024%2C592&ssl=1

There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.
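
As a concrete example of what that buys you, any Ollama client that accepts a base URL can simply be pointed at the tailnet name instead of localhost. A sketch using the ollama Python library (pip install ollama); the hostname below is the hypothetical FQDN from the list above, and the model just has to be pulled on the server:

from ollama import Client

# Point the client at the server's tailnet FQDN instead of a local instance.
client = Client(host="http://my-machine.tailnet.ts.net:11434")

reply = client.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Summarize what Tailscale does in one sentence."}],
)
print(reply["message"]["content"])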

https://jws.news/2024/joe-discovered-tailscale/

sesivany, (edited ) to ai
@sesivany@floss.social avatar

#Ollama is the easiest way to run local #AI I've tried so far. In 5 minutes you can have a chatbot running on a local model. Dozens of models and UIs to choose from.
The speed is not great, but that is to be expected on an Intel-only laptop.

bmp, to random
@bmp@mastodon.sdf.org avatar

Completely forgot I had made this #fountainpen database a while ago when I was bored: https://codeberg.org/bmp/flock, it is written in Go, and was generated with #ollama if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.

davidbisset, to apple
@davidbisset@phpc.social avatar

NotesOllama uses #Ollama to talk to local LLMs in #Apple Notes on #macos. Inspired by Obsidian Ollama. #AI

https://smallest.app/notesollama/

Not sure if I trust this if I can't see the code, though.

joe, to llm

Back in December, I paid $1,425 to replace my MacBook Pro to make my LLM research possible at all. That had an M1 Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare minimum spec to run a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can't upgrade Apple products after the fact, and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.

I ended up finding a 4-year-old HP Z4 G4 workstation with a Xeon W-2125 processor, 128 GB of DDR4-2666 RAM, a 512 GB Samsung NVMe SSD, and an NVIDIA Quadro P4000 GPU with 8 GB of GDDR5 memory. I bought it before Ollama released their Windows preview, so I planned to throw the latest Ubuntu LTS on it.

Going into this experiment, I was expecting that Ollama would thrash the GPU and the RAM but would use the CPU sparingly. I was not correct.

This is what the activity monitor looked like when I asked various models to tell me about themselves:

Mixtral

An ubuntu activity monitor while running mixtral

Llama2:70b

An ubuntu activity monitor while running Llama2:70b

Llama2:7b

An ubuntu activity monitor while running llama2:7b

Codellama

An ubuntu activity monitor while running codellama

The Xeon W-2125 has 4 cores and 8 threads, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would go into memory and then the GPU would do all of the processing. The CPU would only be needed to serve the results back to the user. It looks like the full load is going to the CPU instead. For a moment, I thought that the 8 GB of video RAM was the limitation. That is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.

A screenshot of the "additional drivers" screen in ubuntu

I am using a proprietary Nvidia driver for the GPU but maybe I’m missing something?
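
One quick way I could sanity-check whether the GPU is being touched at all is to watch nvidia-smi while a prompt is running. A minimal sketch that polls it from Python (it just shells out to the nvidia-smi binary that ships with the proprietary driver; stop it with Ctrl-C):

import subprocess
import time

# Print GPU utilization and memory use once per second while a model is answering.
while True:
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
    time.sleep(1)

If utilization stays near zero while the CPU cores are pegged, Ollama is not offloading work to the GPU.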

I was recently playing around with Stability AI's Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.

Have any questions, comments, or concerns? Please feel free to drop a comment, below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

https://jws.news/2024/hp-z4-g4-workstation/

ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

Did someone say self-hosted LLMs?

joe, to ubuntu
@joe@toot.works avatar

I am unsure how to fix "Unable to change power state from D3hot to D0, device inaccessible". I would have expected that installing Ubuntu desktop would have gotten easier over the years.

joe,
@joe@toot.works avatar

Well, f***. I thought that running a 70B AI model on a machine with 128 gigabytes of RAM would tax the RAM, not the CPU. Apparently that Xeon processor is the bottleneck.

I should check to make sure that the GPU is in use.

The resource monitor in Ubuntu 22 immediately after running code llama 70B through Ollama

FreakyFwoof, to random

A long, long time ago, @arfy made a Lua script for Dolphin screen readers that allowed you to type in a plus or minus number of days and get the resulting date. I just asked Dolphin Mixtral to do the same as an AppleScript, using #Ollama running locally, and it actually did it. It runs and works just as I wanted. Madness.

set numDays to text returned of (display dialog "Enter the number of days:" default answer "")
set targetDate to current date
set newDate to targetDate + numDays * days
display dialog "The future date will be: " & (newDate as string)

joe, to ai

Back in December, I started exploring how all of this AI stuff works. Last week’s post was about the basics of how to run your AI. This week, I wanted to cover some frequently asked questions.

What is a Rule-Based Inference Engine?

A rule-based inference engine is designed to apply predefined rules to a given set of facts or inputs to derive conclusions or make decisions. It operates by using logical rules, which are typically expressed in an “if-then” format. You can think of it as basically a very complex version of the spell check in your text editor.

What is an AI Model?

AI models employ learning algorithms that draw conclusions or predictions from past data. An AI model’s data can come from various sources such as labeled data for supervised learning, unlabeled data for unsupervised learning, or data generated through interaction with an environment for reinforcement learning. The algorithm is the step-by-step procedure or set of rules that the model follows to analyze data and make predictions. Different algorithms have different strengths and weaknesses, and some are better suited for certain types of problems than others. A model has parameters that are the aspects of the model that are learned from the training data. A model’s complexity can be measured by the number of parameters contained in it but complexity can also depend on the architecture of the model (how the parameters interact with each other) and the types of parameters used.

What is an AI client?

An AI client is how the user interfaces with the rule-based inference engine. Since you can use the engine directly, the engine itself could also be the client. For the most part, you are going to want something web-based or a graphical desktop client, though. Good examples of graphical desktop clients would be MindMac or Ollamac. A good example of a web-based client would be Ollama Web UI. A good example of an application that is both a client and a rule-based inference engine is LM Studio. Most engines have APIs and language-specific libraries, so if you want to you can even write your own client.
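
As a trivial example of "writing your own client", here is a sketch of a terminal chat loop against Ollama's /api/chat endpoint. It keeps the conversation history so the model sees context; the model name and port are the defaults and are easy to swap out:

import json
import urllib.request

URL = "http://localhost:11434/api/chat"
history = []

while True:
    user_input = input("you> ")
    history.append({"role": "user", "content": user_input})
    req = urllib.request.Request(
        URL,
        data=json.dumps({"model": "mistral", "messages": history, "stream": False}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    print(answer)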

What is the best client to use with a Rule-Based Inference Engine?

I like MindMac. I would recommend either that or Ollama Web UI. You can even host both Ollama and Ollama Web UI together using docker compose.

What is the best Rule-Based Inference Engine?

I have tried Ollama, Llama.cpp, and LM Studio. If you are using Windows, I would recommend LM Studio. If you are using Linux or a Mac, I would recommend Ollama.

How much RAM does your computer need to run a Rule-Based Inference Engine?

The RAM requirement is dependent upon what model you are using. If you browse the Ollama library, Hugging Face, or LM Studio's listing of models, most listings will include a RAM requirement (example) based on the number of parameters in the model. Most 7b models can run on a minimum of 8GB of RAM, while most 70b models will require 64GB of RAM. My MacBook Pro has 32GB of unified memory and struggles to run Wizard-Vicuna-Uncensored 30b. My new AI lab currently has 128GB of DDR4 RAM, and I hope that it can run 70b models reliably.

Does your computer need a dedicated GPU to run a Rule-Based Inference Engine?

No, it doesn't. You can use just the CPU, but if you have an Nvidia GPU, it helps a lot.

I use Digital Ocean or Linode for hosting my website. Can I host my AI there, also?

Yeah, you can. The RAM requirement would make it a bit expensive, though. A virtual machine with 8GB of RAM is almost $50/mo.

Why wouldn’t you use ChatGPT, Copilot, or Bard?

When you use any of them, your interactions are used to reinforce the training of the model. That is an issue for anything more than the most basic prompts. In addition to that, they cost up to $30/month/user.

Why should you use an open-source LLM?

What opinion does your employer have of this research project?

You would need to direct that question to them. All of these posts should be considered personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Why are you interested in this technology?

It is a new technology that I didn’t consider wasteful bullshit in the first hour of researching it.

Are you afraid that AI will take your job?

No.

What about image generation?

I used (and liked) Noiselith until it shut down. DiffusionBee works but I think that Diffusers might be the better solution. Diffusers lets you use multiple models and it is easier to use than Stable Diffusion Web UI.

You advocate for not using ChatGPT. Do you use it?

I do. ChatGPT 4 is reportedly a 1.76T-parameter model. It can do cool things. I have an API key and I use it via MindMac. Using it that way means that I pay based on how much I use it, instead of paying for a Pro account.

Are you going to only write about AI on here, now?

Nope. I still have other interests. Expect more Vue.js posts and likely something to do with Unity or Unreal at some point.

Is this going to be the last AI FAQ post?

Nope. I still haven’t covered training or fine-tuning.

https://jws.news/2024/ai-frequently-asked-questions/

ramikrispin, to python
@ramikrispin@mstdn.social avatar

(1/3) Last Friday, I was planning to watch Masters of the Air ✈️, but my ADHD had different plans 🙃, and I ended up running a short POC and creating a tutorial for getting started with Ollama Python 🚀. Setup instructions are available for both Docker 🐳 and a local environment.

TLDR: It is straightforward to run LLMs locally with the Ollama Python library. Models with up to ~7B parameters run smoothly even with limited compute resources.
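
For context, the whole flow is only a few lines with the library. A minimal sketch, assuming the Ollama server is already installed and running locally (the model name is just an example):

import ollama  # pip install ollama

# Download the model if it is not already present, then run a one-shot prompt.
ollama.pull("mistral")
result = ollama.generate(model="mistral", prompt="Explain what a proof of concept is in one sentence.")
print(result["response"])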

ramikrispin,
@ramikrispin@mstdn.social avatar

(2/3) The tutorial focuses on the following topics:
✅ Setting up Ollama server 🦙
✅ Setting up Python environment 🐍
✅ Pulling and running LLMs (examples with Mistral, Llama2, and Vicuna)

Tutorial 📖: https://github.com/RamiKrispin/ollama-poc

#python #ollama #llama #mistral #llm #DataScience #ai

ramikrispin,
@ramikrispin@mstdn.social avatar

(3/3) The tutorial will get you to run Ollama inside a Docker container. Yet, there are some missing pieces, such as mounting LLM models from the local environment to avoid downloading them at build time. I plan to explore this topic sometime in the coming weeks.

#python #ollama #llama #mistral #llm #DataScience #ai

changelog, to opensource
@changelog@changelog.social avatar

🗞 New episode of Changelog News!

💰 Rune’s $100k for indie game devs
🤲 The Zed editor is now open source
🦙 Ollama’s new JS & Python libs
🤝 @tekknolagi's Scrapscript story
🗒️ Pooya Parsa's notes from a tired maintainer
🎙 hosted by @jerod

🎧 https://changelog.com/news/79

#opensource #zed #ollama #scrapscript #javascript #python #news #podcast

joe, to random
@joe@toot.works avatar

I just published "Learning AI: The Basics" over on @joe. https://jws.news/2024/ai-basics/

joe,
@joe@toot.works avatar

This morning, I posted a "The Basics" blog post about what I learned running #Ollama on the new (second-hand) Macbook Pro that I bought last month. I recently bought a new (second-hand) "God-mode" workstation to take this stuff a little further.

I still want to explore:

joe, (edited ) to ai

Around a year ago, I started hearing more and more about OpenAI‘s ChatGPT. I didn’t pay much attention to it until this past summer when I watched the intern use it where I would normally use Stack Overflow. After that, I started poking at it and created things like the Milwaukee Weather Limerick and a bot that translates my Mastodon toots to Klingon. Those are cool tricks but eventually, I realized that you could ask it for detailed datasets like “the details of every state park“, “a list of three-ingredient cocktails“, or “a CSV of counties in Wisconsin.” People are excited about getting it to write code for you or to do a realistic rendering of a bear riding a unicycle through the snow but I think that is just the tip of the iceberg in a world where it can do research for you.

The biggest limitation of something like ChatGPT, Copilot, or Bard is that your data leaves your control when you use the AI. I believe that the future of AI is AI that remains in your control. The only issue with running your own, local AI is that a large language model (LLM) needs a lot of resources to run. You can't do it on your old laptop. It can be done, though. Last month, I bought a new MacBook Pro with an M1 Pro CPU and 32GB of unified RAM to test this stuff out.

If you are in a similar situation, Mozilla’s Llamafile project is a good first step. A llamafile can run on multiple CPU microarchitectures. It uses Cosmopolitan Libc to provide a single 4GB executable that can run on macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD. It contains a web client, the model file, and the rule-based inference engine. You can just download the binary, execute it, and interact with it through your web browser. This has very limited utility, though.

So, how do you get from a proof of concept to something closer to ChatGPT or Bard? You are going to need a model, a rule-based inference engine or reasoning engine, and a client.

The Rule-Based Inference Engine

A rule-based inference engine is a piece of software that derives answers or conclusions based on a set of predefined rules and facts. You load models into it and it handles the interface between the model and the client. The two major players in the space are Llama.cpp and Ollama. Getting Ollama is as easy as downloading the software and running ollama run [model] from the terminal.

Screenshot of Ollama running in the terminal on MacOS

In the case of Ollama, you can even access it via an API to the inference engine.

A screenshot of Postman interacting with Ollama via a local JSON API

You will notice that the result isn’t easy to parse. Last week, Ollama announced Python and JavaScript libraries to make it much easier.
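
To illustrate the difference those libraries make: instead of reassembling the streamed JSON by hand, you get a plain dictionary back. A minimal sketch with the Python package, assuming the same local Ollama server as in the screenshots above:

import ollama  # pip install ollama

response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Tell me about yourself."}],
)
print(response["message"]["content"])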

The Models

A model consists of numerous parameters that adjust during the learning process to improve its predictions. Models employ learning algorithms that draw conclusions or predictions from past data. I'm going to be honest with you. This is the bit that I understand the least. The key attributes to be aware of with a model are what it was trained on, how many parameters it has, and its benchmark numbers.

If you browse Hugging Face or the Ollama model library, you will see that there are plenty of 7b, 13b, and 70b models. That number tells you how many parameters are in the model. Generally, a 70b model is going to be more competent than a 7b model. A 7b model has 7 billion parameters whereas a 70b model has 70 billion parameters. To give you a point of comparison, ChatGPT 4 reportedly has 1.76 trillion parameters.

The number of parameters isn't the end-all-be-all, though. There are leaderboards and benchmarks (like HellaSwag, ARC, and TruthfulQA) for determining comparative model quality.

If you are running Ollama, downloading and running a new model is as easy as browsing the model library, finding the right one for your purposes, and running ollama run [model] from the terminal. You can manage the installed models from the Ollama Web UI also, though.

A screenshot from the Ollama Web UI, showing how to manage models

The Client

The client is what the user of the AI uses to interact with the rule-based inference engine. If you are using Ollama, the Ollama Web UI is a great option. It gives you a web interface that acts and behaves a lot like the ChatGPT web interface. There are also desktop clients like Ollamac and MacGPT but my favorite so far is MindMac. It not only gives you a nice way to switch from model to model but it also gives you the ability to switch between providers (Ollama, OpenAI, Azure, etc).

A screenshot of the MindMac settings panel, showing how to add new accounts

The big questions

I have a few big questions, right now. How well does Ollama scale from 1 user to 100 users? How do you finetune a model? How do you secure Ollama? Most interesting to me, how do you implement something like Stable Diffusion XL with this stack? I ordered a second-hand Xeon workstation off of eBay to try to answer some of these questions. In the workplace setting, I’m also curious what safeguards are needed to insulate the company from liability. These are all things that need addressing over time.

I created a new LLM / ML category here and I suspect that this won’t be my last post on the topic. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Have a question or comment? Please drop a comment, below.

https://jws.news/2024/ai-basics/

#AI #Llama2 #LLM #Ollama

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

After a long night, a short tutorial for getting started with the Ollama Python library is now available here:

https://github.com/RamiKrispin/ollama-poc
