joe, to ai

Back in December, I started exploring how all of this AI stuff works. Last week’s post covered the basics of how to run your own AI. This week, I wanted to cover some frequently asked questions.

What is a Rule-Based Inference Engine?

A rule-based inference engine is designed to apply predefined rules to a given set of facts or inputs to derive conclusions or make decisions. It operates by using logical rules, which are typically expressed in an “if-then” format. You can think of it as basically a very complex version of the spell check in your text editor.

What is an AI Model?

AI models employ learning algorithms that draw conclusions or predictions from past data. That data can come from various sources: labeled data for supervised learning, unlabeled data for unsupervised learning, or data generated through interaction with an environment for reinforcement learning. The algorithm is the step-by-step procedure that the model follows to analyze data and make predictions. Different algorithms have different strengths and weaknesses, and some are better suited to certain types of problems than others. A model’s parameters are the aspects of the model that are learned from the training data. A model’s complexity can be roughly measured by the number of parameters it contains, but complexity also depends on the model’s architecture (how the parameters interact with each other) and the types of parameters used.
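To make “parameters” a little more concrete, here is a toy sketch (in Python, and nothing like a real LLM) of a model with exactly two parameters that get learned from training data by gradient descent:

# A toy "model" with two parameters (slope and intercept) learned from data.
# Only an illustration of what "parameters" means; real LLMs have billions of them.
data = [(1, 3), (2, 5), (3, 7)]   # training data for y = 2x + 1

slope, intercept = 0.0, 0.0       # the parameters, before training
learning_rate = 0.01

for _ in range(5000):             # the learning algorithm: plain gradient descent
    for x, y in data:
        error = (slope * x + intercept) - y
        slope -= learning_rate * error * x
        intercept -= learning_rate * error

print(round(slope, 2), round(intercept, 2))   # converges toward 2.0 and 1.0

A 7b model is doing something conceptually similar, just with 7 billion of those adjustable numbers instead of two.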

What is an AI client?

An AI client is how the user interfaces with the rule-based inference engine. Since you can use the engine directly, the engine itself could also be the client. For the most part, though, you are going to want something web-based or a graphical desktop client. Good examples of graphical desktop clients are MindMac and Ollamac. A good example of a web-based client is Ollama Web UI. A good example of an application that is both a client and a rule-based inference engine is LM Studio. Most engines have APIs and language-specific libraries, so you can even write your own client if you want to.
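As a minimal sketch of that last point: assuming Ollama is running locally on its default port (11434) and you have already pulled a model such as llama2, a hand-rolled client can be as small as this:

# A minimal hand-rolled client for Ollama's local HTTP API.
# Assumes Ollama is running on localhost:11434 and the llama2 model is pulled.
import requests

def ask(prompt, model="llama2"):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("Explain what an inference engine does in one sentence."))

Everything a graphical client like MindMac does is, at its core, a nicer wrapper around requests like that.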

What is the best client to use with a Rule-Based Inference Engine?

I like MindMac. I would recommend either that or Ollama Web UI. You can even host both Ollama and Ollama Web UI together using docker compose.

What is the best Rule-Based Inference Engine?

I have tried Ollama, Llama.cpp, and LM Studio. If you are using Windows, I would recommend LM Studio. If you are using Linux or a Mac, I would recommend Ollama.

How much RAM does your computer need to run a Rule-Based Inference Engine?

The RAM requirement depends on what model you are using. If you browse the Ollama library, Hugging Face, or LM Studio‘s listing of models, most listings include a RAM requirement based on the number of parameters in the model. Most 7b models can run on a minimum of 8GB of RAM, while most 70b models will require 64GB of RAM. My MacBook Pro has 32GB of unified memory and struggles to run Wizard-Vicuna-Uncensored 30b. My new AI lab currently has 128GB of DDR4 RAM, and I hope that it can run 70b models reliably.
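As a rough rule of thumb (and it is only a rule of thumb), a model needs about its parameter count times the bytes per parameter for whatever quantization you are running, plus some overhead for the context window and the engine itself. A quick back-of-the-envelope sketch:

# Back-of-the-envelope RAM estimate: parameters x bytes per parameter, plus overhead.
# These numbers are rough; actual usage depends on quantization, context length, and the engine.
def estimated_ram_gb(billions_of_params, bits_per_param=4, overhead=1.2):
    bytes_total = billions_of_params * 1e9 * (bits_per_param / 8)
    return bytes_total * overhead / (1024 ** 3)

for size in (7, 13, 30, 70):
    print(f"{size}b at 4-bit quantization: ~{estimated_ram_gb(size):.1f} GB")

That lines up reasonably well with the published requirements: a 4-bit 7b model fits comfortably in 8GB of system RAM, while a 70b model wants most of 64GB.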

Does your computer need a dedicated GPU to run a Rule-Based Inference Engine?

No. You can run everything on just the CPU, but if you have an Nvidia GPU, it helps a lot.

I use Digital Ocean or Linode for hosting my website. Can I host my AI there, also?

Yeah, you can. The RAM requirement would make it a bit expensive, though. A virtual machine with 8GB of RAM is almost $50/mo.

Why wouldn’t you use ChatGPT, Copilot, or Bard?

When you use any of them, your interactions are used to reinforce the training of the model. That is a privacy issue for anything beyond the most basic prompts. In addition to that, they cost up to $30/month/user.

Why should you use an open-source LLM?

What opinion does your employer have of this research project?

You would need to direct that question to them. All of these posts should be considered personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Why are you interested in this technology?

It is a new technology that I didn’t consider wasteful bullshit in the first hour of researching it.

Are you afraid that AI will take your job?

No.

What about image generation?

I used (and liked) Noiselith until it shut down. DiffusionBee works, but I think that Diffusers might be the better solution. Diffusers lets you use multiple models, and it is easier to use than Stable Diffusion Web UI.
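If you want to try Diffusers, here is a minimal sketch, assuming you have installed the diffusers, transformers, and torch packages and have enough memory for SDXL (the model ID and device here are just examples; swap “mps” for “cuda” or “cpu” depending on your hardware):

# Minimal Diffusers example: generate one image with Stable Diffusion XL.
# Assumes diffusers, transformers, and torch are installed and there is
# enough RAM/VRAM for SDXL.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("mps")  # Apple Silicon GPU; use "cuda" on an Nvidia card

image = pipe("a bear riding a unicycle through the snow").images[0]
image.save("bear.png")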

You advocate for not using ChatGPT. Do you use it?

I do. ChatGPT 4 is reportedly a 1.76t-parameter model. It can do cool things. I have an API key and I use it via MindMac. Using it that way means that I pay based on how much I use it instead of paying the flat monthly fee for a Pro account, though.

Are you going to only write about AI on here, now?

Nope. I still have other interests. Expect more Vue.js posts and likely something to do with Unity or Unreal at some point.

Is this going to be the last AI FAQ post?

Nope. I still haven’t covered training or fine-tuning.

https://jws.news/2024/ai-frequently-asked-questions/

#AI #ChatGPT #Docker #LLM #LMStudio #MindMac #Ollama #Ollamac

ramikrispin, to python
@ramikrispin@mstdn.social

(1/3) Last Friday, I was planning to watch Masters of the Air ✈️, but my ADHD had different plans 🙃, and I ended up running a short POC and creating a tutorial for getting started with Ollama Python 🚀. The setup is covered both for Docker 🐳 and for running locally.

TLDR: It is straightforward to run LLM models locally with the Ollama Python library. Models with up to ~7B parameters run smoothly with low compute resources.

drgroftehauge,
@drgroftehauge@sigmoid.social

@ramikrispin just do volume mounts and mount wherever ollama caches models
--volume path/ollama/cache:path/ollama/cache

ramikrispin,
@ramikrispin@mstdn.social

@drgroftehauge I tried mounting the local folder, but it deleted the local models once the container stopped running. I will try caching; thx!

changelog, to opensource
@changelog@changelog.social

🗞 New episode of Changelog News!

💰 Rune’s $100k for indie game devs
🤲 The Zed editor is now open source
🦙 Ollama’s new JS & Python libs
🤝 @tekknolagi's Scrapscript story
🗒️ Pooya Parsa's notes from a tired maintainer
🎙 hosted by @jerod

🎧 https://changelog.com/news/79

joe, (edited) to ai

Around a year ago, I started hearing more and more about OpenAI‘s ChatGPT. I didn’t pay much attention to it until this past summer, when I watched the intern use it where I would normally use Stack Overflow. After that, I started poking at it and created things like the Milwaukee Weather Limerick and a bot that translates my Mastodon toots to Klingon. Those are cool tricks, but eventually I realized that you could ask it for detailed datasets like “the details of every state park”, “a list of three-ingredient cocktails”, or “a CSV of counties in Wisconsin.” People are excited about getting it to write code for you or to do a realistic rendering of a bear riding a unicycle through the snow, but I think that is just the tip of the iceberg in a world where it can do research for you.

The biggest limitation of something like ChatGPT, Copilot, or Bard is that your data leaves your control when you use the AI. I believe that the future of AI is AI that remains in your control. The only issue with running your own, local AI is that a large language model (LLM) needs a lot of resources to run. You can’t do it on your old laptop, but it can be done. Last month, I bought a new MacBook Pro with an M1 Pro CPU and 32GB of unified RAM to test this stuff out.

If you are in a similar situation, Mozilla’s Llamafile project is a good first step. A llamafile can run on multiple CPU microarchitectures. It uses Cosmopolitan Libc to provide a single 4GB executable that can run on macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD. It contains a web client, the model file, and the rule-based inference engine. You can just download the binary, execute it, and interact with it through your web browser. This has very limited utility, though.

So, how do you get from a proof of concept to something closer to ChatGPT or Bard? You are going to need a model, a rule-based inference engine or reasoning engine, and a client.

The Rule-Based Inference Engine

A rule-based inference engine is a piece of software that derives answers or conclusions based on a set of predefined rules and facts. You load models into it and it handles the interface between the model and the client. The two major players in the space are Llama.cpp and Ollama. Getting Ollama is as easy as downloading the software and running ollama run [model] from the terminal.

Screenshot of Ollama running in the terminal on MacOS

In the case of Ollama, you can even access the inference engine through a local JSON API.

A screenshot of Postman interacting with Ollama via a local JSON API

You will notice that the result isn’t easy to parse. Last week, Ollama announced Python and JavaScript libraries to make it much easier.
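Here is what that looks like with the new Python library (a short sketch, assuming you have installed the ollama package with pip and that Ollama is already running locally with the llama2 model pulled):

# A short sketch using the Ollama Python library.
# Assumes the ollama package is installed (pip install ollama) and that
# Ollama is already running locally with the llama2 model pulled.
import ollama

response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])

The library handles the HTTP plumbing and the streaming JSON for you, which takes care of the parsing headache mentioned above.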

The Models

A model consists of numerous parameters that adjust during the learning process to improve its predictions. Models employ learning algorithms that draw conclusions or predictions from past data. I’m going to be honest with you: this is the bit that I understand the least. The key attributes to be aware of for a model are what it was trained on, how many parameters it has, and its benchmark numbers.

If you browse Hugging Face or the Ollama model library, you will see that there are plenty of 7b, 13b, and 70b models. That number tells you how many parameters are in the model. Generally, a 70b model is going to be more competent than a 7b model. A 7b model has 7 billion parameters whereas a 70b model has 70 billion parameters. To give you a point of comparison, ChatGPT 4 reportedly has 1.76 trillion parameters.

The number of parameters isn’t the end-all-be-all, though. There are leaderboards and benchmarks (like HellaSwag, ARC, and TruthfulQA) for determining comparative model quality.

If you are running Ollama, downloading and running a new model is as easy as browsing the model library, finding the right one for your purposes, and running ollama run [model] from the terminal. You can also manage the installed models from the Ollama Web UI, though.

A screenshot from the Ollama Web UI, showing how to manage models

The Client

The client is what the user of the AI uses to interact with the rule-based inference engine. If you are using Ollama, the Ollama Web UI is a great option. It gives you a web interface that acts and behaves a lot like the ChatGPT web interface. There are also desktop clients like Ollamac and MacGPT, but my favorite so far is MindMac. It not only gives you a nice way to switch from model to model, but it also gives you the ability to switch between providers (Ollama, OpenAI, Azure, etc.).

A screenshot of the MindMac settings panel, showing how to add new accounts

The big questions

I have a few big questions right now. How well does Ollama scale from 1 user to 100 users? How do you fine-tune a model? How do you secure Ollama? Most interesting to me, how do you implement something like Stable Diffusion XL with this stack? I ordered a second-hand Xeon workstation off of eBay to try to answer some of these questions. In a workplace setting, I’m also curious what safeguards are needed to insulate the company from liability. These are all things that need addressing over time.

I created a new LLM / ML category here and I suspect that this won’t be my last post on the topic. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Have a question or comment? Please drop a comment, below.

https://jws.news/2024/ai-basics/

ramikrispin, to llm
@ramikrispin@mstdn.social

After a long night, a short tutorial for getting started with the Ollama Python version is now available here:

https://github.com/RamiKrispin/ollama-poc

#llm #ollama #llama #mistral #DataScience #python #docker

Lunatech,

@ramikrispin What if you are using a Mac and you don't want to mess with Docker? Can you just ignore the "Running Ollama with Docker" part?

ramikrispin,
@ramikrispin@mstdn.social

@Lunatech yes, you can run it locally with Python venv:

https://github.com/RamiKrispin/ollama-poc#running-locally

ramikrispin, to llm
@ramikrispin@mstdn.social

Running Mistral LLM locally with Ollama's 🦙 new Python 🐍 library inside a dockerized 🐳 environment with an allocation of 4 CPUs and 8 GB RAM. It took 19 sec to get a response 🚀. The last time I tried to run an LLM locally, it took 10 minutes to get a response 🤯

#llm #mistral #python #ollama #MachineLearning #nlp

ramikrispin,
@ramikrispin@mstdn.social

Code and docker environment are available here:

https://github.com/RamiKrispin/ollama-poc/blob/main/ollama-poc.ipynb

joe, to ai

Anyone out there dabbling with on-prem AI? All of the numbers that I’m seeing for RAM requirements on 7b, 13b, and 70b models seem to be correct for a 1-user scenario, but I’m curious what folks are seeing for 2, 10, or 50 users.

#Ollama #llama #AI #LLM

joe, to random

I really need to work on diagramming out how to deploy at scale, this weekend.

bkoehn, to random
@bkoehn@hachyderm.io

It’s kind of amazing what you can build with #n8n and #ollama. In a few clicks you can make the mother of all email classifiers.

Mrw, to homelab
@Mrw@hachyderm.io

#HomeLab Saturday.
The goal today was to get #ollama deployed on the cluster. That’s a fun way to run your own models on whatever accelerators you have handy. It’ll run on your CPU, sure, but man is it slow.

Nvidia now ships a GPU operator, which handles annotating nodes and managing the resource type. “All you need to do” — the most dangerous phrase in computers — is smuggle the GPUs through whatever virtualization you’re doing, and expose them to containerd properly.

But I got there! Yay.

greg, to homeassistant

W̶a̶k̶e̶ ̶o̶n̶ ̶L̶A̶N̶ Wake-on-Zigbee

Maintaining Wake-on-LAN on a dual-boot Windows 10 / Ubuntu 22.04LTS system is a hassle. So I went with a simple Fingerbot solution. Now I have Wake-on-Zigbee!

By default the system boots into Ubuntu which hosts an Ollama server and does some video compression jobs (I wanted to be able to start those remotely). I only use Windows for VR gaming when I'm physically in the room and therefore can select the correct partition at boot.

Using Zigbee Fingerbot to turn on PC

stooovie,
@stooovie@mas.to

@greg Switchbot is just so hilarious 😂👌
