joe, to ai

So far this week, we have looked at how to use Ollama from the CLI, from the web service, and from a phone or iPad. Today, we are going to use the Ollama JavaScript library to write an application.

Install the Ollama Library

The first step is to run npm i ollama from the terminal.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-8.30.04%E2%80%AFAM.png?resize=1024%2C728&ssl=1

That installs Ollama as a dependency in package.json.

Basic CLI Example

At this point, we can start writing code. When we used the web service earlier this week, we used the generate endpoint and provided model, prompt, and stream as parameters. We set the stream parameter to false so that it would return a single response object instead of a stream of objects. When using the JavaScript library, the stream parameter isn’t necessary because it returns a single response object by default. We still provide it with a model and a prompt, though.
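Something like this minimal sketch is all it takes (it assumes that the llama3 model has already been pulled and that the script runs as an ES module, so top-level await works):

import ollama from "ollama";

// The library talks to the local Ollama service and returns a single
// response object by default, so no stream parameter is needed.
const response = await ollama.generate({
  model: "llama3",
  prompt: "Give me a comma-delimited list of cities in Wisconsin with a population over 100,000 people.",
});

console.log(response.response);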

If you run it from the terminal, the response will look familiar.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-9.19.38%E2%80%AFAM.png?resize=1024%2C728&ssl=1

Basic Web Application Example

The output is very similar to the node-fetch example from earlier this week. Last week, when we looked at how to dockerize a node app, we output an array as an unordered list. Let’s see if we can replicate that result using the output from Ollama.

If you run npm install express to add Express to the project, you can host a simple HTTP page on port 8080, and with the magic of JSON.parse() and a for loop, you can build your unordered list.
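Here is a rough sketch of the idea (the route, prompt wording, and model are just examples):

import express from "express";
import ollama from "ollama";

const app = express();

app.get("/", async (req, res) => {
  // Ask for the answer as JSON so that JSON.parse() has something clean to work with.
  const response = await ollama.generate({
    model: "llama3",
    prompt: 'Give me a list of the large cities in Wisconsin as a JSON array of strings, like ["Milwaukee","Madison"]. Return only the JSON.',
  });

  // JSON.parse() plus a for loop turns the model's answer into an unordered list.
  // (If the model wraps the JSON in extra prose, JSON.parse() will throw.)
  const cities = JSON.parse(response.response);
  let html = "<ul>";
  for (let i = 0; i < cities.length; i++) {
    html += "<li>" + cities[i] + "</li>";
  }
  html += "</ul>";

  res.send(html);
});

app.listen(8080);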

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-8.03.57%E2%80%AFPM.png?resize=1024%2C796&ssl=1

Every time you load the page, it makes a server-side API call to Ollama, gets a list of large cities in Wisconsin, and displays them on the website. The list is never the same (because of hallucinations) but that is another issue.

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-write-a-javascript-app-that-uses-ollama/

joe, to ai

Earlier this year, I started looking at how to run a fully on-prem AI. In February, I bought a machine to run the inference engine on and set up Tailscale (which works similarly to Hamachi) to connect to it remotely. If you want to use it that way, there are a lot of options for native clients.

macOS

My favorite client for macOS is MindMac. You can buy it for under $30, it works with multiple models, servers, and server types, and it is easy to use.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-2.34.12%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you want to look further into it, you can check it out at mindmac.app.

Android

My favorite client for Android is Amallo. It is $23 and like MindMac, it works with multiple models, servers, and server types. My only complaint would be that uploading a base64-encoded image to the model doesn’t seem to work well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot_20240420-143906.png?resize=461%2C1024&ssl=1

If you want to look further into it, you can check it out at doppeltilde.com.

iPadOS

There is a version of Amallo for iPadOS but I have been liking Enchanted LLM more. If you like it, there is a version for macOS as well. It has the added benefit of being free.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/IMG_0088.jpg?resize=672%2C1024&ssl=1

If you want to look further into it, you can check it out at the project’s GitHub page.

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-i-use-ai/

#AI #Amallo #Enchanted #LLM #MindMac #Ollama

joe, (edited) to programming

Yesterday, we played with Llama 3 using the Ollama CLI client (or REPL). Today, I figured that we would play with it using the Ollama API. The Ollama API is documented on their GitHub repo. Ollama has a client that runs when you run ollama run llama3 and a service that can be accessed from something like MindMac, Amallo, or Enchanted. The service is what starts when you run ollama serve.

In our first Llama 3 post, we asked the model for “a comma-delimited list of cities in Wisconsin with a population over 100,000 people”. Using Postman and the completion API endpoint, you can ask the same thing.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-1.30.48%E2%80%AFPM.png?resize=1024%2C811&ssl=1

You will notice the stream parameter is set to false in the body. If the value is false, the response will be returned as a single response object, rather than a stream of objects. If you are using the API with a web application, you will want to ask the model for the answer as JSON and you will probably want to provide an example of how you want the answer formatted.
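For example, a request body along these lines does the trick (the prompt and the formatting example inside it are just an illustration):

{
  "model": "llama3",
  "prompt": "Give me a list of cities in Wisconsin with a population over 100,000 people. Format the answer as a JSON array of strings, like [\"Milwaukee\",\"Madison\"], and return only the JSON.",
  "stream": false
}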

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-1.45.15%E2%80%AFPM.png?resize=1024%2C811&ssl=1

You can use Node and node-fetch to do the same thing.
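A minimal sketch of that, assuming Ollama is serving on its default port (11434) on the same machine and the script runs as an ES module:

import fetch from "node-fetch";

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3",
    prompt: "Give me a comma-delimited list of cities in Wisconsin with a population over 100,000 people.",
    stream: false,
  }),
});

// With stream set to false, the whole answer comes back as one JSON object.
const data = await response.json();
console.log(data.response);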

If you run it from the terminal, it will look like this:

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-2.01.19%E2%80%AFPM.png?resize=1024%2C932&ssl=1

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/lets-play-more-with-llama-3/

#AI #Amallo #Enchanted #llama3 #LLM #MindMac #NodeJs #Ollama #Postman

joe, to ai

Last week, Meta announced Llama 3. Thanks to Ollama, you can run it pretty easily. There are 8B and 70B variants, available as both pre-trained and instruction-tuned models. I am not seeing it on the Hugging Face leaderboard yet, but what little I have played with it has been promising.

Here are two basic test questions:

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-12.15.45%E2%80%AFPM.png?resize=989%2C1024&ssl=1

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-12.27.47%E2%80%AFPM.png?resize=989%2C1024&ssl=1

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/lets-play-with-llama-3/

ramikrispin, to llm
@ramikrispin@mstdn.social

Llama 3 is already available on Ollama 🚀👇🏼

https://ollama.com/library/llama3

ramikrispin, to datascience
@ramikrispin@mstdn.social

New release of Ollama 🎉

A major release of Ollama: version 0.1.32 is out. The new version includes:
✅ Improved GPU utilization and memory management for better performance and a lower error rate
✅ Better performance on Mac by scheduling large models across the GPU and CPU
✅ Native AI support in Supabase Edge Functions

More details on the release notes 👇🏼
https://github.com/ollama/ollama/releases

Image credit: release notes

#DataScience #MachineLearning #llm #ollama #llama #python

ainmosni, to programming
@ainmosni@berlin.social

So, my #Copilot trial just expired. While it did cut down on some typing, it also made me feel like the quality of my code was lower, and of course it felt dirty to use, considering that it's a license-whitewashing machine.

I don't think I will be paying for it; I don't think the results are worth it.

#Programming #AI #LLM #GoLang

0xZogG,
@0xZogG@hachyderm.io

@ainmosni: there is another solution with a free tier: #codeium
In general, as a non-frontend dev, I like how it makes suggestions for HTML, and for Go even the minimal placeholder function fill-ins are nice.
But given the license and not knowing where my code is sent, I'm looking for a self-hosted solution. I found a few options with #ollama, but unfortunately my current 10-year-old HW is not enough for that :P

ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

I hooked the Enchanted LLM app up to my #selfhosted #ollama instance tonight. Running on the EPYC box with an NVIDIA A4000.

I can’t notice a difference in speed between this and the real ChatGPT, tbh. And I own the whole chain locally. Man, this is cool!!

I even shared the Ollama API endpoint with a buddy over Tailscale and now they’re whipping the llamas 🦙 API as well. Super fun times.

https://apps.apple.com/us/app/enchanted-llm/id6474268307

DavidMarzalC, to Symfony Spanish
@DavidMarzalC@mastodon.escepticos.es

We are closing out March with another new episode of "Accesibilidad con Tecnologías libres" (Accessibility with Free Technologies), friends.

At https://accesibilidadtl.gitlab.io/05 you will find the show notes with the topics covered and the links.

If you follow the alias @6706483, you will automatically get the podcast's posts within the Fediverse.

And in case you don't have it yet, the RSS feed for your apps is:
https://accesibilidadtl.gitlab.io/feed

Episode 05 features, among other people:

We hope you find it interesting.

kohelet, to llm
@kohelet@mstdn.social

So,
I downloaded Ollama,
installed a local LLM,
installed the Continue VSCode extension, configured it with my local LLM.

Now I just need something to do with it!
Like, any project at all.
huh.

#ollama #llm #continue

5am, to LLMs
@5am@fosstodon.org

I've been playing around with locally hosted LLMs using the Ollama tool. I've mostly been using models like mistral and dolphin-coder for assistance with textual ideas and issues. More recently, I've been using the llava visual model via some simple scripting, looping through images and creating description files. I can then grep those files for key words and note the associated filenames. Powerful stuff!
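One way to sketch that kind of loop with the Ollama JavaScript library (the folder name and prompt here are made up, and the original poster may well have used different tooling):

import ollama from "ollama";
import { readdirSync, readFileSync, writeFileSync } from "node:fs";

// Loop through a folder of images, ask llava to describe each one, and write
// a grep-able .txt description file next to it.
for (const file of readdirSync("./images")) {
  if (!/\.(jpe?g|png)$/i.test(file)) continue;

  const response = await ollama.generate({
    model: "llava",
    prompt: "Describe this image in a few sentences.",
    images: [readFileSync("./images/" + file).toString("base64")],
  });

  writeFileSync("./images/" + file + ".txt", response.response);
}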

my_actual_brain, to llm

I’m surprised at the performance an LLM can run at on my 8700 CPU.

It’s a bit slow, and I don’t think it’s worth getting a GPU just to make it run faster, but maybe in time I’ll reconsider.

If I were going to get a GPU, what would you recommend?

orhun, to rust
@orhun@fosstodon.org

Here is how you can use ChatGPT in your terminal - with a TUI! 🔥

🦾 tenere: TUI for LLMs written in Rust.

🚀 Supports ChatGPT, llama.cpp & ollama

🦀 Built with @ratatui_rs

⭐ GitHub: https://github.com/pythops/tenere

#rustlang #tui #ratatui #chatgpt #terminal #llm #ollama #llama


ironicbadger, to NixOS
@ironicbadger@techhub.social

A fully self-hosted, totally offline, and local AI using #nixos #tailscale and open-webui as a front-end for #ollama

gael, (edited) to llm French
@gael@mastodon.social

Imagine an LLM running locally on a smartphone...

It could be used to fuel an offline assistant that would be able to easily add an appointment to your calendar, open an app, etc. without issues.

This has come to reality with this proof of concept using the Phi-2 2.8B transformer model running on /e/OS.

It is slow, so not very usable until we have dedicated chips on SoCs, but it works!

🙏 Stypox

on 👇 👇 👇

owzim,

@gael I'd say before that happens, running an LLM on your local network is your best bet. Projects like https://ollama.com/ make that incredibly easy. #llm #ollama

dliden, to emacs

Has anyone here worked much with generators in Emacs Lisp?

I am looking for a good solution for streaming outputs in my ollama-elisp-sdk project. I think there's a good angle using generators to make a workflow fairly similar to e.g. the OpenAI API. Not sure yet though.

joe, to random

This past month, I was talking about how I spent $528 to buy a machine with enough guts to run more demanding AI models in Ollama. That is good and all but if you are not on that machine (or at least on the same network), it has limited utility. So, how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices and it creates an encrypted VPN connection between them. Each device on your “tailnet” has 4 addresses you can use to reference it:

  • Machine name: my-machine
  • FQDN: my-machine.tailnet.ts.net
  • IPv4: 100.X.Y.Z
  • IPv6: fd7a:115c:a1e0::53

If you remember Hamachi from back in the day, Tailscale is kind of the spiritual successor to that.

https://i0.wp.com/jws.news/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-2.37.06%E2%80%AFPM.png?resize=1024%2C592&ssl=1

There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.
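As a rough sketch, once a client device is on the tailnet (and the Ollama service on the server is configured to listen on more than localhost, for example via the OLLAMA_HOST environment variable), a script can reach the API by the machine's tailnet name; the hostname and model below are placeholders:

// Node 18+ has fetch built in.
const response = await fetch("http://my-machine.tailnet.ts.net:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama2",
    prompt: "Hello from the tailnet!",
    stream: false,
  }),
});

console.log((await response.json()).response);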

https://jws.news/2024/joe-discovered-tailscale/

#Hamachi #Networking #Ollama #Tailscale

sesivany, (edited) to ai
@sesivany@floss.social avatar

#Ollama is the easiest way to run local #AI that I've tried so far. In five minutes, you can have a chatbot running on a local model, and there are dozens of models and UIs to choose from.
The only downside is the speed, but what can I expect from an Intel-only laptop?

bmp, to random
@bmp@mastodon.sdf.org

Completely forgot I had made this database a while ago when I was bored: https://codeberg.org/bmp/flock. It is written in Go and was generated with Ollama, if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.

davidbisset, to apple
@davidbisset@phpc.social

NotesOllama uses #Ollama to talk to local LLMs in #Apple Notes on #macos. Inspired by Obsidian Ollama. #AI

https://smallest.app/notesollama/

Not sure if I trust this if I don't see the code, though.

joe, to llm

Back in December, I paid $1,425 to replace my MacBook Pro to make my LLM research possible at all. That had an M1 Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare minimum spec to run a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact, and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.

I ended up finding a four-year-old HP Z4 G4 workstation with a Xeon W-2125 processor, 128 GB of DDR4-2666 RAM, a 512GB Samsung NVMe SSD, and an NVIDIA Quadro P4000 GPU with 8GB of GDDR5 memory. I bought it before Ollama released their Windows preview, so I planned to throw the latest Ubuntu LTS on it.

Going into this experiment, I was expecting that Ollama would thrash the GPU and the RAM but would use the CPU sparingly. I was not correct.

This is what the activity monitor looked like when I asked various models to tell me about themselves:

Mixtral

An ubuntu activity monitor while running mixtral

Llama2:70b

An ubuntu activity monitor while running Llama2:70b

Llama2:7b

An ubuntu activity monitor while running llama2:7b

Codellama

An ubuntu activity monitor while running codellama

The Xeon W-2125 has 8 threads and 4 cores, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would go into memory and then the GPU would do all of the processing. The CPU would only be needed to serve the results back to the user. It looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation. That is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.

A screenshot of the "additional drivers" screen in ubuntu

I am using a proprietary NVIDIA driver for the GPU, but maybe I’m missing something?

I was recently playing around with Stability AI’s Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.

Have any questions, comments, or concerns? Please feel free to drop a comment, below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

https://jws.news/2024/hp-z4-g4-workstation/

ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

Did someone say self-hosted LLMs? #selfhosted #ollama

joe, to ubuntu

I am unsure how to fix "Unable to change power state from D3hot to D0, device inaccessible". I would have expected that installing desktop #ubuntu would have gotten easier over the years.

joe,

Well, f***. I thought that running a 70B AI model on a machine with 128 gigabytes of RAM would tax the RAM, not the CPU. Apparently that Xeon processor is the bottleneck.

I should check to make sure that the GPU is in use.

#Ollama #AI

The resource monitor in Ubuntu 22 immediately after running code llama 70B through Ollama

FreakyFwoof, to random

A long, long time ago, @arfy made a Lua script for Dolphin screen readers that allowed you to type in a plus or minus number of days and get the date. I just asked Dolphin Mixtral to do the same as an AppleScript, using Ollama running locally, and it actually did it. It runs and works just as I wanted. Madness.

set numDays to text returned of (display dialog "Enter the number of days:" default answer "")
set targetDate to current date
set newDate to targetDate + numDays * days
display dialog "The future date will be: " & (newDate as string)

joe, to ai

Back in December, I started exploring how all of this AI stuff works. Last week’s post was about the basics of how to run your AI. This week, I wanted to cover some frequently asked questions.

What is a Rule-Based Inference Engine?

A rule-based inference engine is designed to apply predefined rules to a given set of facts or inputs to derive conclusions or make decisions. It operates by using logical rules, which are typically expressed in an “if-then” format. You can think of it as basically a very complex version of the spell check in your text editor.
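As a toy illustration (not how any real engine is implemented), an "if-then" rule engine can be sketched in a few lines of JavaScript:

// Predefined "if-then" rules are applied to a set of facts until no new
// conclusions can be drawn.
const rules = [
  { if: ["has_feathers"], then: "is_a_bird" },
  { if: ["is_a_bird", "can_swim"], then: "might_be_a_penguin" },
];

function infer(facts) {
  const known = new Set(facts);
  let changed = true;
  while (changed) {
    changed = false;
    for (const rule of rules) {
      if (rule.if.every((fact) => known.has(fact)) && !known.has(rule.then)) {
        known.add(rule.then);
        changed = true;
      }
    }
  }
  return [...known];
}

console.log(infer(["has_feathers", "can_swim"]));
// ["has_feathers", "can_swim", "is_a_bird", "might_be_a_penguin"]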

What is an AI Model?

AI models employ learning algorithms that draw conclusions or predictions from past data. An AI model’s data can come from various sources such as labeled data for supervised learning, unlabeled data for unsupervised learning, or data generated through interaction with an environment for reinforcement learning. The algorithm is the step-by-step procedure or set of rules that the model follows to analyze data and make predictions. Different algorithms have different strengths and weaknesses, and some are better suited for certain types of problems than others. A model has parameters that are the aspects of the model that are learned from the training data. A model’s complexity can be measured by the number of parameters contained in it but complexity can also depend on the architecture of the model (how the parameters interact with each other) and the types of parameters used.

What is an AI client?

An AI client is how the user interfaces with the rule-based inference engine. Since you can use the engine directly, the engine itself could also be the client. For the most part, you are going to want something web-based or a graphical desktop client, though. Good examples of graphical desktop clients would be MindMac or Ollamac. A good example of a web-based client would be Ollama Web UI. A good example of an application that is both a client and a rule-based inference engine is LM Studio. Most engines have APIs and language-specific libraries, so if you want to you can even write your own client.

What is the best client to use with a Rule-Based Inference Engine?

I like MindMac. I would recommend either that or Ollama Web UI. You can even host both Ollama and Ollama Web UI together using Docker Compose.

What is the best Rule-Based Inference Engine?

I have tried Ollama, Llama.cpp, and LM Studio. If you are using Windows, I would recommend LM Studio. If you are using Linux or a Mac, I would recommend Ollama.

How much RAM does your computer need to run a Rule-Based Inference Engine?

The RAM requirement depends on what model you are using. If you browse the Ollama library, Hugging Face, or LM Studio's listing of models, most listings will give a RAM requirement (example) based on the number of parameters in the model. Most 7B models can run on a minimum of 8GB of RAM, while most 70B models will require 64GB of RAM. My MacBook Pro has 32GB of unified memory and struggles to run Wizard-Vicuna-Uncensored 30B. My new AI lab currently has 128GB of DDR4 RAM, and I hope that it can run 70B models reliably.

Does your computer need a dedicated GPU to run a Rule-Based Inference Engine?

No, you don’t need one. You can use just the CPU, but if you have an NVIDIA GPU, it helps a lot.

I use Digital Ocean or Linode for hosting my website. Can I host my AI there, also?

Yeah, you can. The RAM requirement would make it a bit expensive, though. A virtual machine with 8GB of RAM is almost $50/mo.

Why wouldn’t you use ChatGPT, Copilot, or Bard?

When you use any of them, your interactions are used to reinforce the training of the model. That is an issue for anything more than the most basic prompts. On top of that, they cost up to $30/month/user.

Why should you use an open-source LLM?

What opinion does your employer have of this research project?

You would need to direct that question to them. All of these posts should be considered personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Why are you interested in this technology?

It is a new technology that I didn’t consider wasteful bullshit in the first hour of researching it.

Are you afraid that AI will take your job?

No.

What about image generation?

I used (and liked) Noiselith until it shut down. DiffusionBee works but I think that Diffusers might be the better solution. Diffusers lets you use multiple models and it is easier to use than Stable Diffusion Web UI.

You advocate for not using ChatGPT. Do you use it?

I do. ChatGPT 4 is a 1.74T-parameter model. It can do cool things. I have an API key, and I use it via MindMac. Using it that way means that I pay based on how much I use it instead of paying for a Pro account.

Are you going to only write about AI on here, now?

Nope. I still have other interests. Expect more Vue.js posts and likely something to do with Unity or Unreal at some point.

Is this going to be the last AI FAQ post?

Nope. I still haven’t covered training or fine-tuning.

https://jws.news/2024/ai-frequently-asked-questions/

#AI #ChatGPT #Docker #LLM #LMStudio #MindMac #Ollama #Ollamac
