orhun, to rust
@orhun@fosstodon.org avatar

Here is how you can use ChatGPT in your terminal - with an interface! 🔥

🦾 tenere: TUI for LLMs written in Rust.

🚀 Supports ChatGPT, llama.cpp & ollama

🦀 Built with @ratatui_rs

⭐ GitHub: https://github.com/pythops/tenere


Mrw, to homelab
@Mrw@hachyderm.io avatar

Saturday.
The goal today was to get Ollama deployed on the cluster. That’s a fun way to run your own models on whatever accelerators you have handy. It’ll run on your CPU, sure, but man is it slow.

Nvidia now ships a GPU operator, which handles annotating nodes and managing the resource type. “All you need to do” — the most dangerous phrase in computers — is smuggle the GPUs through whatever virtualization you’re doing, and expose them to containerd properly.

But I got there! Yay.

greg, to homeassistant
@greg@clar.ke avatar

W̶a̶k̶e̶ ̶o̶n̶ ̶L̶A̶N̶ Wake-on-Zigbee

Maintaining Wake-on-LAN on a dual-boot Windows 10 / Ubuntu 22.04 LTS system is a hassle, so I went with a simple Fingerbot solution. Now I have Wake-on-Zigbee!

By default the system boots into Ubuntu which hosts an Ollama server and does some video compression jobs (I wanted to be able to start those remotely). I only use Windows for VR gaming when I'm physically in the room and therefore can select the correct partition at boot.

Using Zigbee Fingerbot to turn on PC

obrhoff, to llm
@obrhoff@chaos.social avatar

The amazing thing about LLMs is how much knowledge they possess at such a small size. The llama3-8b model, for instance, weighs only 4.7 GB yet can still answer questions about almost anything (despite some hallucinations).

FreakyFwoof, to random

A long long time ago, @arfy made a Lua script for Dolphin screen readers that allowed you to type in a plus or minus number of days and get the date. I just asked Dolphin Mixtral to do the same as an AppleScript, using #Ollama running locally, and it actually did it. It runs and works just as I wanted. Madness.

-- Prompt for a number of days (entered as text), then add that many days to today's date
set numDays to text returned of (display dialog "Enter the number of days:" default answer "")
set targetDate to current date
set newDate to targetDate + (numDays as integer) * days
display dialog "The future date will be: " & (newDate as string)

joe, to llm

Back in December, I paid $1,425 to replace my MacBook Pro so that my LLM research would be possible at all. That machine had an M1 Pro CPU and 32 GB of RAM, which (as I said previously) is about the bare minimum spec for running a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact, and a 70B model needs 64 GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.

I ended up finding a four-year-old HP Z4 G4 workstation with a Xeon W-2125 processor, 128 GB of DDR4-2666 RAM, a 512 GB Samsung NVMe SSD, and an NVIDIA Quadro P4000 GPU with 8 GB of GDDR5 memory. I bought it before Ollama released their Windows preview, so I planned to throw the latest Ubuntu LTS on it.

Going into this experiment, I was expecting that Ollama would thrash the GPU and the RAM but would use the CPU sparingly. I was not correct.

This is what the activity monitor looked like when I asked various models to tell me about themselves:

Mixtral

An Ubuntu activity monitor while running mixtral

Llama2:70b

An Ubuntu activity monitor while running Llama2:70b

Llama2:7b

An Ubuntu activity monitor while running llama2:7b

Codellama

An Ubuntu activity monitor while running codellama

The Xeon W-2125 has 8 threads and 4 cores, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would go into memory and then the GPU would do all of the processing. The CPU would only be needed to serve the results back to the user. It looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation. That is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.

A screenshot of the "Additional Drivers" screen in Ubuntu

I am using the proprietary NVIDIA driver for the GPU, but maybe I’m missing something?
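One way to tell whether Ollama is actually touching the GPU is to fire a prompt at its API and watch nvidia-smi while the request runs. Here is a rough sketch of that check, assuming the Ollama server is listening on localhost:11434, the model has already been pulled, and nvidia-smi is on the PATH:

# Rough sketch: send a prompt to the local Ollama API in a background thread
# and sample GPU utilization while it runs. If utilization and memory stay
# near zero, Ollama is not using the GPU for that request.
import json
import subprocess
import threading
import time
import urllib.request

def ask(model, prompt):
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"][:200])

worker = threading.Thread(target=ask, args=("llama2:7b", "Tell me about yourself."))
worker.start()
while worker.is_alive():
    gpu = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(gpu.stdout.strip())
    time.sleep(2)
worker.join()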

I was recently playing around with Stability AI’s Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.

Have any questions, comments, or concerns? Please feel free to drop a comment, below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

https://jws.news/2024/hp-z4-g4-workstation/

bmp, to random
@bmp@mastodon.sdf.org avatar

Completely forgot I had made this database a while ago when I was bored: https://codeberg.org/bmp/flock, it is written in Go, and was generated with if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.

kevinctofel, to til
@kevinctofel@hachyderm.io avatar

🆕 blog post: May 10, 2024 - #TIL

Personalizing my local #AI, replacing paper towels, and the world's longest (unofficial) ski jump of 291 meters.
#Obsidian #Ollama #LLM

https://myconscious.stream/blog/May-10-2024-TIL

DavidMarzalC, to Symfony Spanish
@DavidMarzalC@mastodon.escepticos.es avatar

We’re closing out March with another new episode of "Accesibilidad con Tecnologías libres" ("Accessibility with Free Technologies"), friends.

At https://accesibilidadtl.gitlab.io/05 you will find the show notes with the topics covered and the links.

If you follow the alias @6706483, you will automatically get the podcast’s posts in the Fediverse.

And in case you don’t have it yet, the RSS feed for your apps is:
https://accesibilidadtl.gitlab.io/feed

Episode 05 features, among others:

We hope you find it interesting.

ramikrispin, to datascience
@ramikrispin@mstdn.social avatar

New release of Ollama 🎉

A major release of Ollama, version 0.1.32, is out. The new version includes:
✅ Improved GPU utilization and memory management for better performance and a lower error rate
✅ Better performance on Mac by scheduling large models across the GPU and CPU
✅ Native AI support in Supabase edge functions

More details in the release notes 👇🏼
https://github.com/ollama/ollama/releases

Image credit: release notes

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

In case you are wondering, the new Microsoft mini LLM, phi3, can handle code generation; in this case, SQL.

I compared the runtime (locally, on CPU) against codellama:7B using Ollama, and surprisingly the phi3 runtime was significantly slower.
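For anyone who wants to reproduce that kind of comparison, here is a minimal sketch using the Ollama Python library. It assumes both models have already been pulled and simply times a single generation per model:

# Minimal sketch of a local CPU runtime comparison between phi3 and codellama:7b.
# Assumes `pip install ollama` and that both models are already pulled.
import time
import ollama

prompt = "Write a SQL query that returns the top 5 customers by total order value."

for model in ("phi3", "codellama:7b"):
    start = time.time()
    result = ollama.generate(model=model, prompt=prompt)
    print(f"{model}: {time.time() - start:.1f} seconds")
    print(result["response"][:200])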

davep, to python
@davep@fosstodon.org avatar

Going on stream to tinker some more with an Ollama client I’m building for myself: https://www.youtube.com/watch?v=LzHUdfR4PRg

pjk, to python
@pjk@www.peterkrupa.lol avatar

One thing you notice right away about LLMs is they bear a striking resemblance to that ubiquitous internet character, the reply-guy: they always have an answer, they are brimming with confidence, and they are often completely wrong.

It seemed only natural, then, to turn one into a full-time online version of itself, RGaaS (Reply-Guy as a Service), always on, always ready to reply to anything when summoned, with an “Actually,” followed by his (it’s always a “he”) opinion on what you should be thinking.

And so I built the Actually Bot.

https://www.peterkrupa.lol/wp-content/uploads/2024/05/actually_bot1.png

Basically, the way it works is that any user can reply to any Mastodon post (or I guess any post on the ActivityPub protocol) and tag the Actually Bot (@actuallybot) and it will reply with its most mediocre take on why everyone else is wrong, or at least not quite right.

The reply-guys can all move on to something else now, I have automated them out of a job.

This was a really fun project and has a lot of separate parts. First, I had to modify an LLM to give it the right “personality” for this job. To do this, I used Ollama, a tool for running open source LLMs locally, on your own machine. I used the “customized prompt feature”: You basically write a little file that Ollama uses to tweak the LLM of your choosing to give it certain characteristics. I went with the new Llama3:8b model from Meta as my base model, then put the following instructions in my modelfile:

FROM llama3
PARAMETER temperature 3
SYSTEM """You are receiving social media posts as prompts, and your job is to reply to these prompts. Please start all your replies with the word "actually". Reply as if you know more than everyone. You must recontextualize every prompt and correct or amend it, informing the prompter that they are wrong, even if ever so slightly. You write in short sentences. Your replies must be short. Keep them to about three sentences. This is essential: keep your replies short. They must be under 500 characters."""

Then I ran the following command in the console:

ollama create actually_llama -f ./actually_llama

… and my model was ready to roll. Next, I needed a program to connect to the Ollama API to send the LLM prompts and get responses. Python was great for that, as both Ollama and Mastodon have solid Python libraries. Probably the slowest part was picking through Mastodon.py to figure out how the methods work and what exactly they return. It’s a very robust library with a million options, and fortunately it’s also extremely well documented, so while it was slow going, I was able to whack it together without too much trouble.

I’m not going to get into all the code here, but basically, I wrote a simple method that checks mentions, grabs the text of a post and the post it is replying to, and returns them for feeding into the LLM as the prompt.

Despite my very careful, detailed, and repetitive instructions to be sure replies are no more than 500 characters, LLMs can’t count, and they are very verbose, so I had to add a cleanup method that cuts the reply down to under 500 characters. Then I wrote another method for sending that cleaned-up prompt to Ollama and returning the response.

The main body starts off by getting input for the username and password for login, then it launches a while True loop that calls my two functions, checking every 60 seconds to see if there are any mentions and replying to them if there are.
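The full source isn’t included in the post, but a minimal sketch of that shape might look like the following. This is not the author’s actual code: it authenticates with an access token rather than the username/password login described above, and the instance URL is a placeholder.

# Sketch of the Actually Bot loop: poll Mastodon for mentions, build a prompt
# from the mention and its parent post, generate a reply with the custom model,
# truncate it, and post it back. Not the author's actual code.
import time
import ollama
from mastodon import Mastodon

masto = Mastodon(access_token="YOUR_TOKEN", api_base_url="https://your.instance")

def build_prompt(notification):
    status = notification["status"]
    parent = ""
    if status["in_reply_to_id"]:
        parent = masto.status(status["in_reply_to_id"])["content"]
    return f"{parent}\n{status['content']}"

def actually(prompt):
    reply = ollama.generate(model="actually_llama", prompt=prompt)["response"]
    return reply[:495]  # the model can't count characters, so truncate for Mastodon

last_seen = None
while True:
    for note in masto.notifications(since_id=last_seen):
        if note["type"] != "mention":
            continue
        last_seen = note["id"] if last_seen is None else max(last_seen, note["id"])
        masto.status_post(actually(build_prompt(note)),
                          in_reply_to_id=note["status"]["id"])
    time.sleep(60)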

OK it works! Now came the hard part, which was figuring out how to get to 100% uptime. If I want the Actually Bot to reply every time someone mentions it, I need it to be on a machine that is always on, and I was not going to leave my PC on for this (nor did I want it clobbering my GPU when I was in the middle of a game).

So my solution was this little guy:

https://www.peterkrupa.lol/wp-content/uploads/2024/05/lenovo.jpg

… a Lenovo ThinkPad with a 3.3 GHz quad-core i7 and 8 GB of RAM. We got this refurbished machine when the pandemic was just getting going and it was my son’s constant companion for 18 months. It’s nice to be able to put it to work again. I put Ubuntu Linux on it and connected it to the home LAN.

I actually wasn’t even sure it would be able to run Llama3:8b. My workstation has an Nvidia GPU with 12 GB of VRAM and it works fine for running modest LLMs locally, but this little laptop is older and not built for gaming, and I wasn’t sure how it would handle such a heavy workload.

Fortunately, it worked with no problems. For running a chatbot, waiting 2 minutes for a reply is unacceptable, but for a bot that posts to social media, it’s well within range of what I was shooting for, and it didn’t seem to have any performance issues as far as the quality of the responses either.

The last thing I had to figure out was how to actually run everything from the Lenovo. I suppose I could have copied the Python files and tried to recreate the virtual environment locally, but I hate messing with virtual environments and dependencies, so I turned to the thing everyone says you should use in this situation: Docker.

This was actually great because I’d been wanting to learn how to use Docker for a while but never had the need. I’d installed it earlier and used it to run the WebUI front end for Ollama, so I had a little bit of an idea of how it worked, but the Actually Bot really made me get into its working parts.

So, I wrote a Dockerfile for my Python app, grabbed all the dependencies and plopped them into a requirements.txt file, and built the Docker image. Then I scp’d the image over to the Lenovo, spun up the container, and boom! The Actually Bot was running!

Well, OK, it wasn’t that simple. I basically had to learn all this stuff from scratch, including the console commands. And once I had the Docker container running, my app couldn’t connect to Ollama because, since Ollama runs as a server, I had to launch the container with a flag telling it to share the host’s network settings (--network host).

Then once I had the Actually Bot running, it kept crashing when people tagged it in a post that wasn’t a reply to another post. So, went back to the code, squashed bug, redeploy container, bug still there because I didn’t redeploy the container correctly. There was some rm, some prune, some struggling with the difference between “import” and “load” and eventually I got everything working.

Currently, the Actually Bot is sitting on two days of uninterrupted uptime with ~70 successful “Actually,” replies, and its little laptop home isn’t even on fire or anything!

Moving forward, I’m going to tweak a few things so I can get better logging and stats on what it’s actually doing so I don’t have to check its posting history on Mastodon. I just realized you can get all the output that a Python script running in a Docker container prints with the command docker logs [CONTAINER], so that’s cool.

The other thing I’d like to do is build more bots. I’m thinking about spinning up my own Mastodon instance on a cheap hosting space and loading it with all kinds of bots talking to each other. See what transpires. If Dead Internet Theory is real, we might as well have fun with it!

https://www.peterkrupa.lol/2024/05/01/actually-building-a-bot-is-fun/


davep, to python
@davep@fosstodon.org avatar

Doing some pre-dinner #Python coding, working some more on my toy #Textual #Ollama client for the #terminal.

https://www.youtube.com/watch?v=7dTwJQn_Ggw

elevenhsoft, to random Polish
@elevenhsoft@mastodon.social avatar

we can now attach images to #COSMIC applet for #ollama :)

#llava doing it just fine ^^

#popos #cosmicdesktop

elevenhsoft, to System76 Polish
@elevenhsoft@mastodon.social avatar

Hello friends! New #COSMICdesktop applet is coming soon....

This time I'm working on Ollama applet for our lovely #COSMIC :)

#popos #system76 #linux #ollama

joe, to random

This past month, I was talking about how I spent $528 to buy a machine with enough guts to run more demanding AI models in Ollama. That is good and all but if you are not on that machine (or at least on the same network), it has limited utility. So, how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices and it creates an encrypted VPN connection between them. Each device on your “tailnet” has 4 addresses you can use to reference it:

  • Machine name: my-machine
  • FQDN: my-machine.tailnet.ts.net
  • IPv4: 100.X.Y.Z
  • IPv6: fd7a:115c:a1e0::53

If you remember Hamachi from back in the day, it is kind of the spiritual successor to that.

https://i0.wp.com/jws.news/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-2.37.06%E2%80%AFPM.png?resize=1024%2C592&ssl=1

There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.
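Once the tailnet is up, pointing the Ollama Python client at the workstation is just a matter of using one of those addresses. A quick sketch, using the placeholder machine name from the list above (note that the Ollama server may need OLLAMA_HOST=0.0.0.0 so it listens on more than localhost):

# Sketch: talk to a remote Ollama server over Tailscale from any device on the
# tailnet. "my-machine.tailnet.ts.net" is the placeholder name from the list above.
from ollama import Client

client = Client(host="http://my-machine.tailnet.ts.net:11434")
response = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Are you reachable over the tailnet?"}],
)
print(response["message"]["content"])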

https://jws.news/2024/joe-discovered-tailscale/

#Hamachi #Networking #Ollama #Tailscale

kevinctofel, to ai
@kevinctofel@hachyderm.io avatar

I had been meaning to look at using #Ollama for local #AI & today I caught this excellent video how-to. Still not sold on the current hype of AI but it would be foolish to overlook its place in our future.

Honestly, this little project was the most fun I’ve had in a while: installing/using different AI models, using AI in the #terminal, then in a #browser, even creating a persona for the prompts. And it’s all running locally (no internet needed!), so it’s #private. 🤓

https://youtube.com/watch?v=Wjrdr0NU4Sk&si=x5I0kG2iwZGefc7d

joe, to ai

A few weeks back, I thought about getting an AI model to return the “Flavor of the Day” for a Culver’s location. If you ask Llama 3:70b “The website https://www.culvers.com/restaurants/glendale-wi-bayside-dr lists “today’s flavor of the day”. What is today’s flavor of the day?”, it doesn’t give a helpful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.29.28%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you ask ChatGPT 4 the same question, it gives an even less useful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.33.42%E2%80%AFPM.png?resize=1024%2C782&ssl=1

If you check the website, today’s flavor of the day is Chocolate Caramel Twist.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.41.21%E2%80%AFPM.png?resize=1024%2C657&ssl=1

So, how can we get a proper answer? Ten years ago, when I wrote “The Milwaukee Soup App”, I used Kimono (which is long dead) to scrape the soup of the day. You could also write a fiddly script to scrape the value manually. It turns out that there is another option, though: Scrapegraph-ai. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites, documents, and XML files. You just say which information you want to extract and the library will do it for you.

Let’s take a look at an example. The project has an official demo where you need to provide an OpenAI API key, select a model, provide a link to scrape, and write a prompt.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.35.29%E2%80%AFPM.png?resize=1024%2C660&ssl=1

As you can see, it reliably gives you the flavor of the day (in a nice JSON object). It can go even further, though: if you point it at the monthly calendar, you can ask for the flavor of the day and the soup of the day for the remainder of the month, and it can do that as well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-1.14.43%E2%80%AFPM.png?resize=1024%2C851&ssl=1

Running it locally with Llama 3 and Nomic

I am running Python 3.12 on my Mac but when you run pip install scrapegraphai to install the dependencies, it throws an error. The project lists the prerequisite of Python 3.8+, so I downloaded 3.9 and installed the library into a new virtual environment.

Let’s see what the code looks like.

You will notice that just like in yesterday’s How to build a RAG system post, we are using both a main model and an embedding model.
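The post’s own code block isn’t reproduced here, so the following is only a sketch of what a local ScrapeGraphAI run against Ollama typically looks like, with llama3 as the main model and nomic-embed-text for embeddings. The exact config keys can vary between scrapegraphai versions:

# Sketch (not the post's exact code) of ScrapeGraphAI pointed at a local Ollama
# server, using llama3 for generation and nomic-embed-text for embeddings.
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
}

scraper = SmartScraperGraph(
    prompt="What is today's flavor of the day?",
    source="https://www.culvers.com/restaurants/glendale-wi-bayside-dr",
    config=graph_config,
)
print(scraper.run())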

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-2.28.10%E2%80%AFPM.png?resize=1024%2C800&ssl=1

At this point, if you want to harvest flavors of the day for each location, you can do so pretty simply. You just need to loop through each of Culver’s location websites.

Have a question, comment, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-use-ai-to-make-web-scraping-easier/

codewiz, to rust
@codewiz@mstdn.io avatar

Got DeepSeek Coder 33B running on my desktop's graphics card with Ollama.

First off, I tested its ability to generate and understand code. Unfortunately, it falls into the same confusion as the smaller 6.7B model.

https://gist.github.com/codewiz/c6bd627ec38c9bc0f615f4a32da0490e

codewiz,
@codewiz@mstdn.io avatar

Today I tried running Codestral, a 22B parameter LLM tuned for coding by Mistral AI.

With my Rust mock interview questions, it performed better than all other offline models I tried so far.

https://paste.benpro.fr/?4eb8f2e15841672d#DGnLh3dCp7UdzvWoJgev58EPmre19ij31KSbbq8c85Gm

joe, to ai

LLaVA (Large Language-and-Vision Assistant) was updated to version 1.6 in February. I figured it was time to look at how to use it to describe an image in Node.js. LLaVA 1.6 is an advanced vision-language model created for multi-modal tasks, seamlessly integrating visual and textual data. Last month, we looked at how to use the official Ollama JavaScript Library. We are going to use the same library, today.

Basic CLI Example

Let’s start with a CLI app. For this example, I am using my remote Ollama server but if you don’t have one of those, you will want to install Ollama locally and replace const ollama = new Ollama({ host: 'http://100.74.30.25:11434' }); with const ollama = new Ollama({ host: 'http://localhost:11434' });.

To run it, first run npm i ollama and make sure that you have "type": "module" in your package.json. You can run it from the terminal by running node app.js <image filename>. Let’s take a look at the result.

Its ability to describe an image is pretty awesome.

Basic Web Service

So, what if we wanted to run it as a web service? Running Ollama locally is cool and all, but it’s cooler if we can integrate it into an app. If you run npm install express to install Express, you can run this as a web service.

The web service takes posts to http://localhost:4040/describe-image with a binary body that contains the image that you are trying to get a description of. It then returns a JSON object containing the description.
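As a quick test, you could exercise a service like that from Python. This sketch assumes the requests library and the endpoint described above:

# Sketch: POST an image as the raw request body to the describe-image endpoint
# described above and print the JSON description that comes back.
import sys
import requests

with open(sys.argv[1], "rb") as f:
    resp = requests.post(
        "http://localhost:4040/describe-image",
        data=f.read(),
        headers={"Content-Type": "application/octet-stream"},
    )
print(resp.json())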

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-18-at-1.41.20%E2%80%AFPM.png?resize=1024%2C729&ssl=1

Have any questions, comments, etc? Feel free to drop a comment, below.

https://jws.news/2024/how-can-you-use-llava-and-node-js-to-describe-an-image/

#AI #JavaScript #LLaVA #LLM #NodeJs #Ollama

kevinctofel, to ai
@kevinctofel@hachyderm.io avatar

Interesting local / in-progress project worth watching: Perplexica. It aims to be similar to Perplexity but has a ways to go yet. It works with Ollama, which is what I’m using to test local AI.

https://youtu.be/TkxmOC4HBSg?si=L9uCF9ePlT7Ccs6t

elevenhsoft, to System76 Polish
@elevenhsoft@mastodon.social avatar

Some progress on the Ollama applet for COSMIC

Now we can save and load full conversations. Also, from now on you can keep your message context.

elevenhsoft, to System76 Polish
@elevenhsoft@mastodon.social avatar

Some updates on the Ollama applet for COSMIC :)

improved the layouts/settings page, so now I think nothing looks out of place haha

added load, save, and remove for conversations in history

also, it's now possible to pull and remove models locally.

ahh, and a beautiful button to stop the bot while it's typing, so if we don't like what it's saying, we can stop it mid-reply :)


joe, to ai

Previously, we looked at how to build a retrieval-augmented generation system using LangChain. As of last month, you can do the same thing with just the Ollama Python Library that we used in last month’s How to Write a Python App that uses Ollama. In today’s post, I want to use the Ollama Python Library, Chroma DB, and the JSON API for Kopp’s Frozen Custard to embed the flavor of the day for today and tomorrow. Let’s start with a very basic embedding example.

In the above example, we start by building an array of things that we want to embed, embed them using nomic-embed-text and Chroma DB, and then use llama3:8b for the main model.
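The post’s example code isn’t reproduced in this excerpt, but a sketch of the flow described above might look like this (the flavor strings are hypothetical; the ollama and chromadb calls are the real libraries):

# Sketch of the basic RAG flow: embed a few strings with nomic-embed-text into
# Chroma DB, retrieve the closest match for a question, and pass it to llama3:8b.
# The document strings below are hypothetical examples.
import chromadb
import ollama

documents = [
    "On May 30 the flavor of the day is Caramel Cashew.",
    "On May 31 the flavor of the day is Mint Explosion.",
]

client = chromadb.Client()
collection = client.create_collection(name="flavors")
for i, doc in enumerate(documents):
    emb = ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
    collection.add(ids=[str(i)], embeddings=[emb], documents=[doc])

question = "What is the flavor of the day on May 31?"
q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
context = collection.query(query_embeddings=[q_emb], n_results=1)["documents"][0][0]

answer = ollama.generate(
    model="llama3:8b",
    prompt=f"Using this data: {context}. Respond to this prompt: {question}",
)
print(answer["response"])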

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-30-at-10.32.52%E2%80%AFPM.png?resize=1024%2C800&ssl=1

So, how do you get the live data for the flavors of the day? The API, of course!

This simple script gets the flavor of the day from a JSON API, builds an array of embeddable strings, and prints the result.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-30-at-10.44.23%E2%80%AFPM.png?resize=1024%2C800&ssl=1

The next step is to combine the two scripts.

Two big differences you will notice between the other two examples and this one are that the date no longer contains the year and that I added a statement of what today’s date is, so that you can ask for “Today’s flavors”.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-30-at-10.56.59%E2%80%AFPM.png?resize=1024%2C800&ssl=1

If you have any questions on how this works, later on today I am hosting a live webinar on Crafting Intelligent Python Apps with Retrieval-Augmented Generation. Feel free to stop by and see how to build a RAG system.

https://jws.news/2024/how-to-get-ai-to-tell-you-the-flavor-of-the-day-at-kopps/
