#ollama - kbin.social

kevinctofel, 6 hours ago to til

🆕 blog post: May 10, 2024 - #TIL

Personalizing my local #AI, replacing paper towels, and the world's longest (unofficial) ski jump of 291 meters.
#Obsidian #Ollama #LLM

https://myconscious.stream/blog/May-10-2024-TIL

joe, 16 hours ago to ai

A few weeks back, I thought about getting an AI model to return the “Flavor of the Day” for a Culver’s location. If you ask Llama 3:70b “The website https://www.culvers.com/restaurants/glendale-wi-bayside-dr lists “today’s flavor of the day”. What is today’s flavor of the day?”, it doesn’t give a helpful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.29.28%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you ask ChatGPT 4 the same question, it gives an even less useful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.33.42%E2%80%AFPM.png?resize=1024%2C782&ssl=1

If you check the website, today’s flavor of the day is Chocolate Caramel Twist.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.41.21%E2%80%AFPM.png?resize=1024%2C657&ssl=1

So, how can we get a proper answer? Ten years ago, when I wrote “The Milwaukee Soup App”, I used Kimono (which is long dead) to scrape the soup of the day. You could also write a fiddly script to scrape the value manually. It turns out that there is another option, though: Scrapegraph-ai. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites, documents, and XML files. Just say which information you want to extract, and the library will do it for you.

Let’s take a look at an example. The project has an official demo where you need to provide an OpenAI API key, select a model, provide a link to scrape, and write a prompt.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.35.29%E2%80%AFPM.png?resize=1024%2C660&ssl=1

As you can see, it reliably gives you the flavor of the day (in a nice JSON object). It goes even further, though: if you point it at the monthly calendar, you can ask it for the flavor of the day and soup of the day for the remainder of the month, and it can do that as well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-1.14.43%E2%80%AFPM.png?resize=1024%2C851&ssl=1

Running it locally with Llama 3 and Nomic

I am running Python 3.12 on my Mac, but running pip install scrapegraphai under it to install the dependencies throws an error. The project lists Python 3.8+ as a prerequisite, so I downloaded 3.9 and installed the library into a new virtual environment.

Let’s see what the code looks like.

You will notice that just like in yesterday’s How to build a RAG system post, we are using both a main model and an embedding model.
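The configuration is where the main model and the embedding model get paired up. A minimal sketch of how that could look, assuming ScrapeGraphAI's SmartScraperGraph class and the Ollama config keys shown in the project's README around May 2024 (the library has changed since, so check its current docs). The function names are hypothetical; the models are the Llama 3 and Nomic pair from the heading.

```python
# Sketch: point ScrapeGraphAI at a local Ollama server instead of OpenAI.
# Assumes the SmartScraperGraph interface and config keys from the project's
# README (circa May 2024); later releases may differ.

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def make_graph_config(llm="llama3", embedder="nomic-embed-text"):
    """Build a config that uses a main model plus an embedding model."""
    return {
        "llm": {
            "model": f"ollama/{llm}",
            "temperature": 0,
            "format": "json",  # ask Ollama for JSON output
            "base_url": OLLAMA_URL,
        },
        "embeddings": {
            "model": f"ollama/{embedder}",
            "base_url": OLLAMA_URL,
        },
    }


def flavor_of_the_day(url):
    """Scrape one location page (needs scrapegraphai and a running Ollama)."""
    # Imported lazily so the config helper works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(
        prompt="What is today's flavor of the day?",
        source=url,
        config=make_graph_config(),
    )
    return graph.run()
```

Calling flavor_of_the_day("https://www.culvers.com/restaurants/glendale-wi-bayside-dr") requires scrapegraphai in the Python 3.9 environment and both models pulled into Ollama; it returns whatever dictionary the graph produces.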

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-2.28.10%E2%80%AFPM.png?resize=1024%2C800&ssl=1

At this point, if you want to harvest flavors of the day for each location, you can do so pretty simply. You just need to loop through each of Culver’s location websites.
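That loop can be sketched as a function that takes the scraping step as a parameter (for example, a wrapper around the ScrapeGraphAI graph) and records per-page failures instead of stopping the whole run. harvest_flavors is a hypothetical name, and the list of location URLs is left to you.

```python
from typing import Callable, Dict, Iterable


def harvest_flavors(urls: Iterable[str],
                    scrape: Callable[[str], object]) -> Dict[str, object]:
    """Call `scrape` on each location URL and collect the results.

    A slow or broken page is recorded as an error entry rather than
    aborting the loop, so one bad location doesn't lose all the others.
    """
    results: Dict[str, object] = {}
    for url in urls:
        try:
            results[url] = scrape(url)
        except Exception as exc:  # keep going on per-page errors
            results[url] = {"error": str(exc)}
    return results
```

Since each call runs a local LLM, this is slow per page; running it sequentially like this also keeps the load on the Ollama server predictable.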

Have a question, comment, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-use-ai-to-make-web-scraping-easier/

#AI #ChatGPT #llama3 #LLM #Ollama #Python #ScrapegraphAi #WebScraping

joe, 1 day ago (edited 1 day ago) to ai

Back in January, we started looking at AI and how to run a large language model (LLM) locally (instead of just using something like ChatGPT or Gemini). A tool like Ollama is great for building a system that uses AI without dependence on OpenAI. Today, we will look at creating a Retrieval-augmented generation (RAG) application, using Python, LangChain, Chroma DB, and Ollama. Retrieval-augmented generation is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. If you have a source of truth that isn’t in the training data, it is a good way to get the model to know about it. Let’s get started!

Your RAG will need a model (like llama3 or mistral), an embedding model (like mxbai-embed-large), and a vector database. The vector database contains relevant documentation to help the model answer specific questions better. For this demo, our vector database is going to be Chroma DB. You will need to “chunk” the text you are feeding into the database. Let’s start there.

Chunking

There are many ways of choosing the right chunk size and overlap, but for this demo, I am just going to use a chunk size of 7500 characters and an overlap of 100 characters. I am also going to use LangChain‘s CharacterTextSplitter to do the chunking. The overlap means that the last 100 characters of each chunk are duplicated at the start of the next database record.
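To make the size/overlap relationship concrete, here is a simplified fixed-window splitter. It is not LangChain's algorithm (CharacterTextSplitter first splits on a separator, by default a blank line, and then merges the pieces up to the chunk size); it only illustrates what a 7500/100 setting means.

```python
def split_with_overlap(text: str, chunk_size: int = 7500,
                       overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Each chunk starts (chunk_size - overlap) characters after the previous
    one, so the last `overlap` characters of a chunk reappear at the start
    of the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the demo's settings, a 15,000-character page becomes three chunks starting at offsets 0, 7400, and 14800.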

The Vector Database

A vector database is a type of database designed to store, manage, and manipulate vector embeddings. Vector embeddings are representations of data (such as text, images, or sounds) in a high-dimensional space, where each data item is represented as a dense vector of real numbers. When you query a vector database, your query is transformed into a vector of real numbers. The database then uses this vector to perform similarity searches.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-08-at-2.36.49%E2%80%AFPM.png?resize=665%2C560&ssl=1

You can think of it as being like a two-dimensional chart with points on it. One of those points is your query. The rest are your database records. What are the points that are closest to the query point?
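The "closest points" idea can be sketched with brute-force cosine similarity over a few toy 2-D vectors. Real embeddings have hundreds of dimensions, and Chroma uses an index rather than comparing the query against every record (its default distance metric may also differ), so treat this as an illustration of the concept only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: list[float], records: dict[str, list[float]],
            k: int = 2) -> list[str]:
    """Return the ids of the k records pointing most nearly the query's way."""
    ranked = sorted(records,
                    key=lambda rid: cosine_similarity(query, records[rid]),
                    reverse=True)
    return ranked[:k]


records = {"apple": [1.0, 0.1], "banana": [0.9, 0.3], "car": [-0.2, 1.0]}
print(nearest([1.0, 0.0], records))
```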

Embedding Model

To do this, you can’t just use an Ollama model; you also need an embedding model. As of this writing, there are three available to pull from the Ollama library. For this demo, we are going to be using nomic-embed-text.
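Under the hood, turning text into a vector is a single HTTP call to the Ollama server. A minimal standard-library sketch, assuming the /api/embeddings endpoint as documented in 2024 (it takes model and prompt and returns an embedding field; newer Ollama releases also offer /api/embed). In the RAG app itself, LangChain's Ollama integration handles this step for you; the function names here are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def embedding_request(text: str, model: str = "nomic-embed-text",
                      base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request to a running Ollama server and return the vector."""
    with urllib.request.urlopen(embedding_request(text, model)) as resp:
        return json.load(resp)["embedding"]
```

embed() only works after ollama pull nomic-embed-text with the server running; the same model must embed both the stored chunks and the query, or the similarity scores are meaningless.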

Main Model

Our main model for this demo is going to be phi3. It is a 3.8B-parameter model trained by Microsoft.

LangChain

You will notice that today’s demo is heavily using LangChain. LangChain is an open-source framework designed for developing applications that use LLMs. It provides tools and structures that enhance the customization, accuracy, and relevance of the outputs produced by these models. Developers can leverage LangChain to create new prompt chains or modify existing ones. LangChain pretty much has APIs for everything that we need to do in this app.

The Actual App

Before we start, you are going to want to pip install tiktoken langchain langchain-community langchain-core. You are also going to want to ollama pull phi3 and ollama pull nomic-embed-text. This is going to be a CLI app. You can run it from the terminal like python3 app.py "<Question Here>".

You also need a sources.txt file containing the URLs of things that you want to have in your vector database.

So, what is happening here? Our app.py file is reading sources.txt to get a list of URLs for news stories from Tuesday’s Apple event. It then uses WebBaseLoader to download the pages behind those URLs, uses CharacterTextSplitter to chunk the data, and creates the vectorstore using Chroma. It then creates and invokes rag_chain.
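Stripped of the LangChain plumbing, the app does two small things around the vector store: it reads URLs from sources.txt, and after retrieval it places the matching chunks into the prompt ahead of the question. A minimal sketch of those two steps; the # comment convention for sources.txt, the prompt wording, and both function names are assumptions for illustration, not taken from the post.

```python
from typing import Iterable


def parse_sources(lines: Iterable[str]) -> list[str]:
    """Return the URLs in sources.txt content, skipping blanks and # comments."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks in the prompt ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Since a file object iterates over its lines, opening sources.txt and passing the handle to parse_sources yields the URL list that WebBaseLoader would download.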

Here is what the output looks like:

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-08-at-4.09.36%E2%80%AFPM.png?resize=1024%2C845&ssl=1

The May 7th event is too recent to be in the model’s training data. This makes sure that the model knows about it. You could also feed the model company policy documents, the rules to a board game, or your diary and it will magically know that information. Since you are running the model in Ollama, there is no risk of that information getting out, too. It is pretty awesome.

Have any questions, comments, etc? Feel free to drop a comment, below.

https://jws.news/2024/how-to-build-a-rag-system-using-python-ollama-langchain-and-chroma-db/

#AI #ChromaDB #Chunking #LangChain #LLM #Ollama #Python #RAG

governa, 2 days ago to ai

Running #AI Locally Using #Ollama on #Ubuntu #Linux

https://itsfoss.com/ollama-setup-linux/

kevinctofel, 4 days ago to ai

I had been meaning to look at using #Ollama for local #AI, and today I caught this excellent how-to video. I'm still not sold on the current AI hype, but it would be foolish to overlook its place in our future.

Honestly, this little project was the most fun I’ve had in a while: installing/using different AI models, using AI in the #terminal, then in a #browser, even creating a persona for the prompts. And it’s all running locally (no internet needed!), so it’s #private. 🤓

https://youtube.com/watch?v=Wjrdr0NU4Sk&si=x5I0kG2iwZGefc7d

mjaschen, 4 days ago to php German

Weekly review, issue 39 (2024-18).

This time featuring

🗺️ the Bikerouter Hall of Fame of all supporters so far ❤️

🚵‍♂️ new tires for the cyclocross-gravel thingy and the first ride on them, including a trip to Germany's largest (!) desert (!!)

💻 @atomicpoet's findings on ergonomics when working on a laptop vs. a desktop computer, and how they line up with my own experience

🤖 ollama, which lets you conveniently run LLMs on your own computer

🐘 cachetool, a tool for managing the PHP opcache

🍎 a look into my macOS Applications directory: this time, all the apps starting with "F"

🛠️ blog deployment once more, which now actually runs via a simple post-merge hook in Git

🔊 and, as always, techno

#BikeRouter #Gravel #CycloCross #Spreewald #DahmeSpreewald #Wüste #Ergonomie #Notebook #Desktop #ollama #cachetool #PHP #opcache #macOS #Git #Techno

https://www.marcusjaschen.de/blog/2024/2024-18/

frankel, 6 days ago to random

Boost your #API #mocking workflow with #Ollama and #Microcks

https://itnext.io/boost-your-api-mocking-workflow-with-ollama-and-microcks-38e25fe78450

davep, 8 days ago to python

Doing some pre-dinner #Python coding, working some more on my toy #Textual #Ollama client for the #terminal.

https://www.youtube.com/watch?v=7dTwJQn_Ggw

obrhoff, 9 days ago to llm

The amazing thing about LLMs is how much knowledge they possess in their small size. The llama3-8b model, for instance, weighs only 4.7GB yet can still answer your questions about everything (despite some hallucinations).
#llm #ai #ollama #llama3

pjk, 9 days ago to python
One thing you notice right away about LLMs is they bear a striking resemblance to that ubiquitous internet character, the reply-guy: they always have an answer, they are brimming with confidence, and they are often completely wrong.

It seemed only natural, then, to turn one into a full-time online version of itself, RGaaS (Reply-Guy as a Service), always on, always ready to reply to anything when summoned, with an “Actually,” followed by his (it’s always a “he”) opinion on what you should be thinking.

And so I built the Actually Bot.

https://www.peterkrupa.lol/wp-content/uploads/2024/05/actually_bot1.png

Basically, the way it works is that any user can reply to any Mastodon post (or I guess any post on the ActivityPub protocol) and tag the Actually Bot (@actuallybot) and it will reply with its most mediocre take on why everyone else is wrong, or at least not quite right.

The reply-guys can all move on to something else now, I have automated them out of a job.

This was a really fun project and has a lot of separate parts. First, I had to modify an LLM to give it the right “personality” for this job. To do this, I used Ollama, a tool for running open source LLMs locally, on your own machine. I used the “customized prompt feature”: You basically write a little file that Ollama uses to tweak the LLM of your choosing to give it certain characteristics. I went with the new Llama3:8b model from Meta as my base model, then put the following instructions in my modelfile:
FROM llama3
PARAMETER temperature 3
SYSTEM """You are receiving social media posts as prompts, and your job is to reply to these prompts. Please start all your replies with the word "actually". Reply as if you know more than everyone. You must recontextualize every prompt and correct or amend it, informing the prompter that they are wrong, even if ever so slightly. You write in short, sentences. Your replies must be short. Keep them to about three sentences. This is essential: keep your replies short. They must be under 500 characters."""
Then I ran the following command in the console:
ollama create actually_llama -f ./actually_llama
… and my model was ready to roll. Next, I needed a program to connect to the Ollama API to send the LLM prompts and get responses. Python was great for that, as both Ollama and Mastodon have solid Python libraries. Probably the slowest part was picking through Mastodon.py to figure out how the methods work and what exactly they return. It’s a very robust library with a million options, and fortunately it’s also extremely well documented, so while it was slow going, I was able to whack it together without too much trouble.

I’m not going to get into all the code here, but basically, I wrote a simple method that checks mentions, grabs the text of a post and the post it is replying to, and returns them for feeding into the LLM as the prompt.
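A sketch of that prompt-building step, written so it copes with the case that later crashed the bot: a mention that isn't a reply, where Mastodon's in_reply_to_id is null. The post dictionaries follow the shape of Mastodon status objects (HTML content, in_reply_to_id); fetch_status is a hypothetical stand-in for whatever looks up a post by id (with Mastodon.py, the client's status() method), and the helper names are assumptions.

```python
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collect the visible text of a Mastodon post's HTML content."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Paragraphs and line breaks separate words, so emit whitespace.
        if tag in ("p", "br"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(content: str) -> str:
    """Strip tags and collapse runs of whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()
    return " ".join("".join(parser.parts).split())


def build_prompt(status: dict, fetch_status: Callable[[object], dict]) -> str:
    """Turn a mention into an LLM prompt.

    If the mention is a reply, prepend the text of the post it replies to.
    A mention that is not a reply has in_reply_to_id set to None, so there
    is nothing to look up.
    """
    mention = html_to_text(status["content"])
    parent_id = status.get("in_reply_to_id")
    if parent_id is None:
        return mention
    parent = html_to_text(fetch_status(parent_id)["content"])
    return f"{parent}\n\n{mention}"
```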

Despite my very careful, detailed, and repetitive instructions to be sure replies are no more than 500 characters, LLMs can’t count, and they are very verbose, so I had to add a cleanup method that cuts the reply down to under 500 characters. Then I wrote another method for sending that cleaned-up prompt to Ollama and returning the response.
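A cleanup step like this can be sketched as a function that keeps the reply under 500 characters, preferring to stop at the last complete sentence that fits and falling back to a hard cut with an ellipsis. The sentence-boundary rule (., !, or ? followed by a space) is an assumption, not the bot's actual code.

```python
def trim_reply(text: str, limit: int = 500) -> str:
    """Cut a reply to fewer than `limit` characters.

    Prefer ending at the last sentence boundary that fits; if there is
    none, hard-cut and append an ellipsis.
    """
    text = text.strip()
    if len(text) < limit:
        return text
    window = text[:limit - 1]  # at most limit - 1 characters
    # Index of the last ". ", "! " or "? " inside the window (-1 if none).
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut != -1:
        return window[:cut + 1]  # keep the punctuation mark
    return window[:limit - 2].rstrip() + "…"
```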

The main body starts off by getting input for the username and password for login, then it launches a while True loop that calls my two functions, checking every 60 seconds to see if there are any mentions and replying to them if there are.
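The main loop can be sketched with the mention-checking and replying functions passed in as parameters (check_mentions and reply are hypothetical names). check_mentions should return only mentions the bot hasn't answered yet, for example by remembering the last notification id it handled; otherwise the bot would answer the same post every 60 seconds. Errors on a single mention are printed and skipped so one odd post can't take the bot down.

```python
import time
from typing import Callable, Iterable, Optional


def run_bot(check_mentions: Callable[[], Iterable[dict]],
            reply: Callable[[dict], None],
            interval: float = 60.0,
            max_cycles: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll for mentions every `interval` seconds and answer each one.

    max_cycles=None loops forever, like `while True`; a number is handy
    for testing. Returns how many mentions were answered.
    """
    handled = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for mention in check_mentions():
            try:
                reply(mention)
                handled += 1
            except Exception as exc:  # log and move on
                print(f"skipping mention {mention.get('id')}: {exc}")
        cycles += 1
        sleep(interval)
    return handled
```

Inside a container, these print() lines are what docker logs shows; Python buffers stdout when it isn't a terminal, so setting PYTHONUNBUFFERED=1 (or running python -u) makes them appear promptly.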

OK it works! Now came the hard part, which was figuring out how to get to 100% uptime. If I want the Actually Bot to reply every time someone mentions it, I need it to be on a machine that is always on, and I was not going to leave my PC on for this (nor did I want it clobbering my GPU when I was in the middle of a game).

So my solution was this little guy:

https://www.peterkrupa.lol/wp-content/uploads/2024/05/lenovo.jpg

… a Lenovo ThinkPad with a 3.3GHz quad-core i7 and 8GB of RAM. We got this refurbished machine when the pandemic was just getting going and it was my son’s constant companion for 18 months. It’s nice to be able to put it to work again. I put Ubuntu Linux on it and connected it to the home LAN.

I actually wasn’t even sure it would be able to run Llama3:8b. My workstation has an Nvidia GPU with 12GB of VRAM, and it works fine for running modest LLMs locally, but this little laptop is older and not built for gaming, and I wasn’t sure how it would handle such a heavy workload.

Fortunately, it worked with no problems. For running a chatbot, waiting 2 minutes for a reply is unacceptable, but for a bot that posts to social media, it’s well within range of what I was shooting for, and it didn’t seem to have any performance issues as far as the quality of the responses either.

The last thing I had to figure out was how to actually run everything from the Lenovo. I suppose I could have copied the Python files and tried to recreate the virtual environment locally, but I hate messing with virtual environments and dependencies, so I turned to the thing everyone says you should use in this situation: Docker.

This was actually great because I’d been wanting to learn how to use Docker for a while but never had the need. I’d installed it earlier and used it to run the WebUI front end for Ollama, so I had a little bit of an idea how it worked, but the Actually Bot really made me get into its working parts.

So, I wrote a Dockerfile for my Python app, grabbed all the dependencies and plopped them into a requirements.txt file, and built the Docker image. Then I scp’d the image over to the Lenovo, spun up the container, and boom! The Actually Bot was running!

Well, OK, it wasn’t that simple. I basically had to learn all this stuff from scratch, including the console commands. And once I had the Docker container running, my app couldn’t connect to Ollama. It turns out that, because Ollama runs as a server on the host, I had to launch the container with a flag (--network host) so that it shared the host’s network settings.

Then once I had the Actually Bot running, it kept crashing when people tagged it in a post that wasn’t a reply to another post. So: went back to the code, squashed the bug, redeployed the container, bug still there because I didn’t redeploy the container correctly. There was some rm, some prune, some struggling with the difference between “import” and “load”, and eventually I got everything working.

Currently, the Actually Bot is sitting on two days of uninterrupted uptime with ~70 successful “Actually,” replies, and its little laptop home isn’t even on fire or anything!

Moving forward, I’m going to tweak a few things so I can get better logging and stats on what it’s actually doing so I don’t have to check its posting history on Mastodon. I just realized you can get all the output that a Python script running in a Docker container prints with the command docker logs [CONTAINER], so that’s cool.

The other thing I’d like to do is build more bots. I’m thinking about spinning up my own Mastodon instance on a cheap hosting space and loading it with all kinds of bots talking to each other. See what transpires. If Dead Internet Theory is real, we might as well have fun with it!

https://www.peterkrupa.lol/2024/05/01/actually-building-a-bot-is-fun/

#Docker #Llama3 #Ollama #Python

davep, 11 days ago to python

Going on stream to tinker some more with an Ollama client I’m building for myself: https://www.youtube.com/watch?v=LzHUdfR4PRg

#Python #Terminal #Ollama #Textual

joe, 14 days ago to ai

Yesterday, we looked at how to write a JavaScript app that uses Ollama. Recently, we started to look at Python on this site, and I figured that we had better follow it up with how to write a Python app that uses Ollama. Just like with JavaScript, Ollama offers a Python library, so we are going to be using that for our examples. Also, just like we did with the JavaScript demo, I am going to be using the generate endpoint instead of the chat endpoint. That keeps things simpler, but I am going to explore the chat endpoint at some point, too.

Install the Ollama Library

The first step is to create a virtual environment to isolate your project’s libraries from the global Python libraries. Then run pip3 install ollama from the terminal.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-22-at-5.58.34%E2%80%AFPM.png?resize=1024%2C647&ssl=1

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-22-at-5.59.03%E2%80%AFPM.png?resize=1024%2C647&ssl=1

Basic CLI example

At this point, we can start writing code. When we used the web service earlier this week, we used the generate endpoint and provided model, prompt, and stream as parameters. We set the stream parameter to false so that it would return a single response object instead of a stream of objects. When using the Python library, the stream parameter isn’t necessary because it returns a single response object by default. We still provide it with a model and a prompt, though.

If you run it from the terminal, the response will look familiar.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-22-at-8.05.20%E2%80%AFPM.png?resize=1024%2C647&ssl=1

If you replace print(output) with print(output['response']), you can more clearly see the important bits.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-22-at-8.09.04%E2%80%AFPM.png?resize=1024%2C647&ssl=1
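The CLI example boils down to one generate call plus reading the response field. A minimal sketch, assuming the ollama package's module-level generate(model=..., prompt=...) function and the 'response' key as the post describes; the client parameter and the function name ask are hypothetical additions so the function can be exercised without a running server.

```python
def ask(prompt: str, model: str = "llama3", client=None) -> str:
    """Send a prompt to Ollama's generate endpoint and return the text.

    `client` defaults to the `ollama` module (pip3 install ollama); any
    object with a compatible generate(model=..., prompt=...) method works.
    """
    if client is None:
        import ollama  # imported lazily so this file loads without it
        client = ollama
    output = client.generate(model=model, prompt=prompt)  # stream is off by default
    return output["response"]
```

With Ollama running and llama3 pulled, printing ask("Give me a comma-delimited list of cities in Wisconsin with a population over 100,000 people") shows just the text, like the print(output['response']) version.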

Basic Web Application Example

The output is very similar to the node-fetch example from earlier this week. Last week, when we looked at how to dockerize a node app, we output an array as an unordered list. Let’s see if we can replicate that result using the output from Ollama.

If you run pip install flask, you can host a simple HTTP page on port 8080, and with the magic of json.loads() and a for loop, you can build your unordered list.

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-22-at-8.27.30%E2%80%AFPM.png?resize=1024%2C651&ssl=1

Every time you load the page, it makes a server-side API call to Ollama, gets a list of large cities in Wisconsin, and displays them on the website. The list is never the same (because of hallucinations) but that is another issue.
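The JSON-to-list step can be sketched as a small function that a Flask route would call on output['response']. It assumes the prompt asked for a JSON array of city names; it escapes each item because model output is untrusted, and it raises ValueError when the model returns something other than a JSON array, which does happen. The name cities_to_html is hypothetical.

```python
import html
import json


def cities_to_html(model_output: str) -> str:
    """Render a JSON array from the model as an HTML unordered list."""
    items = json.loads(model_output)  # JSONDecodeError is a ValueError
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    lis = "".join(f"<li>{html.escape(str(item))}</li>" for item in items)
    return f"<ul>{lis}</ul>"
```

In Flask, a route handler would return this string, and app.run(port=8080) would serve it on port 8080.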

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-write-a-python-app-that-uses-ollama/

#AI #Flask #LLM #Ollama #Python

ramikrispin, 15 days ago to llm

It was a pleasure to present this morning at ODSC East about data automation with LLMs.

Code examples and a tutorial are available on this repo: https://github.com/RamiKrispin/lang2sql
The slides are available on this repo: https://github.com/RamiKrispin/talks/tree/main/202404%20ODSC%20East%202024%20-%20%20Data%20Automation%20with%20LLM%20

Thanks to the conference organizers for the invite and the folks attending the session! 🙏

#llm #data #DataScience #python #ollama #openai

ramikrispin, 15 days ago to llm

In case you are wondering, the new Microsoft mini LLM - phi3, can handle code generation, in this case, SQL.

I compared the runtime (locally on CPU) with respect to codellama:7B using Ollama, and surprisingly the Phi3 runtime was significantly slower.

#llm #DataScience #python #phi #machinelearning #ollama

joe, 15 days ago to ai

So far this week, we have looked at how to use Ollama from the CLI, how to use Ollama from the web service, and how to use Ollama from a phone or iPad. Today we are going to be using the Ollama JavaScript Library to write an application.

Install the Ollama Library

The first step is to run npm i ollama from the terminal.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-8.30.04%E2%80%AFAM.png?resize=1024%2C728&ssl=1

That installs Ollama as a dependency in package.json.

Basic CLI example

At this point, we can start writing code. When we used the web service earlier this week, we used the generate endpoint and provided model, prompt, and stream as parameters. We set the stream parameter to false so that it would return a single response object instead of a stream of objects. When using the JavaScript library, the stream parameter isn’t necessary because it returns a single response object by default. We still provide it with a model and a prompt, though.

If you run it from the terminal, the response will look familiar.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-9.19.38%E2%80%AFAM.png?resize=1024%2C728&ssl=1

Basic Web Application Example

The output is very similar to the node-fetch example from earlier this week. Last week, when we looked at how to dockerize a node app, we output an array as an unordered list. Let’s see if we can replicate that result using the output from Ollama.

If you run npm install express, you can host a simple HTTP page on port 8080, and with the magic of JSON.parse() and a for loop, you can build your unordered list.

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-21-at-8.03.57%E2%80%AFPM.png?resize=1024%2C796&ssl=1

Every time you load the page, it makes a server-side API call to Ollama, gets a list of large cities in Wisconsin, and displays them on the website. The list is never the same (because of hallucinations) but that is another issue.

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-write-a-javascript-app-that-uses-ollama/

#AI #JavaScript #LLM #NodeJs #Ollama

davep, 15 days ago to python

TIL Midlothian is in a different timezone from Hampshire.

#Python #Programming #Ollama

joe, 16 days ago to ai

Earlier this year, I started looking at how to run a fully on-prem AI. In February, I bought a machine to run the inference engine on and set up Tailscale (which works similarly to Hamachi) to connect to it remotely. If you want to use it remotely, there are a lot of options for native clients.

macOS

My favorite client for macOS is MindMac. You can buy it for under $30, it works with multiple models, servers, and server types, and it is easy to use.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-2.34.12%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you want to look further into it, you can check it out at mindmac.app.

Android

My favorite client for Android is Amallo. It is $23 and like MindMac, it works with multiple models, servers, and server types. My only complaint would be that uploading a base64-encoded image to the model doesn’t seem to work well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot_20240420-143906.png?resize=461%2C1024&ssl=1

If you want to look further into it, you can check it out at doppeltilde.com.

iPadOS

There is a version of Amallo for iPadOS but I have been liking Enchanted LLM more. If you like it, there is a version for macOS as well. It has the added benefit of being free.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/IMG_0088.jpg?resize=672%2C1024&ssl=1

If you want to look further into it, you can check it out at the project’s GitHub page.

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-i-use-ai/

#AI #Amallo #Enchanted #LLM #MindMac #Ollama

joe, 17 days ago (edited 17 days ago) to programming

Yesterday, we played with Llama 3 using the Ollama CLI client (or REPL). Today I figured that we would play with it using the Ollama API. The Ollama API is documented in their GitHub repo. Ollama has a client that runs when you run ollama run llama3, and a service that can be accessed from something like MindMac, Amallo, or Enchanted. The service is what starts when you run ollama serve.

In our first Llama 3 post, we asked the model for “a comma-delimited list of cities in Wisconsin with a population over 100,000 people”. Using Postman and the completion API endpoint (/api/generate), you can ask the same thing.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-1.30.48%E2%80%AFPM.png?resize=1024%2C811&ssl=1

You will notice the stream parameter is set to false in the body. If the value is false, the response will be returned as a single response object, rather than a stream of objects. If you are using the API with a web application, you will want to ask the model for the answer as JSON and you will probably want to provide an example of how you want the answer formatted.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-1.45.15%E2%80%AFPM.png?resize=1024%2C811&ssl=1

You can use Node and Node-fetch to do the same thing.

If you run it from the terminal, it will look like this:

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-2.01.19%E2%80%AFPM.png?resize=1024%2C932&ssl=1
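For comparison with the Postman and node-fetch versions, the same non-streaming request can be made from Python's standard library. This sketch also sets Ollama's format option to "json", which constrains the output to valid JSON; Ollama's API docs recommend still asking for JSON in the prompt, as done above, when you use it. The function names are hypothetical.

```python
import json
import urllib.request


def generate_request(prompt: str, model: str = "llama3",
                     base_url: str = "http://localhost:11434",
                     as_json: bool = True) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if as_json:
        payload["format"] = "json"  # constrain the model to valid JSON
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request to a running Ollama server; return the response text."""
    with urllib.request.urlopen(generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)["response"]
```

Because stream is false, the server answers with one JSON object, and its response field holds the model's text (itself a JSON string when format is set).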

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/lets-play-more-with-llama-3/

#AI #Amallo #Enchanted #llama3 #LLM #MindMac #NodeJs #Ollama #Postman

joe, 18 days ago to ai

Last week, Meta announced Llama 3. Thanks to Ollama, you can run it pretty easily. There are 8b and 70b variants available. There are also pre-trained or instruction-tuned variants available. I am not seeing it on the Hugging Face Leader Board yet but the bit that I have played around with it has been promising.

Here are two basic test questions:

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-12.15.45%E2%80%AFPM.png?resize=989%2C1024&ssl=1

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-12.27.47%E2%80%AFPM.png?resize=989%2C1024&ssl=1

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/lets-play-with-llama-3/

#AI #llama3 #LLM #Ollama

ramikrispin, 22 days ago to llm

Llama 3 is already available on Ollama 🚀👇🏼

https://ollama.com/library/llama3

#llm #llama3 #ollama #python #DataScience

ramikrispin, 23 days ago to datascience

New release to Ollama 🎉

A major release to Ollama - version 0.1.32 is out. The new version includes:
✅ Improvement of the GPU utilization and memory management to increase performance and reduce error rate
✅ Increase performance on Mac by scheduling large models between GPU and CPU
✅ Introduce native AI support in Supabase edge functions

More details on the release notes 👇🏼
https://github.com/ollama/ollama/releases

Image credit: release notes

#DataScience #MachineLearning #llm #ollama #llama #python

ainmosni, 1 month ago to programming

So, my #Copilot trial just expired, and while it did cut down on some typing, it also made me feel like the quality of my code was lower, and of course it felt dirty to use it considering that it's a license whitewashing machine.

I don't think I will be paying for it, I don't think the results are worth it.

#Programming #AI #LLM #GoLang

0xZogG, 1 month ago

@ainmosni: there is another solution that has a free tier: #codeium
In general, as a non-frontend dev, I like its suggestions for HTML, and for Go even the minimal placeholder function fill-ins are nice.
But because of the licensing, and not knowing where my code is sent, I'm looking for a self-hosted solution. I found a few options with #ollama, but unfortunately my current 10-year-old hardware is not enough for that :P

ironicbadger, 1 month ago to selfhosted

I hooked the Enchanted LLM app up to my #selfhosted #ollama instance tonight. Running on the epyc box with an nvidia a4000.

I can’t notice a difference in speed between this and the real chat-gpt tbh. And I own the whole chain locally. Man this is cool!!

I even shared the ollama api endpoint with a buddy over Tailscale and now they’re whipping the llamas 🦙 api as well. Super fun times.

https://apps.apple.com/us/app/enchanted-llm/id6474268307

DavidMarzalC, 1 month ago to Symfony Spanish

We head into the end of March with another new episode of "Accesibilidad con Tecnologías libres" (Accessibility with Free Technologies), #ATL to friends.

At https://accesibilidadtl.gitlab.io/05 you'll find the show notes with the topics covered and the links.

If you follow this alias, @6706483, you'll automatically get the podcast's posts in the Fediverse.

And in case you don't have it yet, the RSS feed for your apps is:
https://accesibilidadtl.gitlab.io/feed

Taking part in episode 05, among others:

@raivenra

@jonathanchacon

@DavidMarzalC

We hope you find it interesting.

#Accesibilidad #LLaVA #oLLama #Joomla #PrestaShop #RealidadVirtual #XFCE #VotoElectronico #AccesibilidadTL

kohelet, 1 month ago to llm

So,
I downloaded Ollama,
installed a local LLM,
installed the Continue VSCode extension, configured it with my local LLM.

Now I just need something to do with it!
Like, any project at all.
huh.

#ollama #llm #continue
