(1/3) Last Friday, I was planning to watch Masters of the Air ✈️, but my ADHD had different plans 🙃, and I ended up running a short POC and creating a tutorial for getting started with Ollama Python 🚀. The setup covers both Docker 🐳 and a local installation.
TLDR: It is straightforward to run LLMs locally with the Ollama Python library. Models with up to ~7B parameters run smoothly on modest compute resources.
(2/3) The tutorial focuses on the following topics:
✅ Setting up Ollama server 🦙
✅ Setting up Python environment 🐍
✅ Pulling and running LLMs (examples with Mistral, Llama2, and Vicuna - see the sketch below)
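To give a taste of what the tutorial covers, here is a minimal sketch using the Ollama Python library. It assumes an Ollama server is already running on its default port (11434); the model name and prompt are just placeholders.

```python
# Minimal sketch with the Ollama Python library (pip install ollama).
# Assumes an Ollama server is already running on localhost:11434.
import ollama

# Pull the model once (same as `ollama pull mistral` on the CLI)
ollama.pull("mistral")

# Ask the local model a question and print the reply
response = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Summarize what Ollama does in one sentence."}],
)
print(response["message"]["content"])
```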
(3/3) The tutorial gets you running Ollama inside a Docker container. There are still some missing pieces, though, such as mounting LLM models from the local environment to avoid downloading them at build time. I plan to explore this topic sometime in the coming weeks.
The Code Llama 34B model isn't half bad! I've been toying around with it integrated into CLion, having it explain my own code to me and generate small functions, and so far it's been around 90% successful, with most of the errors being minor. The bug detection does have a decent number of false positives, though. I also like that it's aware enough of APIs to give doc links.
Bonus points for it going off on a tangent once about why console applications are better than GUIs.
So #Steeve got a major upgrade recently. He moved from a #gptneo (2.4B) model to a #llama2 (7B) model. Trained on 300k messages from our private chat history, Steeve is way more capable of following the conversation now. He used to have some "favorite phrases" he would say a lot, and I'm seeing less of that. His vision and reading models also got upgraded, so he gets more detail about the links and memes we share. Long live Steeve! :steeve:
A major release of Ollama - version 0.1.32 is out. The new version includes:
✅ Improved GPU utilization and memory management, boosting performance and reducing the error rate
✅ Increased performance on Mac by scheduling large models across the GPU and CPU
✅ Native AI support in Supabase Edge Functions
After months of work and $10 million, Databricks has unveiled DBRX - the world's most powerful open-source large language model to date.
DBRX outperforms open models like Meta's Llama 2 across benchmarks and even approaches the abilities of OpenAI's closed GPT-4. Novel architectural tweaks like a "mixture of experts" design boosted DBRX's training efficiency by 30-50%.
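For readers unfamiliar with the term, here is a toy sketch of the general mixture-of-experts idea: a router activates only a few experts per token, so most parameters sit idle on any given token, which is where the training-efficiency win comes from. The shapes, expert count, and top-k value below are made up for illustration and have nothing to do with DBRX's actual configuration.

```python
# Toy illustration of "mixture of experts": a router picks the top-k experts
# per token, so only a fraction of the total parameters do work on each token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

token = rng.standard_normal(d_model)                   # one token's hidden state
router = rng.standard_normal((n_experts, d_model))     # router weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # toy expert FFNs

scores = router @ token                                # affinity of token to each expert
chosen = np.argsort(scores)[-top_k:]                   # indices of the top-k experts
weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()  # softmax over chosen

# Only the chosen experts run; the rest are skipped entirely.
output = sum(w * (experts[i] @ token) for w, i in zip(weights, chosen))
print("picked experts:", chosen, "output shape:", output.shape)
```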
⚠️ @forrestbrazeal on the inside threat to OSS
🍴 Vicki Boykis says Redis is forked
👻 @johnonolan says Ghost is federating
🦙 Meta Engineering announces Llama 3
❓ @eieio's questions to ask when you don't want to work
🎙 hosted by @jerod
Please, use #AI to generate tons of #content that you otherwise couldn't.
But for the love of all that is holy, pay attention to what you are putting out. Read the output. If it doesn't say exactly what you would say, edit it! Make changes. Regenerate. Go through the process of making it good.
I truly don't think people hate AI content. They hate lazy content.
Anyone out there dabbling with on-prem AI? All of the numbers I'm seeing for RAM requirements on 7B, 13B, and 70B models seem to be correct for a single-user scenario, but I'm curious what folks are seeing for 2, 10, or 50 concurrent users.
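One rough way to reason about it: the weights are loaded once and shared, while each concurrent request adds its own KV cache. Here's a back-of-envelope sketch; the architecture numbers (a Llama-2-7B-style layout with 32 layers, 32 KV heads, head dim 128), the ~4 GB 4-bit weight figure, and the full 4096-token context per user are all assumptions for illustration, not measured values.

```python
# Back-of-envelope memory estimate: model weights are shared across users, but
# each concurrent request carries its own KV cache, which is what grows with users.
# Assumes a Llama-2-7B-style architecture and an fp16 KV cache at full context.

def kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                   context_len=4096, bytes_per_elem=2):
    # 2x for keys and values
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

weights_gb = 4.0  # ~7B params at 4-bit quantization, rough figure
per_user_gb = kv_cache_bytes() / 1024**3

for users in (1, 2, 10, 50):
    total = weights_gb + users * per_user_gb
    print(f"{users:>2} users: ~{total:.1f} GB "
          f"(weights {weights_gb} GB + {users} x {per_user_gb:.1f} GB KV cache)")
```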