fosai - kbin.social

The Mamba/RWKV fuzz (lemmy.world)

Unveiling Ragna: An Open Source RAG-based AI Orchestration Framework Designed to Scale From Research to Production | Quansight Consulting (quansight.com)

Today, we are announcing the release of Ragna, a new open source project from Quansight designed to allow organizations to explore the power of Retrieval-Augmented Generation (RAG) based AI tools. Ragna provides an intuitive API for quick experimentation and built-in tools for creating production-ready applications, allowing you...

Running Llama 2 70B GGML Instruct V2 Q4_1 with GPU offline on a Laptop using Oobabooga (lemmy.world)

Good news: I got it running on the GPU after reloading the CUDA libraries. The bad news is that the GPU layers are HEAVY (for lack of a better term). I could only run with 16 layers enabled on the GPU. If I add any more, the model crashes as soon as I enter any prompt, even with an empty context history. If you look at the numbers...

Running Llama 2 70B GGML Instruct V2 Q4_1 CPU only offline on a Laptop using Oobabooga (lemmy.world)

Follow the red marks in the main picture. HTOP is not showing the distrobox container’s memory usage in the bar graph, but it does display the correct memory usage in the processes window. The top left terminal shows the tty that launched Oobabooga. The bottom terminal is just a bash function I wrote to quickly check GPU...

Top Models from Open LLM Leaderboard (lemmy.world)

I made this plot to visualize current results from the Hugging Face benchmark.