I was pleasantly surprised by many models of the Deepseek family. Verbose, but in a good way? At least that was my experience. Love to see it mentioned here.
I appreciate this comment more than you will know. Thanks for sharing your thoughts.
It’s been a challenge realizing this is more than a time capsule - it’s a grassroots community and an open-source project bigger than me. Adjusting the content to reflect shared interests is something I’ve grappled with these last few weeks - especially as the wave of exciting innovations we saw earlier this year winds down.
I think the type of content series you mention is the next step here - practical, pragmatic insights that illustrate and enable new workflows and applications.
That being said, this type of content creation will likely take more time than the journalistic reporting I’ve been doing - but I think it’s absolutely worth the effort, and it’s the next logical evolution of whatever this forum becomes.
Thanks again for your kind words. I work 5-6 day weeks at my tech job on top of this, so burnout is a real thing. I think I’ll go for a hike this week and reevaluate how best to spread FOSAI.
If you’re reading this now and have ideas of your own - I’m all ears.
So far - it’s been the best-performing 7B model I’ve been able to get my hands on. Anyone running consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo (see the sketch at the end of this comment).
I am a firm believer there is more performance and better response quality to be found in smaller parameter models. Not to mention the interesting use cases you could unlock by fine-tuning an ensemble of them.
A lot of people sleep on 7B, but I think Mistral is a little different - there’s a lot of exploring to be done to find these use cases, but I think they’re out there waiting to be discovered.
I’ll definitely report back on how my first attempt at fine-tuning this goes. Until then, I suppose it would be great for any roleplay or basic chat interaction. Given its small footprint - it’s much more lightweight to prototype with than the larger model families and sizes.
If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.
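To make the GGUF route mentioned above concrete, here’s a minimal sketch using llama-cpp-python. The filename is a placeholder for whichever quantization you download, and the flags are just the defaults I’d reach for:

```python
# Minimal sketch: load a quantized Mistral 7B GGUF with llama-cpp-python.
# The model_path is a placeholder - grab any GGUF quantization you like.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU; set 0 for CPU-only
)

out = llm("[INST] Explain GGUF in one sentence. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```

Dial n_gpu_layers down if you’re VRAM-constrained - partial offload is exactly what makes these 7B quants viable on modest GPU/CPU combos.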
What I find interesting is how useful these tools are (even with the imperfections that you mention). Imagine a world where this level of intelligence has a consistently low error rate.
Semantic computation and agentic function calling with this level of accuracy will revolutionize the world. It’s only a matter of time, adoption, and availability.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Hello everyone. Today I’d like to catch up on another paper, a popular one that has pushed a new fine-tuning trend called DPO (Direct Preference Optimization)....
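If you only take one idea away: DPO skips the explicit reward model and RL loop of RLHF and instead trains directly on preference pairs with a classification-style loss. The objective from the paper:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

Here y_w and y_l are the preferred and rejected completions, \pi_ref is the frozen reference model, and \beta controls how far the policy can drift from it.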
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Hello everyone, I have another exciting Mamba paper to share. This being an MoE implementation of the state space model....
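For context, the paper alternates Mamba layers with switch-style (top-1 routed) MoE feed-forward layers. Here’s a toy sketch of just the MoE half - names, sizes, and the dense routing loop are mine for readability, not the paper’s code, and the Mamba block itself is omitted:

```python
# Toy switch-style (top-1) MoE feed-forward layer, the kind MoE-Mamba
# interleaves with Mamba blocks. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); each token is routed to exactly one expert
        probs = self.router(x).softmax(dim=-1)
        top_p, top_idx = probs.max(dim=-1)   # winning expert and its gate value
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i              # tokens assigned to expert i
            if mask.any():
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SwitchFFN()
print(moe(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

The payoff is the usual MoE trade: parameter count scales with the number of experts while per-token compute stays roughly constant.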
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Hello everyone, I have a very exciting paper to share with you today. This came out a little while ago, (like many other papers since my hiatus) so allow me to catch you up if you haven’t read it already....
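If you just want the one-equation summary: a state space model carries a hidden state through a linear recurrence. In the paper’s discretized form:

```latex
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```

Mamba’s “selective” twist is making \bar{B}, C, and the discretization step \Delta functions of the current input x_t, so the model can decide per token what to keep in state - while still running in time linear in sequence length.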
Develop Alongside Local LLMs w/ Open Interpreter
I don’t think this has been shared here before. Figured now is as good a time as any....
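If you want to kick the tires, here’s roughly what the Python quickstart looked like at the time - treat the attribute names as assumptions from the 0.1.x-era README rather than a stable API:

```python
# pip install open-interpreter
# Sketch of the early Python API; interpreter.local is an assumption
# from the 0.1.x-era README and may differ in newer releases.
import interpreter

interpreter.local = True  # talk to a locally served model instead of OpenAI
interpreter.chat("Summarize every CSV file in this folder.")
```

There’s also a CLI entry point (interpreter --local) if you’d rather stay in the terminal.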
What open-source LLMs are you using in 2024?
There has been an overwhelming amount of new models hitting HuggingFace. I wanted to kick off a thread and see what open-source LLM has been your new daily driver?...
FOSAI 2024
Hello everyone....
Blaed's Hiatus (Part I)
Hello everyone,...
What kind of content do you want to see more of?
I have temporarily paused my weekly news reports to take stock for a moment and better gauge the content you all care about (and want to see more of in this community)....
Llama 2 / WizardLM Megathread
Llama 2 & WizardLM Megathread...
Sharing brev.dev - A new platform for fine-tuning models on cloud GPUs
On my journey fine-tuning a model for !fosai, I stumbled across brev.dev....
Create a Large Language Model from Scratch with Python – Tutorial (www.youtube.com)
HyperTech News Report #0003 - Expanding Horizons
cross-posted from: lemmy.world/post/6399678...
Anyone else working with retrieval augmented generation? (RAG)
What have been your experiences with it? What does your workflow look like?...
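To seed the discussion, here’s my mental model of the minimal loop, sketched in Python. The embedding model, corpus, and final generation step are placeholders - swap in whatever stack you actually run:

```python
# Minimal RAG loop: embed documents, retrieve by cosine similarity,
# and stuff the top hits into the prompt. Corpus and model are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Mistral 7B is a 7-billion-parameter open-weight model.",
    "GGUF is a quantized file format used by llama.cpp.",
    "Retrieval augmented generation grounds answers in your own data.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are unit-norm
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is GGUF?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # feed this to whatever local model you run (llama.cpp, Ollama, etc.)
```

Everything past this toy version (chunking, reranking, vector DBs) is where the workflow questions get interesting.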
Mistral 7B Megathread
Starting a Mistral Megathread to aggregate resources....
AutoGen - Enabling Next Generation LLM Applications
Today I am very excited to share with you AutoGen - a new framework for enabling next generation LLM applications....
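To give a feel for the programming model, here’s a condensed version of the two-agent quickstart. The config values are placeholders, and you can point llm_config at any OpenAI-compatible endpoint, including a local server:

```python
# Two-agent AutoGen loop: the user proxy runs the code the assistant writes.
# Based on the pyautogen quickstart; model/key values are placeholders.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_KEY"}]  # placeholder credentials

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                      # fully automated round-trips
    code_execution_config={"work_dir": "coding"},  # where generated code executes
)

user_proxy.initiate_chat(assistant, message="Plot sin(x) from 0 to 2*pi and save it.")
```

The interesting part is the conversation loop: the assistant replies with code blocks, the proxy executes them and feeds results back, and the exchange continues until the task terminates.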
What do you think are some of the most interesting use cases for AGI?
To me, it’s pretty obvious how AGI can change the world....
Why do you like LLMs?
Genuinely curious....