I was pleasantly surprised by many models of the Deepseek family. Verbose, but in a good way? At least that was my experience. Love to see it mentioned here.
I appreciate this comment more than you will know. Thanks for sharing your thoughts.
It’s been a challenge realizing this is more than a time capsule - it’s a grassroots community and an open-source project bigger than me. Adjusting the content to reflect shared interests is something I’ve grappled with these last few weeks - especially as the wave of exciting innovations we saw earlier this year winds down.
I think the type of content series you mention is the next step here - practical, pragmatic insights that illustrate and enable new workflows and applications.
That being said, this type of content creation will likely take more time than the journalistic reporting I’ve been doing - but I think it’s absolutely worth the effort, and it’s the next logical evolution of whatever this forum becomes.
Thanks again for your kind words. I work 5-6 day weeks at my tech job on top of this, so burnout is a real thing. I think I’ll go for a hike this week and reevaluate how best to spread FOSAI.
If you’re reading this now and have ideas of your own - I’m all ears.
So far - it’s been the best-performing 7B model I’ve been able to get my hands on. Anyone running consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo (see the sketch at the end of this comment).
I am a firm believer there is more performance and better response quality to be found in smaller parameter models. Not to mention the interesting use cases you could unlock by fine-tuning an ensemble of them.
A lot of people sleep on 7B, but I think Mistral is a little different - there’s a lot of exploring to be done to find these use cases, but I think they’re out there waiting to be discovered.
I’ll definitely report back on how my first attempt at fine-tuning this goes. Until then, I suppose it would be great for any roleplay or basic chat interaction. Given its small footprint - it’s much more lightweight to prototype with than the larger model families and sizes.
If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.
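To make the GGUF route mentioned above concrete, here’s a minimal sketch using llama-cpp-python. The filename is a placeholder for whichever quantization you download, and the flags are just the defaults I’d reach for:

```python
# Minimal sketch: load a quantized Mistral 7B GGUF with llama-cpp-python.
# The model_path is a placeholder - grab any GGUF quantization you like.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU; set 0 for CPU-only
)

out = llm("[INST] Explain GGUF in one sentence. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```

Dial n_gpu_layers down if you’re VRAM-constrained - partial offload is exactly what makes these 7B quants viable on modest GPU/CPU combos.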
What I find interesting is how useful these tools are (even with the imperfections that you mention). Imagine a world where this level of intelligence has a consistently low error rate.
Semantic computation and agentic function calling with this level of accuracy will revolutionize the world. It’s only a matter of time, adoption, and availability.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Hello everyone. Today I’d like to catch up on another paper, a popular one that has pushed a new fine-tuning trend called DPO (Direct Preference Optimization)....
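If you only take one idea away: DPO skips the explicit reward model and RL loop of RLHF and instead trains directly on preference pairs with a classification-style loss. The objective from the paper:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

Here y_w and y_l are the preferred and rejected completions, \pi_ref is the frozen reference model, and \beta controls how far the policy can drift from it.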
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Hello everyone, I have another exciting Mamba paper to share. This being an MoE implementation of the state space model....
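For context, the paper alternates Mamba layers with switch-style (top-1 routed) MoE feed-forward layers. Here’s a toy sketch of just the MoE half - names, sizes, and the dense routing loop are mine for readability, not the paper’s code, and the Mamba block itself is omitted:

```python
# Toy switch-style (top-1) MoE feed-forward layer, the kind MoE-Mamba
# interleaves with Mamba blocks. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); each token is routed to exactly one expert
        probs = self.router(x).softmax(dim=-1)
        top_p, top_idx = probs.max(dim=-1)   # winning expert and its gate value
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i              # tokens assigned to expert i
            if mask.any():
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SwitchFFN()
print(moe(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

The payoff is the usual MoE trade: parameter count scales with the number of experts while per-token compute stays roughly constant.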
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Hello everyone, I have a very exciting paper to share with you today. This came out a little while ago, (like many other papers since my hiatus) so allow me to catch you up if you haven’t read it already....
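If you just want the one-equation summary: a state space model carries a hidden state through a linear recurrence. In the paper’s discretized form:

```latex
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```

Mamba’s “selective” twist is making \bar{B}, C, and the discretization step \Delta functions of the current input x_t, so the model can decide per token what to keep in state - while still running in time linear in sequence length.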
Develop Alongside Local LLMs w/ Open Interpreter
I don’t think this has been shared here before. Figured now is as good a time as any....
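If you want to kick the tires, here’s roughly what the Python quickstart looked like at the time - treat the attribute names as assumptions from the 0.1.x-era README rather than a stable API:

```python
# pip install open-interpreter
# Sketch of the early Python API; interpreter.local is an assumption
# from the 0.1.x-era README and may differ in newer releases.
import interpreter

interpreter.local = True  # talk to a locally served model instead of OpenAI
interpreter.chat("Summarize every CSV file in this folder.")
```

There’s also a CLI entry point (interpreter --local) if you’d rather stay in the terminal.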
What open-source LLMs are you using in 2024?
There has been an overwhelming amount of new models hitting HuggingFace. I wanted to kick off a thread and see what open-source LLM has been your new daily driver?...
FOSAI 2024
Hello everyone....
Blaed's Hiatus (Part I)
Hello everyone,...
What kind of content do you want to see more of?
I have temporarily paused my weekly news reports to take stock for a moment and better gauge the content you all care about (and want to see more of in this community)....
Llama 2 / WizardLM Megathread
Llama 2 & WizardLM Megathread...
Sharing brev.dev - A new platform for fine-tuning models on cloud GPUs
On my journey fine-tuning a model for !fosai, I stumbled across brev.dev....
Create a Large Language Model from Scratch with Python – Tutorial (www.youtube.com)
HyperTech News Report #0003 - Expanding Horizons
cross-posted from: lemmy.world/post/6399678...
Anyone else working with retrieval augmented generation? (RAG)
What have been your experiences with it? What does your workflow look like?...
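To seed the discussion, here’s my mental model of the minimal loop, sketched in Python. The embedding model, corpus, and final generation step are placeholders - swap in whatever stack you actually run:

```python
# Minimal RAG loop: embed documents, retrieve by cosine similarity,
# and stuff the top hits into the prompt. Corpus and model are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Mistral 7B is a 7-billion-parameter open-weight model.",
    "GGUF is a quantized file format used by llama.cpp.",
    "Retrieval augmented generation grounds answers in your own data.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are unit-norm
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is GGUF?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # feed this to whatever local model you run (llama.cpp, Ollama, etc.)
```

Everything past this toy version (chunking, reranking, vector DBs) is where the workflow questions get interesting.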
Mistral 7B Megathread
Starting a Mistral Megathread to aggregate resources....
AutoGen - Enabling Next Generation LLM Applications
Today I am very excited to share with you AutoGen - a new framework for enabling next generation LLM applications....
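To give a feel for the programming model, here’s a condensed version of the two-agent quickstart. The config values are placeholders, and you can point llm_config at any OpenAI-compatible endpoint, including a local server:

```python
# Two-agent AutoGen loop: the user proxy runs the code the assistant writes.
# Based on the pyautogen quickstart; model/key values are placeholders.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_KEY"}]  # placeholder credentials

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                      # fully automated round-trips
    code_execution_config={"work_dir": "coding"},  # where generated code executes
)

user_proxy.initiate_chat(assistant, message="Plot sin(x) from 0 to 2*pi and save it.")
```

The interesting part is the conversation loop: the assistant replies with code blocks, the proxy executes them and feeds results back, and the exchange continues until the task terminates.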
What do you think are some of the most interesting use cases for AGI?
To me, it’s pretty obvious how AGI can change the world....
Why do you like LLMs?
Genuinely curious....