There's this leaked Google internal document, for example, that points out that the FLOSS community is close to eating Google's and OpenAI's cake:
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
So here is my question to you:
What are the best examples of useful, small, on-device models already out there?
Context: I am writing something about #AI and I want to show examples of useful small models in order to help non-techies understand why they are important.
@rysiek Yess, this so much! There's actually use in AI (despite me hating almost any AI tool out there). There's a huge perspective around personal empowerment and independence. Incidentally, those tools are usually also more ethical, IMHO, when it comes to how they're built or used (self promo https://ljrk.codeberg.page/ethical-ai.html)
Unless the training data are also free and the production models used to perform the work are substantially reproducible, those models and the code to deploy them propagate power imbalances as fraught as proprietary object code: the model acts as a form of unauditable, AI-authored code.
Some folks in Debian did some work, years ago now, to try to parse out these issues, but these challenges have otherwise received scant notice.
As a practical matter, bringing enough computing power to bear to run the training data through the training software poses a similar challenge, quantitative rather than qualitative, to the ability to make forks asserting typical software freedoms.
to be clear, by "scant notice" I mean in a FOSS-forward framing.
There is of course vigorous criticism of the misuses of AI. I don't wish to add to how badly those criticisms have been disregarded more broadly, both within and beyond that frame.
Now I am trying to find examples of smaller models that do as well or better than LLMs from Big Tech, to demonstrate the point that we might be able to do just fine without LLMs, regardless of what Big Tech is trying to convince us of.
(Incidentally, speaking of language models, free or not: DeepL did much better rendering your abstract into English, accessed through the F-Droid-provided free-frontend-to-proprietary-backend app, than when I pasted it into LibreTranslate's web form.)
@rysiek It's pretty neat how much can be done in WASM. For example, lichess.org embeds very strong in-browser chess position analysis with Stockfish.js, which can do both classical evaluation and evaluation with an NNUE (a neural network small enough to run efficiently on a CPU).
@rysiek Apple's photo recognition too. It's not great, but it's steadily getting better — it can now do an OK job of recognizing birds and hedgehogs, for example.
@rysiek Are we only talking about the large language models as "AI" or are we following the trend of rebranding everything that was "ML" last year to be included in this year's new "AI" hype category?
@rysiek As far as I can tell, Facebook's "LLaMA" is the biggest open-sourced language model that people are running various variants of on their home GPUs.
They remain black boxes, of course; nobody has any idea what's going on inside any of those multi-billion-parameter confusions of virtual wires and matrices.
But the model is freely redistributable, and you can read the value of every node at every microsecond if you want.
Just that nobody knows what any of them mean.
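That "readable but not understandable" point can be shown with a toy sketch (pure Python, hypothetical made-up weights; a real LLM is the same idea scaled up by billions of parameters):

```python
import math

# A toy 2-layer feedforward network with made-up weights.
# Every intermediate value is fully inspectable -- that's the "openness" --
# yet no individual number carries an interpretable meaning on its own.
W1 = [[0.5, -1.2], [0.8, 0.3]]   # hypothetical layer-1 weights
W2 = [[1.0, -0.7]]               # hypothetical layer-2 weights

def forward(x):
    # Layer 1: linear transform followed by a tanh nonlinearity
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    # Layer 2: linear readout of the hidden activations
    y = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return h, y

hidden, output = forward([1.0, 2.0])
# We can print every "node" value at every step...
print("hidden activations:", hidden)
print("output:", output)
# ...but nothing here tells us *why* hidden[0] has the value it has.
```

Open weights give you this kind of total visibility, which is real and useful for redistribution and forking, but visibility alone isn't interpretability.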
Sad that it's Facebook, but FB is pretty good at open-source software libraries.