@osma@sigmoid.social avatar

osma

@osma@sigmoid.social

Inf Sys Specialist at NatLibFi. Creator of #Annif automated subject indexing (text classification) tool built on AI and ML methods.
Opted in to tootfinder

qlp, to python
@qlp@linh.social avatar

This joke has probably been made a bunch of times, but...

Python 3.14, not to be confused with PyPI.

osma,
@osma@sigmoid.social avatar

@qlp
What about PyPy then?

osma, to random
@osma@sigmoid.social avatar

I just got baited into applying Annif on a new task...

Ari Hershowitz published a data set of US Congress bills and posted about it on LinkedIn, including some results on applying LLMs for classifying them.

https://huggingface.co/datasets/dreamproit/bill_labels_us
https://www.linkedin.com/posts/ari-hershowitz_dreamproitbilllabelsus-datasets-at-hugging-activity-7193325364230721536-fz61

I applied #Annif to the data set to predict policy areas with 90% accuracy and legislative subjects with an F1 score of nearly 74%. These results are much better than the LLM-based ones, and they use only cheap traditional ML approaches.

https://github.com/osma/annif-us-congress-bills

mcc, to random
@mcc@mastodon.social avatar

Hard to imagine a signal that a website is a rugpull more intense than banning users for trying to delete their own posts

https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt

Like just incredible "burning the future to power the present" energy here

osma,
@osma@sigmoid.social avatar

@mcc
@WomanCorn That's exactly what they've done. https://stackoverflow.com/help/gen-ai-policy

As noted above, all content published on SO is available under the CC BY-SA license, which is usually taken to mean that training LLMs is permitted. https://stackoverflow.com/help/licensing

osma, to ai
@osma@sigmoid.social avatar

AI Sauna is about to start!
I will give a talk about Civilized AI in around half an hour. You can follow the livestream.

In the evening you can meet me at the sauna. Tomorrow will be an AI hackathon.

https://meta.wikimedia.org/wiki/AI_Sauna

osma, to random
@osma@mas.to avatar

OTD in 1712, in Sweden (and Finland) it was not the first of March, but the 30th of February.

#impossibledates #otd
https://en.m.wikipedia.org/wiki/1712_in_Sweden#Events

osma,
@osma@sigmoid.social avatar

@SemAntiKast @osma
A few other instances of February 30th are listed here: https://en.m.wikipedia.org/wiki/List_of_non-standard_dates#February_30

...but none of the others are as "real" as the one in Sweden 1712.

SmudgeTheInsultCat, to random
@SmudgeTheInsultCat@mas.to avatar
osma,
@osma@sigmoid.social avatar

@SmudgeTheInsultCat The same happened with a lake in Northern Karelia, near the Finnish-Russian border. It was called simply "jaur" (lake) by locals speaking a Sámi language. On Finnish maps it thus became "Jaurjärvi" (lake lake). Russian mapmakers called it "Jaurjärviozero" (you can guess...) and then during WW2, Germans based their maps on Russian maps and dutifully named it "Jaurjärviozerosee" (lake lake lake lake).

Recounted e.g. here (in Finnish): http://karirydman.blogspot.com/2010/12/jaurjarviozerosee.html

dgar, to random
@dgar@aus.social avatar

Finland has closed its borders.

Now nobody can cross the Finnish line.

osma,
@osma@sigmoid.social avatar

@tessarakt
Because we Finns like to punnish them.
@dgar

osma, to random Finnish
@osma@sigmoid.social avatar

My reborn Väri-Signe bot @varisigne gained a nice number of followers, so I decided to create a colleague for it: Väri-Timiri @varitimiri publishes, in the same manner, historical photographs by Ivan Timiriasew, a Russian-born officer who fell in love with Helsinki, from periods including the First World War and the Finnish Civil War. You are welcome to follow that one too!

Both image bots now publish the original black-and-white image in addition to the AI-colorized one.

#tekoäly #valokuvat #Helsinki #BotArt

simon, to random
@simon@simonwillison.net avatar

Many options for running Mistral models in your terminal using LLM

I wrote about a whole bunch of different ways you can use my LLM tool to run prompts through Mistral 7B, Mixtral 8x7B and the new Mistral-medium from the terminal:

https://simonwillison.net/2023/Dec/18/mistral/

osma,
@osma@sigmoid.social avatar

@simon
Excellent as always! Thanks!

Minor nitpick: You say that Mistral Small beats GPT-3.5 on every metric. But in the table it has slightly lower scores for WinoGrande and MT Bench.

b0rk, to random
@b0rk@jvns.ca avatar

if you're an infrequent command line user -- what text editor do you use if you need to occasionally edit a file on the command line (other than vim/emacs)?

curious about what people use to edit a git commit message etc

if you picked 'other', I'd love to hear what you do in the replies!

osma,
@osma@sigmoid.social avatar

@b0rk
joe!

simon, to random
@simon@simonwillison.net avatar

New LLM paper highlighting quite how weird and ridiculous these things are https://arxiv.org/abs/2307.11760

Adding "it's important to my career" can produce better results, across every model they tested!

osma,
@osma@sigmoid.social avatar

@simon I propose that this kind of prompt engineering should be called "silly computing", or "sillyputing" for short. With nods to Silly Putty and of course Monty Python.

"I'm afraid your prompt isn't silly enough. Can you make it sillier?"

#sillyputing

b0rk, to random
@b0rk@jvns.ca avatar

today I'm thinking about the tradeoffs of using git rebase a bit. I think the goal of rebase is to have a nice linear commit history, which is something I like.

but what are the costs of using rebase? what problems has it caused for you in practice? I'm really only interested in specific bad experiences you've had here -- not opinions or general statements like “rewriting history is bad”

osma,
@osma@sigmoid.social avatar

@b0rk
It can be painful to review feature branches that are rebased during development (often for good reasons). My local branch gets out of sync with the rebased remote branch, so I can't just pull in the most recent work on top of what I had before. I usually just delete the local branch, fetch again and check it out. I don't know if there's a more elegant way.
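One commonly used alternative to deleting the branch is to fetch and then hard-reset the local branch to the remote-tracking one. Below is a minimal sketch of that idea, wrapped in Python only for illustration. The branch name `feature-x`, the remote name `origin` and the helper names are assumptions, not something from the post. Note that `git reset --hard` discards local commits and uncommitted changes on that branch, so it only suits branches you are reviewing rather than editing.

```python
import subprocess


def resync_commands(branch, remote="origin"):
    """Return the git commands that make a local branch match the remote one.

    The helper name and its defaults are hypothetical; the git subcommands
    themselves (fetch, checkout, reset --hard) are standard.
    """
    return [
        ["git", "fetch", remote],                          # get the rebased history
        ["git", "checkout", branch],                       # switch to the local branch
        ["git", "reset", "--hard", f"{remote}/{branch}"],  # drop the stale local history
    ]


def resync_branch(branch, remote="origin", repo_dir="."):
    """Run the commands in order, stopping at the first failure."""
    for cmd in resync_commands(branch, remote):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling `resync_branch("feature-x")` inside a clone would then replace the delete, re-fetch and checkout routine, assuming there is nothing local worth keeping.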

osma,
@osma@sigmoid.social avatar

@sarajw
@b0rk Seconded!

Also, if you had to fix any conflicts during earlier merges, when you do a rebase, you will often have to fix them again.

aarontay, to random

I understand that whenever there's some new technology, librarians need to say rah-rah things like "users will still need us librarians to guide them in using those tools" to encourage ourselves, but this means nothing if it is all words and no action. Think you are the best person to guide users on generative AI tools? Then really study them, as deeply as you can, and don't just wait for vendors to "educate" you so that you end up a mouthpiece for a product you paid for.

osma,
@osma@sigmoid.social avatar

@brewsterkahle
@aarontay
Definitely interested, this is basically what I've been doing with my colleagues for the last few years. Especially #Annif but also other AI-related things and projects, including LLMs.

I think the chatbot made by the National Library of Luxembourg is an interesting example of AI helping library patrons: https://bnl.public.lu/en/a-la-une/actualites/communiques/2023/chatbot-eluxemburgensia.html

b0rk, to random
@b0rk@jvns.ca avatar
osma,
@osma@sigmoid.social avatar

@b0rk Thanks a lot for this, you did an amazing job here!

osma, to random Finnish
@osma@mas.to avatar

They claim that the route of light rail line 15 in Viikki is somehow exceptionally hard to understand. I'm not so sure. The rail route is paved with stone, while the roadway of the roundabout is asphalt. In the direction of the rails there is a perfectly clear traffic sign prohibiting vehicles.

On the other hand, when you live in Kalasatama, every day you see dozens of motorists driving without a care past "no motor vehicles" signs on several alleys and bridges. Could it be a matter of the drivers' skills or attitude after all?

osma,
@osma@sigmoid.social avatar

@osma I have driven through the Viikki roundabout quite a few times in recent years and it is always a confusing experience, and in the dark a downright frightening one, with its flashing lights and trams coming from behind on the left. This was true even before the light rail, when the Viikinmäki bridge was used by the 550 buses. I have seen a passenger car stray onto it even before the rails were built. If you ask me, it is not a very successful intersection.

Of course, this in no way removes the driver's responsibility to watch their surroundings and the traffic signs.

simon, to random
@simon@simonwillison.net avatar

I'm on the latest episode of the Rooftop Ruby podcast with @collin and @joeldrapper talking about Large Language Models

It was a really excellent conversation - we covered a huge amount of ground

I'm trying something new: I put together my own transcript with Whisper, then cleaned that up and added inline links and section headings. Here's the result, complete with an embedded audio player that can jump to each different section: https://simonwillison.net/2023/Sep/29/llms-podcast/

osma,
@osma@sigmoid.social avatar

@simon
@collin @joeldrapper

Once again you've done an amazing job, both in terms of content and form! I really liked the transcript, but I would never have listened to such a long podcast. Thanks so much!

arstechnica, to random
@arstechnica@mastodon.social avatar

Can you melt eggs? Quora’s AI says “yes,” and Google is sharing the result

Incorrect AI-generated answers are forming a feedback loop of misinformation online.

https://arstechnica.com/information-technology/2023/09/can-you-melt-eggs-quoras-ai-says-yes-and-google-is-sharing-the-result/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

osma,
@osma@sigmoid.social avatar

@arstechnica Apparently this kind of misinformation feedback loop happens with Google's own Bard conversations as well: https://mastodon.social/@nixCraft/111132139532868852

treyhunner, to python
@treyhunner@mastodon.social avatar

What's a #Python feature you wish existed but doesn't? 🤔

It can even be something others would think is absurd. 🛸

Dream big! 💭

osma,
@osma@sigmoid.social avatar

@treyhunner
Being able to run computations over big data structures in parallel without absurd amounts of overhead. Something like:

results = set()
parallel for big_obj in big_obj_list:
    results.add(analyze(big_obj))

Maybe the no-GIL work will enable this eventually.
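For comparison, here is a rough sketch of the closest thing available in today's standard library, using `concurrent.futures`. The `analyze` function is a toy stand-in and `analyze_all` is a hypothetical helper name. With a process pool, every `big_obj` has to be pickled and copied to a worker process, which is the kind of overhead the wish above is about.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def analyze(big_obj):
    """Toy stand-in for an expensive analysis; assumes big_obj is a list of ints."""
    return sum(big_obj) % 7


def analyze_all(big_obj_list, executor_cls=ProcessPoolExecutor, workers=4):
    """Apply analyze() to every object and collect the distinct results in a set."""
    with executor_cls(max_workers=workers) as pool:
        # With ProcessPoolExecutor, each big_obj is pickled and sent to a
        # worker process. ThreadPoolExecutor avoids the copying, but on
        # GIL-enabled builds it does not run CPU-bound Python code in parallel.
        return set(pool.map(analyze, big_obj_list))
```

In a script, the process-based variant should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.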

jhilden, to random
@jhilden@vis.social avatar

The Holy See implies the existence of the Holy Hear, Holy Smell and Holy Feel

osma,
@osma@sigmoid.social avatar

@jhilden Not to mention the Holy Taste

osma,
@osma@sigmoid.social avatar

@jhilden
If we extend this to not just using our senses but also other everyday activities, then we have things like Holy Talk, Holy Walk, Holy Sleep etc. And, well, eventually, Holy Shit.

osma, to Finland
@osma@sigmoid.social avatar

Big demonstration against government racism and fascism in #Helsinki, #Finland.

#MeEmmeVaikene

simon, to random
@simon@simonwillison.net avatar

Here's the video, full set of slides and annotated transcript for the talk I gave at WordCamp US #WCUS on Friday: "Making Large Language Models work for you"
https://simonwillison.net/2023/Aug/27/wordcamp-llms/

osma,
@osma@sigmoid.social avatar

@simon
Thanks again for a fantastic talk and a superb transcript! You're doing an awesome job!

One question: you have a wide repertoire of skills and tricks for using pretrained LLMs, but you didn't mention fine-tuning, why? To me this is perhaps the single most exciting way of applying LLMs to solve practical problems - teaching them new skills just by example. And with techniques like PEFT and QLoRA it's very cheap and easy. Of course many things are possible without fine-tuning as well...

osma,
@osma@sigmoid.social avatar

@simon
Very good points!

I've tried fine-tuning with the GPT-3 API, and it was surprisingly easy, but a bit costly. Lately I've also fine-tuned Llama 2 locally, and it was a lot more difficult to put together all the pieces in the right way. Even ChatGPT is of limited help because the tools and libraries are so new. A good cookbook would certainly help! There are of course many examples in blog posts etc. but it's hard to adapt those to your own situation.
