Google touting that its latest #AI models and services can be grounded through its search results isn't the boast it thinks it is, especially considering the quality of its results lately. Has anybody considered the feedback loop of AI results being ranked higher and then being used to ground Gemini Pro?
I don’t want an internet where 90% of traffic and electricity is wasted on making generative “AI” companies and their investors happy while their energy hunger destroys our planet. I want an internet that shares knowledge for free with everyone, so we can build a better world.
I am not saying that generative AI in general is wrong. Quite the opposite: just like Machine Learning, Large Language Models can be a net positive when they are focused and domain-specific. But the #GIGO (Garbage In, Garbage Out) approach taken by the big players is not helping at all.
For the 2nd or 3rd time this week I've seen someone comment on a new data centre build with a stat about how 80% of data is never accessed. Then they talk about the energy and cooling used in modern DCs.
The reality is that data storage is actually incredibly efficient, and uses fuck all power. A hard disk draws less than 10 W and stores multiple users' data.
Storing data, our photos, our memories, our history, is not the problem.
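A rough back-of-the-envelope sketch of that claim, in Python. Every figure here (per-drive power, capacity, per-user footprint) is an assumption for illustration, not a measurement:

```python
# Back-of-envelope: power cost of storing one user's data on a hard disk.
# Every figure below is an assumption for illustration, not a measurement.

DRIVE_POWER_W = 8.0        # assumed: a 3.5" HDD draws well under 10 W
DRIVE_CAPACITY_TB = 20.0   # assumed: one current high-capacity drive
USER_DATA_GB = 50.0        # assumed: one person's photos, docs, backups

users_per_drive = (DRIVE_CAPACITY_TB * 1000) / USER_DATA_GB
watts_per_user = DRIVE_POWER_W / users_per_drive

print(f"One drive holds roughly {users_per_drive:.0f} users' data")
print(f"Power cost per user: about {watts_per_user * 1000:.0f} mW")
```

Under those assumptions one drive serves ~400 people at ~20 mW each; even if the numbers are off by an order of magnitude, storage at rest is a rounding error next to compute.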
One wonders how effective #LLM translations are when the corpus of material used to train the languages is this crap. Do we have a #GIGO problem?
Research suggests a large proportion of web material in languages other than English consists of poor-quality machine translations.
Working on a bit of SQLite for a thing, and it’s kind of shocking how many of the search engine results for technical/dev info on SQLite are just blatantly incorrect.
Some of it’s LLM output: the same content repeated with light paraphrasing across a dozen different sites. But some of it’s just Medium or dev.to influencers repeating vague hearsay.
(Which then gets into the LLM training set as accurate, I guess.)
@Frieke72@gerrymcgovern This is probably old-fashioned Bayesian inference-based targeting, which has its own problems, different but just as bad. Doctorow wrote about this, and about the use by police of similar software. This is how a corpus based on cops who 'randomly' select Black people for stop and search will produce a racist algorithm, using pure math 😬
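A toy sketch of that mechanism. All the rates below are invented; only the arithmetic of the biased corpus matters:

```python
# Toy "contraband prediction" model trained on biased stop-and-search data.
# All rates here are invented for illustration; the mechanism is the point.
import random

random.seed(0)
TRUE_HIT_RATE = 0.10  # assumed: contraband equally likely in both groups

# Biased collection: group B is stopped nine times as often as group A.
stops = [("A", random.random() < TRUE_HIT_RATE) for _ in range(100)] \
      + [("B", random.random() < TRUE_HIT_RATE) for _ in range(900)]

# Scoring learned from the corpus: share of recorded finds per group.
hits = [group for group, found in stops if found]
score = {g: hits.count(g) / len(hits) for g in ("A", "B")}
print(score)  # group B dominates purely because it was searched more
```

The math is "pure" given the corpus; the corpus is what encodes the discrimination.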
And most (neural network style) "#AI" or #LLM systems cannot even tell you WHY they produced the result they give. It's all in the training data. Huge "garbage in, garbage out" risks/biases!
#GrokAI, #Elmo's add-on for X's gullible paid subscribers..."unsurprisingly, the chatbot is just as reliable at giving accurate information as the once-cherished platform formerly known as Twitter and its right-wing billionaire owner, which is to say, not at all. The chatbot produced fake timelines for news events and misinformation when tested by Motherboard, and lent credence to conspiracy theories such as Pizzagate."
One of the ways in which the web is like an ecosystem is that a synthetic text spill in one part of it can leak into others. Here, someone has posted ChatGPT output (unclear to what end) and Google indexed it.
I learned of this particular #GoogleFail via Bluesky user aviendha69's post:
This AI recipe generator doesn't exactly whet the appetite! 🤢
🤖🍴 One click on the 'Savey Meal-Bot' reveals wild concoctions straight from the AI. A daring culinary excursion with risks and side effects:
Habsburg AI: #MachineLearning models fed on input that has been produced by other ML models...
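A miniature of the effect: fit a Gaussian to its own samples, generation after generation. The sample size and generation count are arbitrary choices, just enough to make the collapse visible:

```python
# "Habsburg AI" in miniature: each generation is fit only to samples drawn
# from the previous generation's model; no fresh real data ever enters.
import random, statistics

random.seed(1)
mu, sigma = 0.0, 1.0  # generation 0: the real data distribution
N = 10                # assumed: tiny synthetic "training set" per generation

for gen in range(1, 51):
    samples = [random.gauss(mu, sigma) for _ in range(N)]
    mu = statistics.fmean(samples)      # refit on synthetic data alone
    sigma = statistics.pstdev(samples)  # the MLE shrinks variance on average
    if gen % 10 == 0:
        print(f"gen {gen}: sigma = {sigma:.3f}")
```

sigma drifts toward zero: each refit loses a little diversity, and nothing puts it back. The dynastic metaphor is apt.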
A beautiful term I learned from @pluralistic, showing how #GIGO (garbage in, garbage out) also explains why dictators can't rely on #AI to detect stirrings of revolt.
Here's the #DictatorsDilemma: they want to block their country's frustrated elites from mobilizing against them, so they censor public communications; but they also want to know what their people truly believe, so they can head off simmering resentments before they boil over into regime-toppling revolutions.
--
If you'd like an essay-formatted version of this to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
But adding more unreliable data to an unreliable dataset doesn't improve its reliability. #GIGO is the iron law of computing, and you can't repeal it by shoveling more garbage into the top of the training funnel:
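A quick sketch of why the funnel doesn't save you. The error rate is an assumption; the trend is the point:

```python
# If each source is wrong more often than right, aggregating more of them
# makes the result worse, not better. ERROR_RATE is assumed for illustration.
import random

random.seed(2)
ERROR_RATE = 0.6  # assumed: each garbage "fact" is wrong 60% of the time

def majority_vote_accuracy(n_sources, trials=5_000):
    correct = 0
    for _ in range(trials):
        right_votes = sum(random.random() > ERROR_RATE
                          for _ in range(n_sources))
        correct += right_votes > n_sources / 2
    return correct / trials

for n in (1, 11, 101, 1001):
    print(f"{n:4d} garbage sources -> accuracy {majority_vote_accuracy(n):.3f}")
```

Accuracy falls toward zero as n grows: more garbage in means more confidently wrong garbage out.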
When it comes to "AI" that's used for decision support - that is, when an algorithm tells humans what to do and they do it - then you get something worse than Garbage In, Garbage Out - you get Garbage In, Garbage Out, Garbage Back In Again.
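A sketch of that loop, with invented numbers; only the feedback mechanism is the point:

```python
# "Garbage Back In Again": a decision-support loop where the model's output
# decides where new data gets collected. All numbers are invented.
import random

random.seed(3)
true_rate = {"north": 0.05, "south": 0.05}  # assumed: identical in reality
recorded = {"north": 1, "south": 2}          # a tiny initial skew in the data

for week in range(20):
    # Decision support: send patrols where the model says the problem is.
    target = max(recorded, key=recorded.get)
    # Incidents are only recorded where someone was sent to look for them.
    recorded[target] += sum(random.random() < true_rate[target]
                            for _ in range(100))

print(recorded)  # the skew compounds; reality never changed
```

The model's own output generated the data that "confirms" it, and the humans following the algorithm's instructions did the collecting.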
@SonOfSunTzu One thing that quite surprised me is all the erstwhile open source advocates complaining about it being trained on public data. Like, wtf? How do you reconcile those positions?
"Open source but just for me"?
(My main complaint with using that as training data is that most public code is hot garbage).
That story about AI hiring a human to solve a CAPTCHA for it? 100% #bullshit #AIHype fearmongering.
Also the outlook for actual #AISafety might be worse than we feared because it's not clear the people doing #AI know how to use the specification tools that have been developed for the task.
As my CS 101 prof* put it (paraphrased from memory), "if you don't know the input is garbage, you won't know the output is, either." #GIGO #AI #AIHype
Edit:
_
*Ed D. Reilly, Jr., co-author of Weighting for Baudot & editor of the 1st ed. of the Concise Encyclopedia of Computer Science. Yes, this was bugging me, so I had to look it up.
I love when people model good behavior instead of complaining about bad behavior.
An example of good de-anthropomorphized output, talking in the third person. “This model does not” and “the data used to develop this model suggests”.
Garbage in, garbage out is applicable to humans, too.
Naomi Klein wrote in The Guardian, "Why not algorithmic junk? Or glitches?" instead of “hallucinate”. Those are pretty descriptive and not anthropomorphic.