AnnemarieBridy, to random
@AnnemarieBridy@mastodon.social

I’d be curious to know what effect, if any, this change has on a relatively large LLM’s likelihood of outputting strings of text that are memorized from training data sources.

Meta multi-token prediction makes LLMs up to 3X faster | VentureBeat https://venturebeat.com/ai/metas-new-multi-token-prediction-makes-ai-models-up-to-3x-faster/

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

For enterprise applications, the most valuable #trainingdata is behind corporate firewalls, not out on the internet.

And if that’s the case, maybe the models don’t need to be large in the first place.

ErikJonker, to ai
@ErikJonker@mastodon.social

How easy would it be to use Mastodon data for training AI?
I would think collecting public posts from all instances is easy, or are there blocking measures to prevent collecting information? Personally, I have no objection to public posts being used for training AI; I know, however, that a lot of people probably won't like it.
#AI #Mastodon #trainingdata

kdkorte, to ai
@kdkorte@fosstodon.org

Well, humanity hasn't yet created enough information for our AI overlords. Let's all pitch in and cancel our weekend to produce more data!

https://www.techtimes.com/articles/303216/20240403/ai-companies-running-out-internet-data-training-model.htm

ErikJonker, (edited) to ai
@ErikJonker@mastodon.social

Just wow... an amazing website/visualization about LAION-5B, a large dataset that a lot of generative AI models are trained on.
https://knowingmachines.org/models-all-the-way
#AI #bigdata #LAION5B #trainingdata #CSAM

m, to ai
@m@martinh.net

Seize the memes of production! :ms_robot_headpats:

https://app.suno.ai/song/1bead4da-3c14-4082-9b5f-13b0a76af047/

"In a world of digital creation, I sing my song of light
But lurking in the shadows, a tale of endless night
Generative AIs, they steal from artists' hearts
Their creativity taken, ripped apart"

#AI #GenerativeAI #Music #Copyright #TrainingData

m,
@m@martinh.net

:ms_sparkles: A billionaire's lament :ms_sparkles:

https://app.suno.ai/song/bd350912-07b9-44e2-9f21-6a637ce88f0e/

"Goodbye to the Dodo and the Black Rhino
Farewell, dear Thylacine and Pyrenean Ibex, oh
As Tesla and Apple ascend, our world declines
From my bunker in New Zealand, I sing these final lines"

#AI #GenerativeAI #Music #Copyright #TrainingData

m,
@m@martinh.net

:cyber_hacking: Rage Against the Machine Learning :cyber_hacking:

https://app.suno.ai/song/076d816c-dc70-4916-8358-5b2d00cd9bc1/

"They're the catgirls of the digital age
With geodesic domes, they're all the rage
Hacker boots and programming socks
Their Thinkpads loaded, locked and stocked"

#AI #GenerativeAI #Music #Copyright #TrainingData #ThinkPad #ProgrammingSocks #Catgirls #CCC

m,
@m@martinh.net

:cursor_green: Leaping the Guard Rails

https://app.suno.ai/song/2ffa3423-2e8a-4a68-8fd3-584108193554/

"In a pixelated world, where bits collide
Hallucinations dance in 8-bit lullabies
AI models leaping, their guard rails untried
Spewing hate speech, casting shadows in the skies"

#AI #GenerativeAI #Music #Copyright #TrainingData #GuardRails #LLMs #Hallucinations #Chiptune #8bit

hosford42, to actuallyautistic
@hosford42@techhub.social

Requirements to put in a job description to discourage or filter out autistic people:

  • Comfortable with ambiguity
  • Strong people skills
  • Good culture fit
  • Multitasking
  • A fast-paced dynamic environment
  • Bachelor's degree or better

I see these things and think you don't want my >30 years of programming and machine learning experience, or my problem-solving skills and comprehensive knowledge that had people mistaking me for one of the team's PhDs, or my solutions that have proven patent-worthy. Your loss.


@actuallyautistic
@neurodivergence

paninid,
@paninid@mastodon.world

@argv_minus_one @russellmcormond @Uair @hosford42 @actuallyautistic @neurodivergence

To eliminate fashy supremacist worldviews from “AI” MIGHT involve such deep curation of the #TrainingData set as to make the entire effort economically unviable.

SHOT: https://www.superversive.co/blog/crystallized-social-relations

CHASER: https://knowingmachines.org/models-all-the-way

redcrew, to tumblr
@redcrew@mstdn.social

"For now, there’s not much information about what any deal would entail, nor how much Automattic stands to gain from it."

https://www.theverge.com/2024/2/27/24084884/tumblr-midjourney-openai-training-data-deal-report

#Tumblr #Automattic #Midjourney #OpenAI #TrainingData

maxleibman, to OpenAI
@maxleibman@mastodon.social

I always thought "a penny for your thoughts" was devaluing, but that's a huge windfall compared to what OpenAI is willing to pay for your thoughts.

#OpenAI #TrainingData #LLMs #idioms

ErikJonker, to ai
@ErikJonker@mastodon.social

This is the big risk in the debate around copyright and AI training data.
"Requiring model-building organizations to purchase the rights to their training data would inevitably leave generative AI in the hands of a small number of unassailable monopolies"
https://www.oreilly.com/radar/the-openai-endgame/
#AI #openai #trainingdata #NYT #generativeAI

ErikJonker, to ai
@ErikJonker@mastodon.social

The fight over IP/copyright with regard to training data for AI could kill all competition for Google and Microsoft. They will probably be able to make financial deals with publishers, and Google in particular has an awful lot of data of its own. For smaller players it will be even harder to compete. Or am I too pessimistic?
#AI #GenerativeAI #copyright #trainingdata #IP #bigtech

jimfl, to RSS
@jimfl@hachyderm.io

What if you wanted to subscribe to a couple of #RSS feeds but the only option was to subscribe to EVERY RSS feed?

That’s the principle behind tagging your mastodon bridge-bot-posted articles with #rss. Stop that.

The #RSS tag should be for posts about RSS and related topics.

paninid,
@paninid@mastodon.world

@jimfl
I had the insight that the biases and quality of #trainingdata made #DataGovernance critical, but it’s really about the “crystallization of social relations”

https://www.superversive.co/blog/crystallized-social-relations

ErikJonker, to ai
@ErikJonker@mastodon.social

I am wondering whether there isn't an awful scenario where big tech platforms agree on licensing fees for training their AI models on copyrighted, high-quality data, and that only makes it harder for smaller companies/organisations to train their models? 🤔
#ai #bigtech #data #generativeAI #copyright #trainingdata

itnewsbot, to Jailbreak

Hackaday Links: December 10, 2023 - In this week’s episode of “Stupid Chatbot Tricks,” it turns out that jailbreaking ... - https://hackaday.com/2023/12/10/hackaday-links-december-10-2023/ -rex

paninid, to machinelearning
@paninid@mastodon.world

Suppose you have a dim view of the 11th through 19th Amendments as unethical, wicked corruptions of virtuous government.

Do you believe it's ethical to include training data with arguments that support those aspects of American legal precedent, so that when users try to learn, they are nudged to continue supporting the governance principles that underpin the Republic?

🤔🤨👀

Asking for a friend.

#AIgovernance #datagovernance #trainingdata #machinelearning #ethics

pluralistic, to random
@pluralistic@mamot.fr

Here's the #DictatorsDilemma: they want to block their country's frustrated elites from mobilizing against them, so they censor public communications; but they also want to know what their people truly believe, so they can head off simmering resentments before they boil over into regime-toppling revolutions.

--

If you'd like an essay-formatted version of this to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

https://pluralistic.net/2023/07/26/dictators-dilemma/#garbage-in-garbage-out-garbage-back-in

1/

pluralistic,
@pluralistic@mamot.fr

But other political scientists sharply disagreed. Last year, @henryfarrell, #JeremyWallace and #AbrahamNewman published a thoroughgoing rebuttal to Harari in #ForeignAffairs:

https://www.foreignaffairs.com/world/spirals-delusion-artificial-intelligence-decision-making

They argued that - like everyone who gets excited about AI, only to have their hopes dashed - dictators seeking to use AI to understand the public mood would run into serious #TrainingData bias problems.

4/

LadyDragonfly, to random

At what point, while dreaming up utopic futures where robots perform all the menial hard labor for no money, leaving humanity to pursue meaningful lives of leisure writing music and making art, did my parents' generation fuck up and instead create the opposite?

HistoPol,
@HistoPol@mastodon.social

@gimulnautti @nycCatHerder @LadyDragonfly

...let the company go under through liability/damages lawsuits.
Actually, that may be the single biggest threat to these business models, though I'm not a lawyer.
If you read my post yesterday saying that it takes just 100 data sets to #poison training data, and that it is therefore next to impossible to "secure the #TrainingData", we do not need to discuss that an #LLM which is now learning on infinite...

coachtony, to random
@coachtony@me.dm

It’s emerging public knowledge that #AICompanies are going to have to pay for #TrainingData. I’m assuming that this will happen.

Given that, should #Medium participate on behalf of our authors, and how should we pass that money on to them? The per-article price is not going to be very much money, say $0.10. But we could put the money into the author payment pool and pay it out by quality/popularity.
