I’d be curious to know what effect, if any, this change has on a relatively large LLM’s likelihood of outputting strings of text that are memorized from training data sources.
How easy would it be to use Mastodon data for training AI?
I would think collecting public posts from all instances is easy, or are there blocking measures in place to prevent it? Personally I have no objection to public posts being used to train AI; I know a lot of people probably won't like it, though. #AI #Mastodon #trainingdata
"In a world of digital creation, I sing my song of light
But lurking in the shadows, a tale of endless night
Generative AIs, they steal from artists' hearts
Their creativity taken, ripped apart"
"Goodbye to the Dodo and the Black Rhino
Farewell, dear Thylacine and Pyrenean Ibex, oh
As Tesla and Apple ascend, our world declines
From my bunker in New Zealand, I sing these final lines"
"They're the catgirls of the digital age
With geodesic domes, they're all the rage
Hacker boots and programming socks
Their Thinkpads loaded, locked and stocked"
"In a pixelated world, where bits collide
Hallucinations dance in 8-bit lullabies
AI models leaping, their guard rails untried
Spewing hate speech, casting shadows in the skies"
Requirements to put in a job description to discourage or filter out autistic people:
Comfortable with ambiguity
Strong people skills
Good culture fit
Multitasking
A fast-paced dynamic environment
Bachelor's degree or better
I see these things and think you don't want my >30 years of programming and machine learning experience, or my problem-solving skills and comprehensive knowledge that had people mistaking me for one of the team's PhDs, or my solutions that have proven patent-worthy. Your loss.
Eliminating fashy supremacist worldviews from “AI” MIGHT require such deep curation of the #TrainingData set as to make the entire effort economically unviable.
The fight over IP/copyright in AI training data could kill all competition for Google and Microsoft. They will probably be able to strike financial deals with publishers, and Google in particular already holds an awful lot of data itself. For smaller players it will be even harder to compete. Or am I too pessimistic? #AI #GenerativeAI #copyright #trainingdata #IP #bigtech
@jimfl
I had the insight that the biases and quality of #trainingdata made #DataGovernance critical, but it’s really about the “crystallization of social relations”
I am wondering whether there isn't an awful scenario where big-tech platforms agree on licensing fees for training their AI models on copyrighted high-quality data, which only makes it harder for smaller companies and organisations to train their own models? 🤔 #AI #bigtech #data #generativeAI #copyright #trainingdata
Suppose you have a dim view of the 11th through 19th Amendments, seeing them as unethical, wicked corruptions of virtuous government.
Do you believe it's ethical to include training data with arguments supporting those aspects of American legal precedent, so that when users try to learn, they are nudged to continue supporting the governance principles that underpin the Republic?
Here's the #DictatorsDilemma: they want to block their country's frustrated elites from mobilizing against them, so they censor public communications; but they also want to know what their people truly believe, so they can head off simmering resentments before they boil over into regime-toppling revolutions.
--
If you'd like an essay-formatted version of this to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
They argued that - like everyone who gets excited about AI, only to have their hopes dashed - dictators seeking to use AI to understand the public mood would run into serious #TrainingData bias problems.
At what point, while dreaming up utopic futures where robots perform all the menial hard labor for no money, leaving humanity free to pursue meaningful lives of leisure writing music and making art, did my parents' generation fuck up and instead create the opposite?
...let the company go under through liability/damages lawsuits.
Actually, that is maybe the single biggest threat to these business models, though I'm not a lawyer.
If you read my post yesterday that it takes just 100 data sets to #poison training data, and that it is therefore next to impossible to "secure the #TrainingData", then we do not need to discuss that an #LLM which is now learning on infinite...
It’s emerging public knowledge that #AICompanies are going to have to pay for #TrainingData. I’m assuming that this will happen.
Given that: should #Medium participate on behalf of our Authors, and how should we pass that money on to authors? The per-article price is not going to be very much money, say $0.10. But we could put the money into the author payment pool and pay out by Quality/Popularity.