@matthewmaybe@sigmoid.social avatar

matthewmaybe

@matthewmaybe@sigmoid.social

Developer of the first "AI art" detection model.

I'm interested in content moderation tooling, data dignity, and fair, fully disclosed, non-commercial uses of generative AI.

My profile pic was generated using Mitsua Diffusion, a unique text-to-image model that was trained from scratch on public domain and Creative Commons CC0-licensed data only.


NanoRaptor, to random
@NanoRaptor@bitbang.social avatar

Reinventing the potato.

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@NanoRaptor you laugh, but my grandfather built a career on exactly this

matthewmaybe, to random
@matthewmaybe@sigmoid.social avatar

Huge Proportion of Internet Is AI-Generated Slime, Researchers Find

https://futurism.com/the-byte/internet-ai-generated-slime

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

From the article: "...the prevalence of AI-spun gibberish might make effectively training AI models in lower-resource languages nearly impossible in the long run. To train an advanced LLM, AI scientists need large amounts of high-quality data, which they generally get by scraping the web. If a given area of the internet is already overrun by nonsensical AI translations, the possibility of training advanced models in rarer languages could be stunted before it even starts."

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

I'd be surprised if many AI practitioners found the preponderance of AI-generated content on the web to be a serious problem for training new models. Synthesizing training data using GPT-4 is an extremely popular approach for fine-tuning base LLMs. If people are intentionally poisoning training datasets with AI output, will they really care about data contamination?

trochee, (edited) to random
@trochee@dair-community.social avatar

The boom in LLMs is going to hollow out a number of knowledge-worker industries — for example, writing boilerplate code or technical documentation

Not because it does it well but because the flacks can sell upper management on the idea that it can do it at all, as @pluralistic recently pointed out

This sale is a pig-in-a-poke, and the winning move is to not be holding the bag when the actual code or documentation is found to be terrible

1/

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@luke @trochee another one is the seemingly underrated A24 movie "After Yang", which I just watched last night. Not perfect, but worth checking out.

carnage4life, to random
@carnage4life@mas.to avatar

OpenAI has updated its policy and removed the ban on using its technology for warfare or military uses.

OpenAI is just an AI division of Microsoft at this point masquerading as a non-profit trying to develop AGI “for humanity”. It’s for Microsoft shareholders

https://theintercept.com/2024/01/12/open-ai-military-ban-chatgpt/

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@carnage4life I watched Colossus last night, an old sci-fi movie about (basically) an LLM being granted control of all national security during the Cold War. There are a lot of well-trodden tropes, but several aspects are on point, such as the hubris of the system's designer, who believes he can tame it with prompt engineering. Oddly, it was recently taken down from all streaming services, but you can still watch it (for now) at the Internet Archive: https://archive.org/details/colossus-the-forbin-project-1970

JohannesKleiner, to random German

Are there feeds yet for Mastodon?

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@roaldarboel @JohannesKleiner @jonny I've been working on this as well; happy to collaborate or share ideas

404mediaco, to random
@404mediaco@mastodon.social avatar

NEW: LAION, the gigantic dataset powering Stable Diffusion, several Google products, and other prominent AI tools has been taken down from the internet because it contains thousands of images of child sexual abuse material

https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@404mediaco “It's not that the technology is necessarily bad... it's not that AI is bad. It's the fact that a bunch of things were blindly stolen, and now we're trying to put all these Band-aids to fix something that really never should have happened in the first place.”

Amen. It's increasingly clear that researchers, companies and others operating in this space need to kick their addiction to Common Crawl and all the models and data based upon it (or similar to it).

timnitGebru, to random
@timnitGebru@dair-community.social avatar

"What If Sam Altman Were A Black Woman? Tech Twitter Weighs In On The OpenAI Debacle"

I mean the dude who fired me was waxing poetic about the "small research community" and empathy for his colleagues and stuff so we know the answer.

https://peopleofcolorintech.com/articles/what-if-sam-altman-were-a-black-woman-tech-twitter-weighs-in-on-the-openai-debacle/

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@timnitGebru yes, and also I think it's significant that the OpenAI saga is another example where the white, male management of a tech company sought to stifle AI safety research published by a woman. It says a lot about where we're at with women in tech that he thought it was OK to try to force Helen Toner off the board for doing her job, when technically, as CEO, he reported to her! And now he appears to have won that battle, even if there will be an "investigation..."

timnitGebru, to random
@timnitGebru@dair-community.social avatar

Sigh. Good thing US legislators are meeting with all these leaders trying to hear from them on how they should be regulated 🙄
https://www.404media.co/ai-generated-mushroom-foraging-books-amazon/

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@timnitGebru a very good example of why AI detection tools are needed. yes they could be better but the author of this article has demonstrated an excellent use case for existing ones

timnitGebru, to random
@timnitGebru@dair-community.social avatar

And another message. There was absolutely no need for this and we got the companies wanting to create an AGI god, putting these students in this awful situation. These "AI detection" tools need to be banned.

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@timnitGebru what should be done instead? and why ban the detectors but not the generators?

themarkup, to random
@themarkup@mastodon.themarkup.org avatar

The consequences of being wrongfully accused by an AI detector don't fall on groups evenly. We explain: https://themarkup.org/hello-world/2023/08/19/ai-detector-bias-and-higher-ed-coverage-at-the-markup

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@themarkup "AI detectors work by searching for predictable phrasing, simple vocabulary, and less complex grammar" Where is the source for this? The AI detectors I am aware of use BERT and similar language models as classifier neural networks; we have no more of a clear idea what they are "searching for" than what GPT generators are "trying to say". Bias in AI detection is absolutely an issue, but not one researchers (such as Unitary) haven't been aware of for a long time.

trochee, to random
@trochee@dair-community.social avatar

At the library in the SF section and I found a book that looked promising and checks boxes for me (appears it isn't Heinlein wannabe pastiche; covers questions of autonomy, sentience and who counts as "people"; author is not a white straight cis anglophone dude*)

…but then it has a glowing blurb from Ray Kurzweil, and I just… put it back on the shelf.

No thank you.

I do read a few things by authors in this category, but I'm much less willing to give them* a try without recommendation

**us

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@timnitGebru @trochee I feel like I hear relatively little said about Ray Kurzweil despite his apparent role as the godfather of #TESCREAL. Am I getting that wrong? Didn't he basically invent this new religion, and the rest are just preaching different versions of his gospel? Or did it exist before him in some form?

mekkaokereke, to random
@mekkaokereke@hachyderm.io avatar

Fantastic accomplishment!♥️👍🏿

https://19thnews.org/2023/07/chanda-prescod-weinstein-physicist-tenure-rare-feat/

And when she tried to join the Fediverse, she was greeted with a barrage of hate, sexism, racism, and anti-semitism that should have never been allowed to happen.

So now no one on Fediverse gets to interact with her directly about her work on here. Our loss. 😢

Which is why we'll make it so that this type of terrible welcome is unlikely to happen again. Allowing it to happen to her was a choice. We will make better ones.

#BlackMastodon

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@b3n @techghoul @mekkaokereke @marjolica agreed that no human can tackle this alone, but in addition to shared action I think this has to involve bots of some sort. I've been working on one that scans a feed for toxic status updates (detected using a machine learning model) and reports them to admins. I'd gotten as far as thinking that after X toxic toots it might recommend banning a user, but this could extend to instances as well.

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@techghoul @mekkaokereke @b3n @marjolica I'm definitely aware of this problem. That's part of why I chose the Detoxify library for my proof-of-concept, as the authors put some thought and effort into countering such false positives with "unbiased" models. But I'm still seeing similar issues, and frankly, looking for a different approach. One thing that would really help is a dataset of reports and admin actions with context.

botwiki, to random
@botwiki@mastodon.social avatar

Do you keep a list of all your bot projects somewhere online?

#bots #CreativeBots

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@botwiki sort of. i started one at https://umm-maybe.com/Bots.html but haven't kept it updated...

chrismessina, to random
@chrismessina@mastodon.xyz avatar

The AI arms race is playing out in real time on the Product Hunt leaderboard.

Undetectio thwarts AI content detection:
https://www.producthunt.com/posts/undetectio

GptSafe detects AI content:
https://www.producthunt.com/posts/gptsafe

Judging by the upvotes, you can tell which side is winning. 😈

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@chrismessina Time to make an Undetectio detector I guess

mmitchell_ai, to random
@mmitchell_ai@mastodon.social avatar

CHECK IT OUT! You can now exclude unconsented images from your models. Next step: Norms+standards for opt-in (convos @huggingface is having with multiple orgs to create!) Data on right column of the Dataset Card!
https://huggingface.co/datasets/conceptual_captions

matthewmaybe,
@matthewmaybe@sigmoid.social avatar

@huggingface @mmitchell_ai great idea. 👏 if the model card templates could encourage disclosure of the licensing status of training material that would also be helpful. I see many new models touted as "open" without discussing whether any steps were taken to filter out copyrighted works.
