404 Media reports that "Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material" 🧵
However, in 2021, a preprint by @abebab, Vinay Uday Prabhu & Emmanuel Kahembwe found a number of issues in the dataset, including "troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content."
@ed@DAIR@abebab I would be surprised to learn that there is a patching culture for datasets like LAION.
The story shared by 404 shows that dataset maintenance standards are badly needed. I think a cultural change is needed too: from a culture of data dumps to one of data care.