remixtures, to random Portuguese
@remixtures@tldr.nettime.org avatar

RT @katecrawford
✨Just published: New @ISSUESinST collection of STS scholars addressing the urgent governance challenges of generative AI. This came from our year-long @ENS_ULM working group, where we each focused on one core problem:

https://issues.org/an-ai-society/

remixtures,
@remixtures@tldr.nettime.org avatar

: "Copyright law was developed by eighteenth-century capitalists to intertwine art with commerce. In the twenty-first century, it is being used by technology companies to allow them to exploit all the works of human creativity that are digitized and online. But the destabilization around generative AI is also an opportunity for a more radical reassessment of the social, legal, and cultural frameworks underpinning creative production.
(...)
It may be time to develop concepts of intellectual property with a stronger focus on equity and creativity as opposed to economic incentives for media corporations. We are seeing early prototypes emerge from the recent collective bargaining agreements for writers, actors, and directors, many of whom lack copyrights but are nonetheless at the creative core of filmmaking. The lessons we learn from them could set a powerful precedent for how to pluralize intellectual property. Making a better world will require a deeper philosophical engagement with what it is to create, who has a say in how creations can be used, and who should profit." https://issues.org/generative-ai-copyright-law-crawford-schultz/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

AI TRAINING = FAIR USE

: "The datasets on which GAI systems like ChatGPT and Stable Diffusion rely are more like Google Books than commercial web crawlers that gather data; GAI systems need writing and art in its complete form to train from. And the comparison to Google Books doesn’t stop there. Many GAI systems are built using third-party public datasets like Common Crawl and LAION that are distributed under fair use principles and provide a real public good: archiving and making accessible the aggregated content of the internet for academics, researchers, and anyone else that may want it. These are free, non-commercial datasets collected by nonprofit organizations for use by researchers and the public. Web crawling and scraping also underlie the operation of search engines and archiving projects like the Internet Archive’s popular Wayback Machine.

In other words, the same practices that go into collecting data for GAI training are currently understood to be non-infringing or protected by fair use. Considering how vital these practices are for an open and accessible internet, we should ensure that they stay that way.

As a threshold matter, it is critical to understand that accessing, linking to, or interacting with digital information does not infringe any copyright. Reading a book, looking at a photograph, admiring a painting, or listening to music is not, and never should be, copyright infringement. This is not a “fair use” issue; the ability to use, access, or interact with a creative work is outside a copyright owner’s scope of control. Based on the best explanations of how GAI systems work, training a GAI system is generally analogous to these kinds of uses."

https://publicknowledge.org/generative-ai-is-disruptive-but-more-copyright-isnt-the-answer/
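Concretely, Common Crawl's index is open to anyone: the sketch below queries its public CDX index for captures of a domain, the kind of access "for academics, researchers, and anyone else" the quote describes. A minimal sketch only; the crawl label is an example and may not be the most recent, and current labels are listed at index.commoncrawl.org.

```python
# Query Common Crawl's public CDX index for captures of a domain.
# The crawl id below is an example; newer crawls exist.
import json
import urllib.parse
import urllib.request

CRAWL = "CC-MAIN-2023-50"  # example crawl id, not necessarily the latest
url = "https://index.commoncrawl.org/{}-index?{}".format(
    CRAWL,
    urllib.parse.urlencode({"url": "example.com/*", "output": "json", "limit": "5"}),
)

with urllib.request.urlopen(url) as resp:
    # The CDX server returns one JSON object per line, one per captured page.
    for line in resp.read().decode().splitlines():
        record = json.loads(line)
        print(record["timestamp"], record["url"], record.get("status"))
```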

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

#AI #GenerativeAI #GeneratedImages #Copyright #AITraining #IP: "Since the emergence of Midjourney and other image generators, artists have been watching and wondering whether AI is a great opportunity or an existential threat. Now, after a list of 16,000 names emerged of artists whose work Midjourney had allegedly used to train its AI – including Bridget Riley, Damien Hirst, Rachel Whiteread, Tracey Emin, David Hockney and Anish Kapoor – the art world has issued a call to arms against the technologists.

British artists have contacted US lawyers to discuss joining a class action against Midjourney and other AI firms, while others have told the Observer that they may bring their own legal action in the UK.

“What we need to do is come together,” said Tim Flach, president of the Association of Photographers and an internationally acclaimed photographer whose name is on the list.

“This public showing of this list of names is a great catalyst for artists to come together and challenge it. I personally would be up for doing that.”

The 24-page list of names forms Exhibit J in a class action brought by 10 American artists in California against Midjourney, Stability AI, Runway AI and DeviantArt. Matthew Butterick, one of the lawyers representing the artists, said: “We’ve had interest from artists around the world, including the UK.”

The tech firms have until 8 February to respond to the claim. Midjourney did not respond to requests for comment."

https://www.theguardian.com/technology/2024/jan/21/we-need-to-come-together-british-artists-team-up-to-fight-ai-image-generating-software

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "We have previously analysed US class actions against Open AI (here) and Google (here) for unauthorized use of copyright works in the training of generative AI tools, respectively ChatGPT, Google Bard and Gemini. To further develop this excursus on the US case law, in this post we consider two recent class actions against Meta launched by copyright holders (mainly book authors), for alleged infringement of IP in their books and written works through use in training materials for LLaMA (Large Language Model Meta AI). Such case law is interesting for the reconstruction of the technology deployed by Meta and the training methodology (at least from the plaintiff’s perspective) but also because the court has had the chance to preliminarily evaluate the robustness of the claims. Given the similarity of the legal arguments and the same technology being at stake (Meta’s LLaMA), upon the request of the parties, the Court treated the two class actions jointly (here)."

https://copyrightblog.kluweriplaw.com/2024/01/17/generative-ai-admissibility-and-infringement-in-the-two-us-class-actions-against-metas-llama/

remixtures, to meta Portuguese
@remixtures@tldr.nettime.org avatar

#Meta #AI #GenerativeAI #Copyright #Llama #LLMs #AITraining #Piracy: "These are noteworthy developments but not all complaints can be resolved with promises. Several lawsuits against OpenAI and Meta remain ongoing, accusing the companies of using the Books3 dataset to train their models.

While OpenAI and Meta are very cautious about discussing the subject in public, Meta provided more context in a California federal court this week.

Responding to a lawsuit from writer/comedian Sarah Silverman, author Richard Kadrey, and other rights holders, the tech giant admits that “portions of Books3” were used to train the Llama AI model before its public release.

“Meta admits that it used portions of the Books3 dataset, among many other materials, to train Llama 1 and Llama 2,” Meta writes in its answer."

https://torrentfreak.com/meta-admits-use-of-pirated-book-dataset-to-train-ai-240111/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
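For readers unfamiliar with the mechanics, here is a toy sketch of the forward "noising" step the quote describes, assuming a standard linear beta schedule; the shapes and schedule values are illustrative, not Stable Diffusion's or Midjourney's actual configuration.

```python
# Toy forward-diffusion step q(x_t | x_0) with a linear beta schedule.
# Illustrative values only; real models use their own schedules and sizes.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_image(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t: the image after t steps of 'diffusing' into noise."""
    a = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

x0 = torch.rand(3, 64, 64) * 2 - 1                 # stand-in image in [-1, 1]
x_mid = noise_image(x0, T // 2)                    # mostly noise by mid-schedule
# Training teaches a denoiser to predict eps from x_t; generation runs
# the process in reverse, starting from pure noise.
```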

rant.vpalepu.com, to Japan

Tech regulations are going to be an important thing to watch in 2024. So, this year, I am going to keep documenting the tech-regulation news that shows up in my feeds.

Two things popped up today:

  1. Japan: Copyright and AI Training

A story from last year about Japan’s stance on AI training and data copyright seems to have gained traction on Hacker News today. The original report seems to be by technomancers.ai, but here is the regurgitated version of the story on ACM’s news site (archived link from last year):

https://winterrant.files.wordpress.com/2024/01/screenshot-2024-01-02-at-3.00.32e280afpm.png?w=1024 (screenshot of the ACM story)

Japanese publishers already seem to be up in arms about this:

“The Japan Newspaper Publishers & Editors Association and three other industry groups released a joint statement Thursday expressing concern that copyright protection is not being adequately considered in the development of generative artificial intelligence.

The other organizations are the Japan Magazine Publishers Association, the Japan Photographic Copyright Association and the Japan Book Publishers Association.

In the joint statement, the organizations said that current generative AI creates content based on the analysis of large amounts of data collected from the internet without the consent of and payments to copyright holders.”

https://www.japantimes.co.jp/news/2023/08/17/japan/crime-legal/japan-publisher-ai-copyright-concern/ (archived link)

  2. Montana and North Carolina: Internet Identity

New internet identification laws went into effect on January 1, 2024, in Montana and North Carolina.

“[…] laws that went into effect in both states on January 1st. Montana passed a standalone ID verification law in May, and North Carolina’s new law was tacked onto a bill regarding the high school computer curriculum. The laws require sites to either use third-party verification or, in the case of Montana, “digitized identification” to verify a visitor’s age. Both states also leave enforcement as a civil matter, allowing individuals to sue if they think a site violates the law.”

https://www.theverge.com/2024/1/2/24022539/pornhub-blocked-montana-north-carolina-age-verification-law-protest (archived link)

These laws, and many others like them, are starting to require ID verification before users can access sites on the internet. While they will have an outsized impact on porn-hosting websites, they will likely also affect any other internet service that restricts how children use it.

While the laws are well-intentioned, it is unclear how they can avoid violating user privacy. If the idea is to protect children by affirming every user’s age through a well-established digital (or physical) identity, then that sensitive identity data will need to travel across the internet and reside on some web server (or data center). If that data ever leaks, it will be a major headache for the affected users.

The UK has passed a similar law that tightens its existing regulations around internet identities and child protection online. I am unclear on that law’s status: whether it has gone into effect, or whether revisions to it are still to be made.

https://rant.vpalepu.com/2024/01/02/tech-regulations-update-japan-montana-north-carolina/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

#AI #CC #AITraining #CreativeCommons: "Creative Commons has been used for around 20 years, and the number of lawsuits involving works released under these licences has been minimal. People choose to share their works with CC licenses for various reasons, some are selfish, some altruistic, and some pragmatic. Personally, I have always enjoyed sharing. Since I don’t anticipate earning money from my writing, I prefer making my works freely available with minimal restrictions. CC licenses facilitate this by signalling to others that they can share my work. However, this philosophy might not resonate with everyone. For those who do not share this view, CC may not be the ideal choice. If you prefer not to have your works widely shared, avoiding open licenses and utilizing technical tools and opt-outs might be a better approach. Respecting individual preferences will be crucial moving forward. I believe we are approaching a landscape similar to what we have seen with open access and open content, where such considerations are increasingly significant."

https://www.technollama.co.uk/creative-commons-and-ai-training

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The agency cautioned that generative AI could mimic “artists’ faces, voices, and performances without permission,” deceiving consumers about a work’s true authorship. FTC officials also expressed concerns about copyright violations, stating AI systems are trained on “pirated content” scraped “without consent.”

On copyright infringement, the FTC stated that “the use of pirated or misuse of copyrighted materials could be an unfair practice or unfair method of competition under Section 5 of the FTC Act.”

Separately but relatedly, leading AI companies such as OpenAI and Anthropic are facing lawsuits accusing them of violating copyright by using copyrighted content in their training data."

https://venturebeat.com/ai/ftc-takes-shots-at-ai-in-rare-filing-to-us-copyright-office/

chris, to random
@chris@social.losno.co avatar

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

#AI #GenerativeAI #LLMs #AITraining #Copyright #IP: "Web crawlers and scrapers can easily access data from just about anywhere that’s not behind a login page. Social media profiles set to private aren’t included. But data that are viewable in a search engine or without logging into a site, such as a public LinkedIn profile, might still be vacuumed up, Dodge says. Then, he adds, “there’s the kinds of things that absolutely end up in these Web scrapes”—including blogs, personal webpages and company sites. This includes anything on popular photograph-sharing site Flickr, online marketplaces, voter registration databases, government webpages, Wikipedia, Reddit, research repositories, news outlets and academic institutions. Plus, there are pirated content compilations and Web archives, which often contain data that have since been removed from their original location on the Web. And scraped databases do not go away. “If there was text scraped from a public website in 2018, that’s forever going to be available, whether [the site or post has] been taken down or not,” Dodge notes."

https://www-scientificamerican-com.cdn.ampproject.org/c/s/www.scientificamerican.com/article/your-personal-information-is-probably-being-used-to-train-generative-ai-models/?amp=true
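As a minimal illustration of how little stands between a public page and a training corpus, the sketch below fetches a page the way a crawler would. The target URL and user-agent string are hypothetical; note that robots.txt is a voluntary convention scrapers may consult, not an access control.

```python
# Fetch a public page the way a crawler would. Anything served without a
# login is reachable this way; robots.txt is advisory, not enforcement.
import urllib.request
import urllib.robotparser

target = "https://example.com/public-page"  # hypothetical target

rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

if rp.can_fetch("my-research-bot", target):  # hypothetical user-agent
    with urllib.request.urlopen(target) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    print(len(html), "bytes fetched")  # text like this can end up in a dataset
```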

PrivacyDigest, to ai
@PrivacyDigest@mas.to avatar

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta’s generative AI.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta’s AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

#AI #AITraining #Hardware #EnergyConsumption: "Like many data centers, the LLSC has seen a significant uptick in the number of AI jobs running on its hardware. Noticing an increase in energy usage, computer scientists at the LLSC were curious about ways to run jobs more efficiently. Green computing is a principle of the center, which is powered entirely by carbon-free energy.

Training an AI model — the process by which it learns patterns from huge datasets — requires using graphics processing units (GPUs), which are power-hungry hardware. As one example, the GPUs that trained GPT-3 (the precursor to ChatGPT) are estimated to have consumed 1,300 megawatt-hours of electricity, roughly equal to that used by 1,450 average U.S. households per month.

While most people seek out GPUs because of their computational power, manufacturers offer ways to limit the amount of power a GPU is allowed to draw. "We studied the effects of capping power and found that we could reduce energy consumption by about 12 percent to 15 percent, depending on the model," Siddharth Samsi, a researcher within the LLSC, says."

https://news.mit.edu/2023/new-tools-available-reduce-energy-that-ai-models-devour-1005
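The power cap the researchers describe can be set through NVIDIA's NVML management library (the same mechanism behind the nvidia-smi -pl flag). Below is a sketch using the nvidia-ml-py bindings; the 250 W target is an arbitrary example rather than the LLSC's setting, and changing limits typically requires administrator privileges.

```python
# Cap a GPU's power draw via NVML (pip install nvidia-ml-py).
# The 250 W target is an arbitrary example; setting limits needs privileges.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Supported power-limit range, in milliwatts.
lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"supported power limit range: {lo / 1000:.0f}-{hi / 1000:.0f} W")

target_mw = min(max(250_000, lo), hi)  # clamp the example 250 W cap into range
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

print("current limit:",
      pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000, "W")
pynvml.nvmlShutdown()
```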

joycebell, to ai
@joycebell@mas.to avatar

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

rexi, to machinelearning
@rexi@mastodon.social avatar

https://www.pbs.org/video/ai-protection-1693683970/

"…you give it a large set of information and you ask it to detect a certain pattern…program can improve its performance based on number of trials and number of times, hence the term .

The crux of this matter when it comes to the writers’ and actors’ strike is that the large sets of data come from the content that writers and actors have generated, and they have not been compensated for any AI training that has been done on that data…"

pmj, to ai
@pmj@social.pmj.rocks avatar

stop using Zoom immediately!!
they basically steal your personality, your manners, your gestures to make money out of it!
this is waaay beyond text or images!
and you can't opt-out!
https://zoomai.info/

gianmarcogg03, to ai

One more reason to ditch Zoom: they changed their ToS to pretend they won't take user data to train third-party AIs, even though the rest of the ToS pretty much says they can do that and beyond. Sounds like the usual Google/Apple "backing down but not really" stunt.

https://www.computerworld.com/article/3704489/zoom-goes-for-a-blatant-genai-data-grab-enterprises-beware.html

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The people who build generative AI have a huge influence on what it is good at, and who does and doesn’t benefit from it. Understanding how generative AI is shaped by the objectives, intentions, and values of its creators demystifies the technology, and helps us to focus on questions of accountability and regulation. In this explainer, we tackle one of the most basic questions: What are some of the key moments of human decision-making in the development of generative AI products? This question forms the basis of our current research investigation at Mozilla to better understand the motivations and values that guide this development process. For simplicity, let’s focus on text-generators like ChatGPT.

We can roughly distinguish between two phases in the production process of generative AI. In the pre-training phase, the goal is usually to create a Large Language Model (LLM) that is good at predicting the next word in a sequence (which can be words in a sentence, whole sentences, or paragraphs) by training it on large amounts of data. The resulting pre-trained model “learns” how to imitate the patterns found in the language(s) it was trained on.

This capability is then utilized by adapting the model to perform different tasks in the fine-tuning phase. This adjusting of pre-trained models for specific tasks is how new products are created. For example, OpenAI’s ChatGPT was created by “teaching” a pre-trained model — called GPT-3 — how to respond to user prompts and instructions. GitHub Copilot, a service for software developers that uses generative AI to make code suggestions, also builds on a version of GPT-3 that was fine-tuned on “billions of lines of code.”"

https://foundation.mozilla.org/en/blog/the-human-decisions-that-shape-generative-ai-who-is-accountable-for-what/
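A toy illustration of the pre-training objective the explainer describes: the model is scored on predicting the next token, with the loss shifted by one position. The embedding-plus-linear "model" below is a stand-in for a real transformer stack; fine-tuning reuses the same objective on task-specific data such as prompt/response pairs.

```python
# Toy next-token prediction objective. The embedding + linear head stand in
# for a real transformer; pre-training scales this exact loss to huge corpora.
import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))  # a fake 16-token sequence
hidden = embed(tokens)                          # a real LLM runs transformer layers here
logits = lm_head(hidden)                        # (1, 16, vocab_size)

# Shift by one position: the model at position i must predict token i+1.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print("next-token prediction loss:", loss.item())
```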

deltatux, to privacy

With the uproar on social media over Zoom's recent privacy policy changes, the company is trying to reassure users about what these changes mean. It goes on to say:

To reiterate: we do not use audio, video, or chat content for training our models without customer consent.

#Zoom #privacy #AITraining #webconference

https://blog.zoom.us/zooms-term-service-ai/
