phillycodehound, to ai
@phillycodehound@masto.ai avatar

So over on Hacker News they report that Zoom is using user data to train their AI and there is no way to opt out. I mean there is a way... don't use Zoom. Though I'm going to keep using it. It's the best in class and pretty much everyone knows how to use it.

cragsand, (edited) to ai

I learned how to train LoRA models using the open-source Stable Diffusion...

For the purpose of recreating the appearance of my 3D VR roleplaying character, the results I got were amazingly good... almost frighteningly so.

I'll go through the process, results and some thoughts.

🧵 part 1 of 4
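
Since the thread promises process details, a rough sketch of what a LoRA actually learns may help: the adapter leaves the pretrained weights frozen and trains a low-rank correction on top. This is a minimal illustration with made-up shapes, not the Stable Diffusion trainer's code:

```python
import numpy as np

# A LoRA adapter keeps the pretrained weight W frozen and learns a
# low-rank correction B @ A, so the effective weight is W + (alpha/r) * B @ A.
d_out, d_in, r, alpha = 64, 64, 8, 16       # illustrative sizes; r << d

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init => no change at start

def forward(x):
    # Base output plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)       # before training, the adapter is a no-op
```

Only A and B are trained, which is why a LoRA file is a few megabytes instead of a full model checkpoint.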

PrivacyDigest, to ai
@PrivacyDigest@mas.to avatar

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta's generative AI training.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta's AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The people who build generative AI have a huge influence on what it is good at, and who does and doesn’t benefit from it. Understanding how generative AI is shaped by the objectives, intentions, and values of its creators demystifies the technology, and helps us to focus on questions of accountability and regulation. In this explainer, we tackle one of the most basic questions: What are some of the key moments of human decision-making in the development of generative AI products? This question forms the basis of our current research investigation at Mozilla to better understand the motivations and values that guide this development process. For simplicity, let’s focus on text-generators like ChatGPT.

We can roughly distinguish between two phases in the production process of generative AI. In the pre-training phase, the goal is usually to create a Large Language Model (LLM) that is good at predicting the next word in a sequence (which can be words in a sentence, whole sentences, or paragraphs) by training it on large amounts of data. The resulting pre-trained model “learns” how to imitate the patterns found in the language(s) it was trained on.

This capability is then utilized by adapting the model to perform different tasks in the fine-tuning phase. This adjusting of pre-trained models for specific tasks is how new products are created. For example, OpenAI’s ChatGPT was created by “teaching” a pre-trained model — called GPT-3 — how to respond to user prompts and instructions. GitHub Copilot, a service for software developers that uses generative AI to make code suggestions, also builds on a version of GPT-3 that was fine-tuned on “billions of lines of code.”"

https://foundation.mozilla.org/en/blog/the-human-decisions-that-shape-generative-ai-who-is-accountable-for-what/
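
A toy illustration of the pre-training objective the Mozilla explainer describes: learn to predict the next word from data. This sketch uses simple bigram counts over a made-up corpus; real LLMs optimize the same next-token objective with neural networks over subword tokens:

```python
from collections import Counter, defaultdict

# Toy "pre-training": count which word follows which, then predict
# the most likely next word for a given word.
corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    followers = counts[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # 'cat' -- the most frequent continuation in the corpus
```

Fine-tuning then reuses those learned statistics for a narrower task, e.g. following instructions rather than merely continuing text.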

joycebell, to ai
@joycebell@mas.to avatar

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/ #authors #copyright #AI #aitraining #aiethics

chris, to random
@chris@social.losno.co avatar

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. #aitraining #military https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
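
For the "break images down into noise" step the quoted piece describes, here is a minimal sketch of the forward diffusion process, assuming a standard linear noise schedule and treating the image as a normalized array (illustrative only; not code from Stable Diffusion or the article):

```python
import numpy as np

# Forward diffusion: blend a clean image x0 toward pure Gaussian noise.
#   x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
# A model is then trained to predict eps from x_t, i.e. to undo this step.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # common linear schedule
abar = np.cumprod(1.0 - betas)       # cumulative fraction of signal remaining

def noise_image(x0, t):
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return xt, eps

x0 = rng.uniform(-1, 1, size=(8, 8))  # stand-in for a normalized image
xt, eps = noise_image(x0, T - 1)      # at the last step, xt is almost pure noise
print(abar[-1])                       # ~4e-5: almost no signal remains
```

Because generation starts from random noise and those learned abstractions, training images are usually not reproduced verbatim, though, as the article notes, it sometimes happens.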

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

#AI #GenerativeAI #UMichigan #LLMs #DataProtection #Privacy #AITraining #HigherEd #Universities #USA: "The University of Michigan is selling hours of audio recordings of study groups, office hours, lectures, and more to outside third-parties for tens of thousands of dollars for the purpose of training large language models (LLMs). 404 Media has downloaded a sample of the data, which includes a one hour and 20 minute long audio recording of what appears to be a lecture.

The news highlights how some LLMs may ultimately be trained on data with an unclear level of consent from the source subjects. The University of Michigan did not immediately respond to a request for comment, and neither did Catalyst Research Alliance, which is part of the sale process.

“The University of Michigan has recorded 65 speech events from a wide range of academic settings, including lectures, discussion sections, interviews, office hours, study groups, seminars and student presentations,” a page on Catalyst’s website about the University of Michigan data reads. “Speakers represent broad demographics, including male and female and native and non-native English speakers from a wide variety of academic disciplines.”"

https://www.404media.co/university-of-michigan-sells-recordings-of-study-groups-and-office-hours-to-train-ai/

remixtures, to Bulgaria Portuguese
@remixtures@tldr.nettime.org avatar

#EU #Belgium #France #AI #GenerativeAI #AITraining #DataProtection #GDPR: "As well as the Belgian Data Protection Authority decision I criticised earlier this week, it appears the French DPA has issued similar guidance on the use of personal data to train AI models. My detailed analysis below shows that, in relation to purpose-specific AI systems, it makes no sense: the training of the system cannot be separated from the ultimate purpose of the system. This has a major bearing on the issue of compatibility.

As a matter of principle and law, the creation and training of AI models/profiles for a specific purpose (be that direct marketing or health care) must be based on the legal basis relied on for that ultimate purpose.

The fact that the creation and training of the models/profiles is a “first phase” in a two-phase process (with the deployment of the models/profiles forming the “second phase”) does not alter that.

However, as an exception to this, under the GDPR, the processing can also be authorised by law or by means of an authorisation issued by a DPA under the relevant law (as in France), provided the law or DPA authorisation lays down appropriate safeguards. That is the only qualification I accept to the above principle." https://www.ianbrown.tech/2024/04/16/more-on-french-and-belgian-gdpr-guidance-on-ai-training/

emkingma, to ai
@emkingma@mstdn.social avatar

Go on LinkedIn for a bit this morning and I'm greeted with a message and an ad inviting me to screw over my own future and that of others.

No, I'm not going to teach your generative AI model how to f**king write.

#AI #AITraining #GenerativeAI

An ad from Outlier that appeared in my LinkedIn feed, encouraging me to sign up for the role I was messaged about.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "A lawsuit is alleging Amazon was so desperate to keep up with the competition in generative AI it was willing to breach its own copyright rules.…

The allegation emerges from a complaint [PDF] accusing the tech and retail mega-corp of demoting, and then dismissing, a former high-flying AI scientist after it discovered she was pregnant.

The lawsuit was filed last week in a Los Angeles state court by Dr Viviane Ghaderi, an AI researcher who says she worked successfully in Amazon's Alexa and LLM teams, and achieved a string of promotions, but claims she was later suddenly demoted and fired following her return to work after giving birth. She is alleging discrimination, retaliation, harassment and wrongful termination, among other claims.

Montana MacLachlan, an Amazon spokesperson, said of the suit: "We do not tolerate discrimination, harassment, or retaliation in our workplace. We investigate any reports of such conduct and take appropriate action against anyone found to have violated our policies.""

https://www.msn.com/en-us/news/crime/ex-amazon-exec-claims-she-was-asked-to-break-copyright-law-in-race-to-ai/ar-AA1nrNEG

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "[A]s the lawsuits and investigations around generative AI and its opaque data practices pile up, there have been small moves to give people more control over what happens to what they post online. Some companies now let individuals and business customers opt out of having their content used in AI training or being sold for training purposes. Here’s what you can—and can’t—do.

Before we get to how you can opt out, it’s worth setting some expectations. Many companies building AI have already scraped the web, so anything you’ve posted is probably already in their systems. Companies are also secretive about what they have actually scraped, purchased, or used to train their systems. “We honestly don't know that much,” says Niloofar Mireshghallah, a researcher who focuses on AI privacy at the University of Washington. “In general, everything is very black-box.”" https://www.wired.com/story/how-to-stop-your-data-from-being-used-to-train-ai/
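
One of the few concrete opt-outs available today operates at the crawl stage: AI crawlers that honor robots.txt can be refused site-wide. The user-agent tokens below are the publicly documented ones (GPTBot for OpenAI, Google-Extended for Google's AI training, CCBot for Common Crawl); note this only affects future crawls and relies on the crawler respecting the file:

```
# robots.txt — refuse known AI-training crawlers while leaving normal search alone
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

As the article stresses, nothing here removes what has already been scraped.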

remixtures, to ArtificialIntelligence Portuguese
@remixtures@tldr.nettime.org avatar

: "Roboticists believe that by using new AI techniques, they will achieve something the field has pined after for decades: more capable robots that can move freely through unfamiliar environments and tackle challenges they’ve never seen before.
(...)
But something is slowing that rocket down: lack of access to the types of data used to train robots so they can interact more smoothly with the physical world. It’s far harder to come by than the data used to train the most advanced AI models like GPT—mostly text, images, and videos scraped off the internet. Simulation programs can help robots learn how to interact with places and objects, but the results still tend to fall prey to what’s known as the “sim-to-real gap,” or failures that arise when robots move from the simulation to the real world.

For now, we still need access to physical, real-world data to train robots. That data is relatively scarce and tends to require a lot more time, effort, and expensive equipment to collect. That scarcity is one of the main things currently holding progress in robotics back."

https://www.technologyreview.com/2024/04/30/1091907/the-robot-race-is-fueling-a-fight-for-training-data/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Stack Overflow, a legendary internet forum for programmers and developers, is coming under heavy fire from its users after it announced it was partnering with OpenAI to scrub the site's forum posts to train ChatGPT. Many users are removing or editing their questions and answers to prevent them from being used to train AI — decisions which have been punished with bans from the site's moderators.

Stack Overflow user Ben posted on Mastodon about his experience editing his most successful answers to try to avoid having his work stolen by OpenAI.

@ben on Mastodon posts, "Stack Overflow announced that they are partnering with OpenAI, so I tried to delete my highest-rated answers. Stack Overflow does not let you delete questions that have accepted answers and many upvotes because it would remove knowledge from the community. So instead I changed my highest-rated answers to a protest message. Within an hour mods had changed the questions back and suspended my account for 7 days."

Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you.""

https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt

remixtures, to Sony Portuguese
@remixtures@tldr.nettime.org avatar

Sony Music is the prototype of a company that uses artists as mere puppets to get the only thing it really wants: free money extracted through IP rents. It's a parasite that contributes nothing to the promotion of arts and science.

: "Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music — which includes artists such as Harry Styles, Adele and Beyoncé — and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercialising any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further."

https://www.ft.com/content/c5b93b23-9f26-4e6b-9780-a5d3e5e7a409

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Creating an individual bargainable copyright over training will not improve the material conditions of artists' lives – all it will do is change the relative shares of the value we create, shifting some of that value from tech companies that hate us and want us to starve to entertainment companies that hate us and want us to starve.

As an artist, I'm foursquare against anything that stands in the way of making art. As an artistic worker, I'm entirely committed to things that help workers get a fair share of the money their work creates, feed their families and pay their rent.

I think today's AI art is bad, and I think tomorrow's AI art will probably be bad, but even if you disagree (with either proposition), I hope you'll agree that we should be focused on making sure art is legal to make and that artists get paid for it.

Just because copyright won't fix the creative labor market, it doesn't follow that nothing will. If we're worried about labor issues, we can look to labor law to improve our conditions."

https://pluralistic.net/2024/05/13/spooky-action-at-a-close-up/#invisible-hand

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "It all kicked off last night, when a note on Hacker News raised the issue of how Slack trains its AI services, by way of a straight link to its privacy principles — no additional comment was needed. That post kicked off a longer conversation — and what seemed like news to current Slack users — that Slack opts users in by default to its AI training, and that you need to email a specific address to opt out.

That Hacker News thread then spurred multiple conversations and questions on other platforms: There is a newish, generically named product called “Slack AI” that lets users search for answers and summarize conversation threads, among other things, but why is that not once mentioned by name on that privacy principles page in any way, even to make clear if the privacy policy applies to it? And why does Slack reference both “global models” and “AI models?”

Between people being confused about where Slack is applying its AI privacy principles, and people being surprised and annoyed at the idea of emailing to opt out — at a company that makes a big deal of touting that “You control your data” — Slack does not come off well."

https://techcrunch.com/2024/05/17/slack-under-attack-over-sneaky-ai-training-policy/?guccounter=1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "In this conversation, we discuss how Herndon collaborated with a human chorus and her “A.I. baby,” Spawn, on “PROTO”; how A.I. voice imitators grew out of electronic music and other musical genres; why Herndon prefers the term “collective intelligence” to “artificial intelligence”; why an “opt-in” model could help us retain more control of our work as A.I. trawls the internet for data; and much more."

https://www.youtube.com/watch?v=4MJ2D9uCLLA

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

Surprise, surprise: News publishers only care about Money!!

: "Publishers are deep in negotiations with tech firms such as OpenAI to sell their journalism as training for the companies’ models. It turns out that accurate, well-written news is one of the most valuable sources for these models, which have been hoovering up humans’ intellectual output without permission. These AI platforms need timely news and facts to get consumers to trust them. And now, facing the threat of lawsuits, they are pursuing business deals to absolve them of the theft. These deals amount to settling without litigation. The publishers willing to roll over this way aren’t just failing to defend their own intellectual property—they are also trading their own hard-earned credibility for a little cash from the companies that are simultaneously undervaluing them and building products quite clearly intended to replace them."

https://www.theatlantic.com/technology/archive/2024/05/fatal-flaw-publishers-making-openai-deals/678477/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "- Merely relying on the disclosure of statistical accuracy of the GenAI model is insufficient, since it could lead to an “Accuracy Paradox”. It refers to the unintended consequences of solely relying on the disclosure of a model’s statistical accuracy, which can lead to a misleading sense of reliability among users. As accuracy metrics improve, users may overly trust the AI outputs without sufficient verification, increasing the risk of accepting erroneous information.

  • Increasing the accuracy of inputs, models, and outputs often comes with the cost of privacy, especially in GenAI context. This involves not only technical identifiability of the individuals involved, but also societal risks such as more accurate and precise targeting for commercial purposes, social sorting, and group privacy implications.
  • Overreliance on developers’ and deployers’ accuracy legal compliance is not pragmatic and is overoptimistic, which could ultimately become a burden for users with the tendency of using dark pattern. In this context, GenAI developers and deployers could use such manipulative design to shift the responsibility for data accuracy onto users.
  • We argue that content moderation as a tool to mitigate inaccuracy and untrustworthiness. As a critical role in ensuring the accuracy, reliability, and trustworthiness of GenAI, content moderation could filter flawed or harmful content, which involves refining detection methods to distinguish and exclude incorrect or misleading information from training data and model outputs.
  • Accuracy of training data cannot directly translate to the accuracy of output, especially in the context of hallucination. Even though most training data is reliable and trustworthy, the essential issue remains that the recombination of trustworthy data into new answers in a new context may lead to untrustworthiness..."

https://www.create.ac.uk/blog/2024/05/28/accuracy-of-training-data-and-model-outputs-in-generative-ai-create-response-to-the-information-commissioners-office-ico-consultation/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Large publishers are forging ahead with voluntary agreements in the absence of legal regulatory clarity. But this leaves out smaller and local publishers and could undermine efforts to develop business model alternatives as opposed to one-off licensing opportunities.

Ad hoc approaches, however, risk worsening the compounding crises caused by the decline of local news and the scourge of disinformation. We are already seeing the proliferation of election-related disinformation in the U.S. and around the world, from AI robocalls impersonating President Joe Biden to deepfakes of Moldovan candidates making false claims about alignment with Russia.

Renegotiating the relationship between tech platforms and the news industry must be a fundamental part of the efforts to support journalism and help news organizations adapt to the generative AI era."

https://niemanreports.org/articles/the-battle-over-using-journalism-to-build-ai-models-is-just-starting/
