#OpenAI partners with the American Journalism Project to access content to train its #AI model in exchange for $5M in funding and $5M in developer credits.
"It's not lying, it's not telling the truth, because both of those would require some intentionality and some communicative intent, which it doesn't have."
This excellent comic on the history of Luddism by Tom Humberstone https://thenib.com/im-a-luddite led me to the site of the folks developing 'Glaze', a tool artists are using to disrupt AI systems that scrape their art. Check it out here:
Since #healthcare execs are salivating over #LLMs: if you deploy one in 100M patient encounters, that's 100,000 lawsuits at an anticonservative error rate of 0.1% (see linked tweet).
Factoids:
*There are ~120M encounters in the US annually
*~20k lawsuits are filed annually in the US
*FDA wouldn't license a device with a 0.1% failure rate
*medical device manufacturers cannot use disclaimers to get themselves out of lawsuits.
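The back-of-envelope arithmetic behind the lawsuit claim can be sketched like this (every figure is the post's own rough estimate, not measured data, and it assumes each 0.1% error produces a suit):

```python
# Rough figures quoted in the post above; none of these are measured data.
encounters = 100_000_000   # LLM deployed across 100M patient encounters
error_rate = 0.001         # the post's "anticonservative" 0.1% error rate
annual_lawsuits = 20_000   # ~lawsuits filed annually in the US (per the post)

# Assumes, as the post does, that every error becomes a lawsuit.
potential_suits = encounters * error_rate
print(int(potential_suits))                  # 100000 potential suits
print(potential_suits / annual_lawsuits)     # 5.0x the current annual volume
```

With the factoid's ~120M annual encounters instead of 100M, the same sum lands at 120,000 — either way, several times the entire current annual lawsuit volume.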
A few absolute shockers in the list of websites the Washington Post has revealed are used to train Google's generative AI tools. Apparently including the likes of 4Chan, Breitbart, and RT.
From WaPo:
"Meanwhile, we found several media outlets that rank low on NewsGuard’s independent scale for trustworthiness: RT.com No. 65, the Russian state-backed propaganda site; breitbart.com No. 159, a well-known source for far-right news and opinion; and vdare.com No. 993, an anti-immigration site that has been associated with white supremacy.
"The top Christian site, Grace to You (gty.org No. 164), belongs to Grace Community Church, an evangelical megachurch in California. Christianity Today recently reported that the church counseled women to 'continue to submit' to abusive fathers and husbands and to avoid reporting them to authorities."
Anyone interested in services like #ChatGPT should check out the new #GenerativeAI service released to the public today. It takes a different, more conversational approach to the chat concept we've seen before. It's called PI and you can find it at the address below. It has a voice that responds back if you want to turn it on, with four different voices to choose from. As worried as I am, this stuff fascinates me. This thing is free for now.
not that the world really needs more computer science conferences, but i keep wondering if there’s appetite for one focused on procedural generation/generative computation, i.e. the union of PCG, generative art, program synthesis, &c.
mostly, it would really help to have a name for this field that people don’t mistake for consisting entirely of text2image statistical models
"We were supposed to research #enshittification, not embrace it as a business model!" implored the DVC (Research).
The Vice-Chancellor sighed audibly.
"We're out of options."
She raised her hands, palms up, reminiscent of prayer.
"The research grants don't cover the research we do, much less the research we want to do.
International student numbers have declined 20% year on year, ever since India, China, and Indonesia got on-shore partnerships with Deakin and Monash that still get the grads permanent residency.
We have PhDs teaching most of the undergrad courses. The endowment took a major hit when the stock market crashed in '25.
Federation's gone bust, Adelaide's half the size it was before the merger, and you've seen CQ merge with SCU and James Cook and Charles Darwin just to be viable."
She drew a sharp breath of burnt autumn air.
"It's tens of millions a year in recurring revenue. That's a School's worth of people."
#AI #GenerativeAI #UMichigan #LLMs #DataProtection #Privacy #AITraining #HigherEd #Universities #USA: "The University of Michigan is selling hours of audio recordings of study groups, office hours, lectures, and more to outside third-parties for tens of thousands of dollars for the purpose of training large language models (LLMs). 404 Media has downloaded a sample of the data, which includes a one hour and 20 minute long audio recording of what appears to be a lecture.
The news highlights how some LLMs may ultimately be trained on data with an unclear level of consent from the source subjects. The University of Michigan did not immediately respond to a request for comment, and neither did Catalyst Research Alliance, which is part of the sale process.
“The University of Michigan has recorded 65 speech events from a wide range of academic settings, including lectures, discussion sections, interviews, office hours, study groups, seminars and student presentations,” a page on Catalyst’s website about the University of Michigan data reads. “Speakers represent broad demographics, including male and female and native and non-native English speakers from a wide variety of academic disciplines.”"
“If #hallucinations aren’t fixable, #generativeAI probably isn’t going to make a trillion dollars a year. And if it probably isn’t going to make a trillion dollars a year, it probably isn’t going to have the impact people seem to be expecting. And if it isn’t going to have that impact, maybe we should not be building our world around the premise that it is” @garymarcus
#AI #GenerativeAI #LLMs #AITraining #GeneratedImages #Copyright #IP: "A lot of early AI research was done in an academic setting; the law specifically mentions teaching, scholarship, and research as examples of fair use. As a result, the machine-learning community has traditionally taken a relaxed attitude toward copyright. Early training sets frequently included copyrighted material.
As academic researchers took jobs in the burgeoning commercial AI sector, many assumed they would continue to enjoy wide latitude to train on copyrighted material. Some feel blindsided by copyright holders’ demands for cash.
“We all learn for free,” Daniel Jeffries wrote in his tweet summing up the view of many in the AI community. “We learn from the world around us and so do machines.”
The argument seems to be that if it’s legal for a human being to learn from one copyrighted book, it must also be legal for a large language model to learn from a million copyrighted books—even if the training process requires making copies of the books.
As MP3.com and Texaco learned, this isn't always true. A use that’s fair at a small scale can be unfair when it’s scaled up and commercialized.
But AI advocates like Jeffries are right that sometimes it is true. There are cases where courts have held that bulk technological uses of copyrighted works are fair use. The most important example is almost certainly the Google Books case."
#AI #GenerativeAI #SLMs #Microsoft #ChatBots #Phi3: "How did Microsoft cram a capability potentially similar to GPT-3.5, which has at least 175 billion parameters, into such a small model? Its researchers found the answer by using carefully curated, high-quality training data they initially pulled from textbooks. "The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data," writes Microsoft. "The model is also further aligned for robustness, safety, and chat format."
Much has been written about the potential environmental impact of AI models and datacenters themselves, including on Ars. With new techniques and research, it's possible that machine learning experts may continue to increase the capability of smaller AI models, replacing the need for larger ones—at least for everyday tasks. That would theoretically not only save money in the long run but also require far less energy in aggregate, dramatically decreasing AI's environmental footprint. AI models like Phi-3 may be a step toward that future if the benchmark results hold up to scrutiny.
Phi-3 is immediately available on Microsoft's cloud service platform Azure, as well as through partnerships with machine learning model platform Hugging Face and Ollama, a framework that allows models to run locally on Macs and PCs."
#ScienceDirect and #Elsevier clearly didn't review this #chemistry paper about #batteries. The intro starts with "Certainly, here is a possible introduction for your topic:".
This lack of oversight erodes #trust in #science. The paper needs to be retracted and the authors sanctioned immediately.
Last was an excellent talk by @jtlg on generative AI and copyright at the Allen Institute for AI. Favorite quote: "It's not at all obvious that the incentives to create of the sort that copyright offers are the appropriate system of law to govern this new [technology]. It may be that what replaces copyright due to generative AI is as different from copyright as copyright was from the patronage system that came before it." Highly recommend https://www.youtube.com/watch?v=toPhm4zBp00 (6/6) #GenerativeAI #copyright