It is a commonplace #onhere that LLMs reflect the interests/biases of their training data.
But we also must recognize that LLMs don't exist in a vacuum.
They are trained on data provided by actors with interests/biases, and
they generate results for other actors who promulgate them according to their own interests/biases,
to audiences who have their own interests/biases.
My concern is not whether LLMs can or cannot give an "intelligent" perspective on reality, but rather, in the process of doing so, whose interests and biases are being promulgated, and whose are being served.
#LargeLanguageModels are increasingly trained on content created in part by #AI platforms like #ChatGPT. An endless photocopy of a photocopy. Is the future of this stuff inevitably hobbled by a digital Habsburg jaw?
LLMs such as GPT-4 have proven surprisingly successful for a wide range of tasks. We explore the potential of leveraging LLMs as simulators of biological systems.
This text-based simulation paradigm is well-suited for modeling & understanding complex living systems that are difficult to describe with physics-based first-principles simulations.
Researchers look at how AI (large language models or LLMs in particular) could change the nature of social science research.
Igor Grossmann et al. (2023) AI and the transformation of social science research. Science. 380: 1108 DOI: 10.1126/science.adi1778
Careful bias management and data fidelity are key https://www.science.org/doi/10.1126/science.adi1778
While I disagree with u/spez's actions, I understand his perspective. Reddit's most valuable asset is its curated text data for training Large Language Models like ChatGPT. Closing down the API protects that asset. He's likely betting that subreddit moderation will be solved with LLMs so the mods that generated that data are of little concern going forward. There will be only one chance to monetize this data asset.
The reason I disagree with u/spez's actions is that I don't believe this asset belongs to him. I'm sure he's protected from a legal perspective, but from a philosophical perspective, when a user writes an idea on an online forum, they don't forfeit ownership of that idea. People who contributed to Reddit even 12 months ago had no idea their thoughts would be monetized and consumed by LLMs. We need laws to protect people's data and to democratize data assets.
I also wish u/spez would just be honest about what he's doing. Telling possibly career-ending lies about developers and disregarding the mods that made Reddit is inexcusable. It's clear that Reddit has succeeded despite u/spez's leadership.
How come #transformer models aren't made to go back and change their answer as they work? If you ask a human to write something, they will very rarely just spit out an entire document word for word and be done. Most human work involves revising your own output as you go. If you prompt an #LLM to do this, you will get a better result, so why not build the model to do this from the get-go?
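The "prompt an LLM to revise its own output" idea above can be sketched as a simple draft-then-revise loop. This is a minimal illustration, not anyone's published method; `generate` is a hypothetical stand-in for whatever LLM completion call you actually use.

```python
def generate(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. an API client or a local model).
    # Here it just echoes a stub so the sketch is self-contained and runnable.
    return f"[model output for: {prompt[:40]}...]"

def draft_then_revise(task: str, passes: int = 2) -> str:
    """Ask for a first draft, then feed it back to the model for revision."""
    text = generate(f"Write a first draft: {task}")
    for _ in range(passes):
        # Each pass hands the previous output back as material to improve,
        # mimicking the human revise-as-you-work loop described above.
        text = generate(
            "Revise the following draft for clarity and correctness, "
            f"keeping the same content:\n\n{text}"
        )
    return text
```

The point of the sketch is that revision here lives outside the model, in the prompting loop, rather than in the architecture itself.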
The #bullshit regarding #quantum is reaching a fever pitch on #tiktok now. I suspect this is a coordinated campaign. But by whom? Who is paying to spin up quantum bullshit on tiktok of all places? Things are getting really weird. This is not the technology dystopia I planned for.
It makes me imagine an alternate plot for Terminator/War Games/The Forbin Project. What if the machine never actually works, but a cult forms around it who believe it works, and that cult seizes power? What if "skynet" was just a sock puppet for fascist generals and tech CEOs, who are the actual ones using machines to kill us all?
Your Wizard of Oz "man behind the curtain" alternate plot to Terminator is what we're living through right now - what with The #Singularity cult latching onto the #AGI possibilities suggested by #ChatGPT, when in reality these #LargeLanguageModels rely on a gig economy of low paid data labellers to even work at all.
Not the results many were expecting given all the #AIHype
Recall [Two Minute Papers]' gushing review of that "Sparks of AGI" paper assessing #GPT4, where at 5:20 https://youtu.be/wHiOKDlA8Ac?t=5m20s
it nailed an IMO question almost instantly
#GenerativeAI #LargeLanguageModels rely a lot on the human to do the reasoning for them, and even then #BingChat (Creative) has problems following the guidance. Notice I only specified the use of "unwieldy" and never required it to use "beard" or "weird", yet the #LLM got fixated on those instead.
Google says that Bard "is intended to…not replicate existing content at length" and that, if it does quote at length, it'll cite the page. But I wrote most of this Wikipedia entry, and I immediately recognized that Bard is copying it, word-for-word, at length. Sure, there's a footnote, but this is a straight-up duplicate of a webpage. https://en.wikipedia.org/wiki/Southwest_Mountains
@eyeinhand @waldoj
Trouble is, that's not how #LargeLanguageModels work. The token embeddings that a deep transformer network has been trained on do not encode the information source from which the token stream was derived https://youtu.be/rURRYI66E54
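A toy illustration of the point above: once text is tokenized, the token IDs carry no record of where the text came from. The vocabulary and "source" fields here are made up purely for demonstration.

```python
# Hypothetical three-word vocabulary; real tokenizers have tens of thousands
# of entries, but the provenance-loss point is the same.
vocab = {"the": 1, "cat": 2, "sat": 3}

def tokenize(text: str) -> list[int]:
    """Map whitespace-split words to integer token IDs."""
    return [vocab[w] for w in text.lower().split()]

# The same sentence scraped from two different sources yields identical
# token streams; the "source" field never enters the model's training input.
wiki = {"source": "wikipedia.org", "text": "the cat sat"}
blog = {"source": "someblog.example", "text": "The cat sat"}

assert tokenize(wiki["text"]) == tokenize(blog["text"])  # provenance is gone
```

Since only the token IDs reach the network, asking a trained model which page a passage came from is asking for information it was never given.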
Weighing in on that study that found #ChatGPT to be more empathetic than human physicians, @rebeccawatson finds the research lacked rigor - the authors themselves participated in the "blind" ratings, and whether the diagnosis was even correct didn't feature very highly in their assessment of quality.
I pointed #BingChat at that #ChatGPT vs human physicians empathy study and it still reassured me that #AI was not suitable for professional medical advice.
I then referred it to Mike Hansen's video : https://youtu.be/Gk8LQfAe6f8
where ChatGPT instantly nailed a diagnosis that had taken him and his team weeks to reach, and then Bing pretended to have watched the video (it probably just read the transcript) and "hallucinated" things that were not in the video.