We call it AI because no one would take us seriously if we called it matrix multiplication seeded with a bunch of initial values we pulled out of our asses and run on as much shitty data as we can get our grubby little paws on.
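For the literal-minded, that description is not far off. A minimal sketch of the matrix multiplication in question (the sizes are arbitrary; real models just stack many of these layers and train the random values on whatever data got scraped):

```python
import numpy as np

rng = np.random.default_rng()

# "a bunch of initial values we pulled out of our asses":
# the weights start as random numbers before any training happens.
W = rng.normal(0, 0.02, size=(768, 768))
b = np.zeros(768)

def layer(x):
    # the "AI": multiply the input by the random matrix, add a bias,
    # squash with a nonlinearity. Stack a few dozen of these and
    # train on scraped data, and you have a language model.
    return np.maximum(0, x @ W + b)  # ReLU

x = rng.normal(size=768)   # stand-in for an embedded token
print(layer(x).shape)      # (768,)
```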
I am really, really, REALLY irritated by what I just saw. The #ImageDescription function of Microsoft's #Bing is outright lying to people with vision impairments about what appears in the images it receives. It's bad enough when an #LLM is allowed to tell lies that a person can easily check themselves. But how the hell are you going to offer this so-called service to someone who can't check the claims being made and NEEDS those claims to be correct?
How long till someone gets poisoned because Bing told them their food hadn't expired when it had, or that something was safe to drink when it was cleaning solution, or God knows what? This is downright irresponsible and dangerous. #Microsoft either needs to put VERY CLEAR disclaimers on their service, or just take it down until it can actually be trusted.
I've been doing a bit more experimenting with #LargeLanguageModels #LLM and truth, and I've got an interesting one.
My experimental design was that I'd start by asking about relationships between European monarchs, and then begin introducing fictitious monarchs, but I didn't get that far...
Fake Intelligence is where we try to simulate intelligence by feeding huge amounts of dubious information to algorithms we don't fully understand, creating approximations of human behaviour in which the safeguards that moderate the real thing (family, community, culture, personal responsibility, reputation, and ethics) are replaced by norms that satisfy the profit motive of corporate entities.
An idea I've been kicking around the last couple weeks is the need for some kind of tag to use on digital content (writing, artwork, social media profiles, etc.) to specifically prohibit use as training data by Large Language Models. The robots.txt convention is along the lines of what I'm thinking.
Yes, this would be voluntary and self-policed. Yes, I realize that many people who build LLMs will disregard the tags. It may not have any impact initially.
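For reference, here's roughly what the robots.txt version of the idea already looks like. GPTBot and Google-Extended are the crawler tokens OpenAI and Google have documented for AI-training opt-outs; a page-level "noai" tag, by contrast, is still an informal convention with no enforcement behind it:

```
# robots.txt — opt out of known AI-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```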
Establishing Trust in ChatGPT Biomedical Generated Text
Ontology-Based Knowledge Graph to Validate Disease-Symptom Links https://arxiv.org/abs/2308.03929
goal: distinguish factual information from unverified data
one dataset from PubMed vs. ChatGPT-simulated articles (AI-generated content)
a striking number of links among terms in the ChatGPT KG, surpassing some of those in the PubMed KG
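The validation idea reduces to set arithmetic over graph edges. A toy sketch of that check; the disease-symptom pairs below are made up for illustration, and this is not the paper's code:

```python
# Toy version of the check: treat each corpus as a set of
# disease -> symptom edges and flag generated links with no
# support in the curated (PubMed-derived) graph.
pubmed_edges = {
    ("influenza", "fever"),
    ("influenza", "cough"),
    ("migraine", "nausea"),
}
chatgpt_edges = {
    ("influenza", "fever"),
    ("migraine", "nausea"),
    ("migraine", "fever"),  # plausible-sounding but unsupported
}

unverified = chatgpt_edges - pubmed_edges
supported = len(chatgpt_edges & pubmed_edges) / len(chatgpt_edges)

print(f"unsupported links: {sorted(unverified)}")
print(f"share of generated links backed by PubMed: {supported:.0%}")
```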
i think something people don't understand about #ai models like #largelanguagemodels is that they're fixed. the weights don't change after training, and with deterministic (temperature-zero) decoding the same input results in the same output. in the whole #copyright discourse recently, people talk like the ai has some agency; that you're just "telling it what to do". only in the same way you tell photoshop what to do. it's just that the type of input is different. they're complex. they're not magic. there's no ghost in the machine (yet).
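if you want to see the fixed-function point for yourself: with greedy decoding there's no randomness anywhere. a sketch using the hugging face transformers library and gpt2 as a stand-in (chat products layer sampling on top, which is where run-to-run variation comes from):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("there's no ghost in the machine", return_tensors="pt").input_ids

# do_sample=False means greedy decoding: no randomness at all,
# so the same prompt yields identical output on every run.
out1 = model.generate(ids, max_new_tokens=20, do_sample=False)
out2 = model.generate(ids, max_new_tokens=20, do_sample=False)
assert (out1 == out2).all()
print(tok.decode(out1[0]))
```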
More than 170,000 titles, the majority of them published within the last two decades, were fed into models run by companies including Meta and Bloomberg, according to an analysis of "Books3", the dataset harnessed by the firms to build their AI tools.
Should copyrighted work be used by #opensource platforms to train #AI models?
On the one hand, we have new papers showing how merely using the language of a specific human group can trigger implicit, hidden biases in #LargeLanguageModels;
on the other hand, we have software developers building tools that automatically retrieve information that may be of interest to you, and that try to anticipate your interests. The high point so far: https://new.computer/
Do you REALLY want to get a feel for how GPT-4o does what it does? Just complete this poem — by doing so, you’ll have performed a computation similar to the one it does when you feed it a text-plus-image prompt.
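If the poem analogy feels too loose, a toy bigram predictor makes the shape of the computation visible. The corpus and code here are illustrative only, obviously nothing like GPT-4o's scale:

```python
from collections import Counter, defaultdict

corpus = "roses are red violets are blue sugar is sweet and so are you".split()

# Count which word follows which: a one-table "language model".
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def complete(word, n=4):
    out = [word]
    for _ in range(n):
        nxt = follows[out[-1]].most_common(1)
        if not nxt:
            break
        out.append(nxt[0][0])  # pick the statistically likeliest next word
    return " ".join(out)

print(complete("roses"))  # -> "roses are red violets are"
```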
> v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
Anyone who tells you that #LargeLanguageModels like #ChatGPT can think or reason or are stepping stones to true #ArtificialIntelligence is either trying to sell you something or trying to recover the sunk cost of buying it from others.
"Very broadly speaking: the Effective Altruists are doomers, who believe that #LargeLanguageModels (AKA "spicy autocomplete") will someday become so advanced that it could wake up and annihilate or enslave the human race."
Quote @pluralistic
"Spicy autocomplete" is sooooo the best way to best describe what "AI" actually is. Nothing is intelligent, just artificial.
It also depends on what you want the #LargeLanguageModels and/or #generativeAI to do, and whether you care to put in the time, effort, and investment to curate the training data.
Many of these operations don't want to do the work of curating their training data (whether that means screening it, asking for permission, or whatever) because it's not cheap or fast!
Does anyone have a good list of logical questions to judge large language models' ability to reason?
Questions like "if it takes 3 hours for 3 towels to dry, how long does it take for 9 towels to dry?"
I'm playing around with Mistral's leaked 70B Miqu LLM and want to test its reasoning skills for a project I'm working on. I've been really impressed so far. It's slower than Mistral & Mixtral, but it's been producing the best-reasoned answers I've seen from an LLM. And it's running locally!
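If it helps anyone build their own list, here's the shape of a harness for this. `ask_model` is a placeholder for whatever local inference call you use (llama.cpp, Ollama, etc.), and the questions and expected answers are just examples:

```python
# Trick questions where the surface pattern suggests the wrong answer;
# a model that pattern-matches instead of reasoning tends to fail them.
QUESTIONS = [
    ("If it takes 3 hours for 3 towels to dry, how long does it take "
     "for 9 towels to dry?",
     "3 hours"),  # they dry in parallel, not sequentially
    ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
     "than the ball. How much does the ball cost?",
     "$0.05"),
    ("I have 3 apples and eat 2 pears. How many apples do I have?",
     "3"),
]

def ask_model(prompt: str) -> str:
    # placeholder: wire this up to your local LLM (llama.cpp, Ollama, ...)
    return "I don't know"

def run_eval():
    score = 0
    for question, expected in QUESTIONS:
        answer = ask_model(question)
        ok = expected.lower() in answer.lower()  # crude string check
        score += ok
        print(("PASS" if ok else "FAIL"), "-", question)
    print(f"{score}/{len(QUESTIONS)} correct")

run_eval()
```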
If your educational institution is still using #Zoom, especially in light of their policy change to use/sell your content to train #LargeLanguageModels (#LLMs), it's doing the wrong thing. Digitally literate institutions (a rare & precious thing) already use #BigBlueButton (#BBB) which is #LibreSoftware & substantially better for educational applications. If you want to trial it, talk to us - we've been making our instances available for institutions to use since Covid: https://oer4covid.oeru.org