#AI#GenerativeAI#LLMs#ParetoCurves: "Which is the most accurate AI system for generating code? Surprisingly, there isn’t currently a good way to answer questions like these.
Based on HumanEval, a widely used benchmark for code generation, the most accurate publicly available system is LDB (short for LLM debugger).1 But there’s a catch. The most accurate generative AI systems, including LDB, tend to be agents,2 which repeatedly invoke language models like GPT-4. That means they can be orders of magnitude more costly to run than the models themselves (which are already pretty costly). If we eke out a 2% accuracy improvement for 100x the cost, is that really better?
In this post, we argue that:
AI agent accuracy measurements that don’t control for cost aren’t useful.
Pareto curves can help visualize the accuracy-cost tradeoff.
Current state-of-the-art agent architectures are complex and costly but no more accurate than extremely simple baseline agents that cost 50x less in some cases.
Proxies for cost such as parameter count are misleading if the goal is to identify the best system for a given task. We should directly measure dollar costs instead.
Published agent evaluations are difficult to reproduce because of a lack of standardization and questionable, undocumented evaluation methods in some cases."
"The biggest question raised by a future populated by unexceptional A.I., however, is existential. Should we as a society be investing tens of billions of dollars, our precious electricity that could be used toward moving away from fossil fuels, and a generation of the brightest math and science minds on incremental improvements in mediocre email writing?" (From an NYT article. See original thread.)
Do you REALLY want to get a feel for how GPT-4o does what it does? Just complete this poem — by doing so, you’ll have performed a computation similar to the one it does when you feed it a text-plus-image prompt.
Just FYI, if you have older parents or other family members, set up some sort of shibboleth with them so they know what to ask you if you ever call them asking for something. These new generative models are going to be extremely convincing, and the idiots in charge of these companies think they can use guardrails to stop it being used inappropriately. They can't. #genAI#LLMs#chatgpt
“React and the component model standardises the software developer and reduces their individual bargaining power excluding them from a proportional share in the gains”. An amazing write-up by @baldur about the de-skilling of developers to reduce their ability to fight back against their employers.
“The general problem of mixing data with commands is at the root of many of our computer security vulnerabilities.” Great explainer by security researcher Bruce Schneier on why large language models may not be a great choice for tasks like processing your emails. https://cacm.acm.org/opinion/llms-data-control-path-insecurity/
"When the Singaporean government asked local writers if they would agree to having their work used to train a large language model, it probably did not expect the country’s tiny literary community to react so fiercely."
I just issued a data deletion request to #StackOverflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.
One of the key ways in which #RAG works to supplement #LLMs is based on proven associations. Higher ranked Stack Overflow members' answers will carry more weight in any #LLM that is produced.
By asking for my name to be disassociated from the textual data, it removes a semantic relationship that is helpful for determining which tokens of text to use in an #LLM.
If you sell out your user base without consultation, expect a backlash.