i’m very excited about the interpretability work that #anthropic has been doing with #LLMs.
in this paper, they used dictionary learning, a classical machine learning technique, to discover concepts: if a concept like “golden gate bridge” is present in the text, they can identify the associated pattern of neuron activations.
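(for the curious, here's a minimal sketch of that idea: a sparse autoencoder doing dictionary learning over a model's internal activations. none of this is anthropic's actual code; the dimensions, feature count, and training data are placeholders made up for illustration.)

```python
# minimal sparse-autoencoder sketch for dictionary learning over activations.
# everything here is illustrative: d_model, n_features, and the "activations"
# tensor are placeholders, not values from the paper.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the input activation
        return x_hat, f

def loss_fn(x, x_hat, f, l1_coeff: float = 1e-3):
    # reconstruction error plus an L1 penalty pushing features toward sparsity,
    # so each feature ideally ends up corresponding to one concept
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()

sae = SparseAutoencoder(d_model=512, n_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# stand-in for real residual-stream activations captured from an LLM
activations = torch.randn(1024, 512)

for step in range(100):
    x_hat, f = sae(activations)
    loss = loss_fn(activations, x_hat, f)
    opt.zero_grad()
    loss.backward()
    opt.step()
```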
this means that you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”
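(a toy version of what that monitoring could look like, reusing the SparseAutoencoder sketch above. the feature index and threshold are invented; in practice you'd first have to figure out which learned feature corresponds to the concept you care about.)

```python
# hypothetical monitor: flag a response when a concept feature fires strongly.
# FEATURE_ID and THRESHOLD are made-up values for illustration only.
FEATURE_ID = 1337
THRESHOLD = 4.0

def response_triggers_concept(activations: torch.Tensor) -> bool:
    with torch.no_grad():
        _, f = sae(activations)  # per-token feature activations, from the sketch above
    return bool((f[:, FEATURE_ID] > THRESHOLD).any())

# usage: run the model, capture its activations, then check the flag
sample = torch.randn(32, 512)  # stand-in for one response's activations
print(response_triggers_concept(sample))
```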
this is great work. i’m excited to see where this goes next
i hope #anthropic exposes this via their API. right now, most of the promising interpretability work is only available on open-source models that you can run yourself. it would be great to have these capabilities available from #AI vendors too
Google to invest up to $2B in Anthropic - and… the race is on between, on one side, Microsoft and OpenAI; and on the other side, Google and Anthropic. My $$ is on MS & OpenAI at the moment - and I don’t expect that to change. OpenAI is the clear leader in AI, with a considerable head start and a top-shelf team. Anthropic will have a lot of catching up to do unless they’ve got some kind of killer, breakthrough tech they’re hiding until launch. #AI #Microsoft #Google #OpenAI #Anthropic https://www.reuters.com/technology/google-agrees-invest-up-2-bln-openai-rival-anthropic-wsj-2023-10-27/
My first troublesome hallucination with a #LLM in a while: #Claude3 #Opus (200k context) insisting that I could configure my existing #Yubikey #GPG keys to work with #Kerberos PKINIT, and helping me for a couple of hours to try to do so, before realising that GPG keys aren't supported for this use case (PKINIT needs X.509 certificates, not OpenPGP keys). Whoops.
No real bother other than some wasted time, but a bit painful and disappointing.
#Anthropic is killing it with their AI game, especially for a small startup. Their models are way better than #OpenAI's, but they're focusing more on enterprise stuff rather than hyping it up. This might be a risky move since they don't have a cult following like other AI companies. Still, gotta give them props for their impressive tech. It'll be interesting to see how they balance enterprise with getting more attention from the AI community.
Big news in the #AI world: Current and former employees of #OpenAI and other AI companies like #DeepMind and #Anthropic warn of ethical and safety risks and want a way to publicly whistleblow about these without fear of retaliation.
Back in 2022, Anthropic CEO Dario Amodei chose not to release the super-powerful AI chatbot, Claude, that his company had just finished training, opting instead to focus on further internal safety testing. That move likely cost the company billions — three months later, OpenAI launched ChatGPT.
Having a reputation for credibility and caution in an industry that appears to have thrown much of both to the wind is not a bad thing, though. Claude is now in its third iteration, but that caution remains, with the company pledging not to release AIs above certain capability levels until it can develop sufficiently robust safety measures.
TIME’s interview with Amodei gives an insight into what the AI industry might look like when safety is considered a core part of the strategy.
“Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer.”