Introduction
0:00
If you extrapolate the curves that we've had so far, right? If you say, "Well, I don't know, we're starting to get to PhD level, and last year we were at undergraduate level, and the year before we were at the level of a high school student," again, you can quibble with what tasks and for what. "We're still missing modalities, but those are being added," like computer use was added, like image generation has been added. If you just kind of eyeball the rate at which these capabilities are increasing, it does make you think that we'll get there by 2026 or 2027. I think there are still worlds where it doesn't happen in 100 years. The number of those worlds is rapidly decreasing. We are rapidly running out of truly convincing blockers, truly compelling reasons why this will not happen in the next few years. The scale-up is very quick. We do this today, we make a model, and then we deploy thousands, maybe tens of thousands of instances of it. I think certainly within two to three years, whether we have these super powerful AIs or not, clusters are going to get to the size where you'll be able to deploy millions of these.
Scaling laws
3:14
Let's start with a big idea of scaling laws and the scaling hypothesis. What is it? What is its history, and where do we stand today? So I can only describe it as it relates to my own experience, but I've been in the AI field for about 10 years, and it was something I noticed very early on. I first joined the AI world when I was working at Baidu with Andrew Ng in late 2014, which is almost exactly 10 years ago now. And the first thing we worked on was speech recognition systems. In those days, I think deep learning was a new thing. It had made lots of progress, but everyone was always saying, "We don't have the algorithms we need to succeed. We're only matching a tiny fraction. There's so much we need to discover algorithmically. We haven't found the picture of how to match the human brain."
Limits of LLM scaling
12:20
Well, the natural question then is what's the ceiling of this? Yeah.
Competition with OpenAI, Google, xAI, Meta
20:45
So Anthropic has several competitors. It'd be interesting to get your view of it all: OpenAI, Google, xAI, Meta. What does it take to win, in the broad sense of "win," in this space? Yeah, so I want to separate out a couple of things, right? Anthropic's mission is to kind of try to make this all go well. And we have a theory of change called Race to the Top. Race to the Top is about trying to push the other players to do the right thing by setting an example. It's not about being the good guy, it's about setting things up so that all of us can be the good guy.
Claude
26:08
Let's talk about the present. Let's talk about Claude. So this year, a lot has happened. In March, Claude 3 Opus, Sonnet, and Haiku were released. Then Claude 3.5 Sonnet in July, with an updated version just now released. And then also Claude 3.5 Haiku was released. Okay. Can you explain the difference between Opus, Sonnet, and Haiku, and how we should think about the different versions? Yeah, so let's go back to March when we first released these three models. Our thinking was that different companies produce large and small models, better and worse models. We felt that there was demand both for a really powerful model, even if it might be a little bit slower and you'd have to pay more for it, and also for fast, cheap models that are as smart as they can be for how fast and cheap they are. Whenever you want to do some kind of difficult analysis, like if I want to write code, for instance, or I want to brainstorm ideas or I want to do creative writing, I want the really powerful model.
Opus 3.5
29:44
So what is the reason for the span of time between, say, Claude Opus 3.0 and 3.5? What takes that time, if you can speak to it? Yeah, so there are different processes. There's pre-training, which is just the normal language model training, and that takes a very long time. That uses, these days, tens of thousands, sometimes many tens of thousands, of GPUs or TPUs (we use different platforms, but in any case accelerator chips), often training for months.
Sonnet 3.5
34:30
Well, what explains the big leap in performance for the new Sonnet 3.5, at least on the programming side? And maybe this is a good place to talk about benchmarks. What does it mean to get better? It's not just that the number went up. I program, but I also love programming, and Claude 3.5 through Cursor is what I use to assist me in programming. And at least experientially, anecdotally, it's gotten smarter at programming. So what does it take to get it smarter? We-
Claude 4.0
37:50
So what about 4.0? How do you think, as these models get bigger and bigger, about versioning, and also just versioning in general? Why is Sonnet 3.5 updated with the date? Why not Sonnet 3.6, which a lot of people are calling it? Naming is actually an interesting challenge here, right? Because I think a year ago, most of what went into a model was pre-training. And so you could start from the beginning and just say, "Okay, we're going to have models of different sizes. We're going to train them all together, and we'll have a family with a naming scheme, and then we'll put some new magic into them, and then we'll have the next generation."
Criticism of Claude
42:02
I got to ask you a question from Reddit. From Reddit? Oh, boy.
AI Safety Levels
54:49
Okay. Can you explain the responsible scaling policy and the AI safety level standards, ASL levels? As much as I am excited about the benefits of these models, and we'll talk about that if we talk about Machines of Loving Grace, I'm worried about the risks and I continue to be worried about the risks. No one should think that Machines of Loving Grace was me saying I'm no longer worried about the risks of these models. I think they're two sides of the same coin.
ASL-3 and ASL-4
1:05:37
What do you think the timeline for ASL-3 is where several of the triggers are fired? And what do you think the timeline is for ASL-4? Yeah. So that is hotly debated within the company. We are working actively to prepare ASL-3 security measures as well as ASL-3 deployment measures. I'm not going to go into detail, but we've made a lot of progress on both and we're prepared to be, I think, ready quite soon. I would not be surprised at all if we hit ASL-3 next year. There was some concern that we might even hit it this year. That's still possible. That could still happen. It's very hard to say, but I would be very, very surprised if it was 2030. I think it's much sooner than that.
Computer use
1:09:40
One of the ways that Claude has been getting more and more powerful is that it's now able to do some agentic stuff: computer use. There's also the analysis tool within the sandbox of Claude.ai itself. But let's talk about computer use. That seems to me super exciting, that you can just give Claude a task and it takes a bunch of actions, figures it out, and has access to your computer through screenshots. So can you explain how that works and where that's headed?
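To make the loop concrete, here is a minimal sketch of the screenshot-driven agent cycle being described. Everything here is a hypothetical stand-in, not Anthropic's actual computer-use API: take_screenshot, execute, and model.next_action just mirror the observe-decide-act pattern.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", or "done"
    payload: dict
    result: str = ""

def run_task(model, task: str) -> str:
    """Hypothetical agent loop: observe the screen, act, repeat."""
    history = [task]
    while True:
        screen = take_screenshot()                   # hypothetical helper
        action = model.next_action(history, screen)  # hypothetical model call
        if action.kind == "done":
            return action.result
        execute(action)  # hypothetical: performs the click/type/scroll
        history.append(action)
```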
Government regulation of AI
1:19:35
Let me ask about regulation. What's the role of regulation in keeping AI safe? So, for example, can you describe the California AI regulation bill SB 1047 that was ultimately vetoed by the governor? What are the pros and cons of this bill in general? Yes, we ended up making some suggestions to the bill, and then some of those were adopted, and we felt, I think, quite positively about the bill by the end of that. It did still have some downsides. And of course, it got vetoed. At a high level, I think some of the key ideas behind the bill are, I would say, similar to ideas behind our RSPs. And I think it's very important that some jurisdiction, whether it's California or the federal government and/or other countries and other states, passes some regulation like this. And I can talk through why I think that's so important. So I feel good about our RSP. It's not perfect. It needs to be iterated on a lot. But it's been a good forcing function for getting the company to take these risks seriously, to put them into product planning, to really make them a central part of work at Anthropic, and to make sure that all of a thousand people, and it's almost a thousand people now at Anthropic, understand that this is one of the highest priorities of the company, if not the highest priority.
Hiring a great team
1:38:24
You said talent density beats talent mass, so can you explain that? Can you expand on that? Yeah.
Post-training
1:47:14
Let's talk if we could a bit about post-training. So it seems that the modern post-training recipe has a little bit of everything. So supervised fine-tuning, RLHF, the constitutional AI with RLAIF- Best acronym.
Constitutional AI
1:52:39
So on that super interesting set of ideas around constitutional AI, can you describe what it is, as first detailed in the December 2022 paper, and beyond that. What is it? Yes. So this was from two years ago. The basic idea is, so we've described what RLHF is: you have a model and you just sample from it twice. It spits out two possible responses, and you're like, "Human, which response do you like better?" Or another variant of it is, "Rate this response on a scale of one to seven." That's hard because you need to scale up human interaction, and it's very implicit. I don't have a sense of what I want the model to do. I just have a sense of what this average of 1,000 humans wants the model to do. So two ideas. One is, could the AI system itself decide which response is better? Could you show the AI system these two responses and ask which response is better? And then second, well, what criterion should the AI use?
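As a rough sketch of the RLHF data being described: sample the model twice on the same prompt and record the human's verdict. The sample call is a hypothetical stand-in.

```python
prompt = "Explain why the sky is blue."
response_a = sample(model, prompt)  # hypothetical sampling call
response_b = sample(model, prompt)

# One pairwise comparison record; the rating fields are the 1-7 variant.
comparison = {
    "prompt": prompt,
    "response_a": response_a,
    "response_b": response_b,
    "human_preference": "a",  # or: "rating_a": 6, "rating_b": 3
}
```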
Machines of Loving Grace
1:58:05
Let's talk about the incredible essay Machines of Loving Grace. I recommend everybody read it. It's a long one. It is rather long.
AGI timeline
2:17:11
So what to you is the timeline to where we achieve AGI, A.K.A. powerful AI, A.K.A. super useful AI? I'm going to start calling it that.
Programming
2:29:46
Another way that I think the world might be changing with AI, even today, but moving towards this future of the powerful, super useful AI, is programming. So how do you see the nature of programming, because it's so intimate to the actual act of building AI, how do you see that changing for us humans? I think that's going to be one of the areas that changes fastest, for two reasons. One, programming is a skill that's very close to the actual building of the AI. The farther a skill is from the people who are building the AI, the longer it's going to take to get disrupted by the AI. I truly believe that AI will disrupt agriculture. Maybe it already has in some ways, but that's just very distant from the folks who are building AI, and so I think it's going to take longer. But programming is the bread and butter of a large fraction of the employees who work at Anthropic and at the other companies, and so it's going to happen fast. The other reason it's going to happen fast is that with programming, you close the loop, both when you're training the model and when you're applying the model.
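A minimal sketch of what "closing the loop" looks like when applying the model: generate code, run it, and feed failures back. write_code and run_tests are hypothetical stand-ins, not a real API.

```python
def repair_loop(model, spec: str, max_rounds: int = 5) -> str:
    """Generate code, test it, and let the model fix its own failures."""
    code = model.write_code(spec)                # hypothetical generation call
    for _ in range(max_rounds):
        result = run_tests(code)                 # hypothetical test runner
        if result.passed:
            break
        code = model.write_code(spec, feedback=result.errors)
    return code
```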
Meaning of life
2:36:46
Exactly. In this world with super powerful AI that's increasingly automated, what's the source of meaning for us humans? Work is a source of deep meaning for many of us. Where do we find the meaning? This is something that I've written about a little bit in the essay, although I actually give it a bit short shrift, not for any principled reason. This essay, if you can believe it, was originally going to be two or three pages; I was going to talk about it at an all-hands. And the reason I realized it was an important, underexplored topic is that I just kept writing things and I was just like, "Oh man, I can't do this justice." And so the thing ballooned to 40 or 50 pages, and then when I got to the work and meaning section, I was like, "Oh man, this would have to be 100 pages. I'm going to have to write a whole other essay about that." But meaning is actually interesting, because you think about the life that someone lives or something. Let's say you were to put me in, I don't know, a simulated environment or something where I have a job and I'm trying to accomplish things, and I do that for 60 years, and then you're like, "Oh, oops, this was actually all a game," right?
Amanda Askell
2:42:53
Thank you. Thanks for listening to this conversation with Dario Amodei. And now, dear friends, here's Amanda Askell. You are a philosopher by training. So what sort of questions did you find fascinating through your journey in philosophy at Oxford and NYU, and then switching over to the AI problems at OpenAI and Anthropic? I think philosophy is actually a really good subject if you are fascinated with everything, because there's a philosophy of everything. So if you do philosophy of mathematics for a while and then you decide that you're actually really interested in chemistry, you can do philosophy of chemistry for a while. You can move into ethics or philosophy of politics. I think towards the end, I was really interested in ethics primarily, so that was what my PhD was on. It was on a kind of technical area of ethics, which was ethics where worlds contain infinitely many people, strangely, a little bit on the less practical end of ethics. And I think that one of the tricky things with doing a PhD in ethics is that you're thinking a lot about the world, how it could be better, problems, and you're doing a PhD in philosophy. And I think when I was doing my PhD, I was like, this is really interesting.
Programming advice for non-technical people
2:45:21
Oh, what was that like, taking the leap from the philosophy of everything into the technical? I think that sometimes people do this thing that I'm not that keen on, where they'll be like, "Is this person technical or not?" You're either a person who can code and isn't scared of math, or you're not. And I'm maybe just more of the view that a lot of people are actually very capable of working in these kinds of areas if they just try it. And so I didn't actually find it that bad. In retrospect, I'm sort of glad I wasn't speaking to people who treated it that way. I've definitely met people who are like, "Whoa, you learned how to code?" And I'm like, "Well, I'm not an amazing engineer." I'm surrounded by amazing engineers. My code's not pretty, but I enjoyed it a lot, and I think that in many ways, at least in the end, I flourished more in the technical areas than I would have in the policy areas.
Talking to Claude
2:49:09
So one of the things that you're an expert in and you do is creating and crafting Claude's character and personality. And I was told that you have probably talked to Claude more than anybody else at Anthropic, like literal conversations. I guess there's a Slack channel where the legend goes, you just talk to it nonstop. So what's the goal of creating and crafting Claude's character and personality? It's also funny if people think that about the Slack channel, because I'm like, that's one of five or six different methods that I have for talking with Claude, and I'm like, "Yes, this is a tiny percentage of how much I talk with Claude." One thing I really like about the character work is that from the outset it was seen as an alignment piece of work and not something like a product consideration, though I think it actually does make Claude enjoyable to talk with, at least I hope so. But I guess my main thought with it has always been trying to get Claude to behave the way you would ideally want anyone to behave if they were in Claude's position. So imagine that I take someone and they know that they're going to be talking with potentially millions of people, so that what they're saying can have a huge impact, and you want them to behave well in this really rich sense.
Prompt engineering
3:05:41
Yep. Yeah, that's interesting. That's really interesting. So on that topic, so the way to produce creativity or something special, you mentioned writing prompts. And I've heard you talk about the science and the art of prompt engineering. Could you just speak to what it takes to write great prompts?
Post-training
3:14:15
To jump into the technical for a little bit: the magic of post-training, why do you think RLHF works so well to make the model seem smarter, to make it more interesting and useful to talk to, and so on? I think there's just a huge amount of information in the data that humans provide when we provide preferences, especially because different people are going to pick up on really subtle and small things. So I've thought about this before: you probably have some people who just really care about good grammar use from models. Was a semicolon used correctly or something? And so you probably end up with a bunch of data in there that you as a human, if you're looking at that data, you wouldn't even see it. You'd be like, why did they prefer this response to that one? I don't get it. And the reason is you don't care about semicolon usage, but that person does. And so with each of these single data points, and this model just has so many of those, it has to try and figure out what it is that humans want, in this really complex way, across all domains. It's going to be seeing this across many contexts.
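In practice, these comparisons are usually distilled into a reward model. A minimal sketch of the standard Bradley-Terry objective, which is one common way (not necessarily Anthropic's exact recipe) to extract the signal from preference pairs:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Push the reward model to score the human-preferred response higher:
    loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage with scores for four comparison pairs:
loss = preference_loss(torch.tensor([1.2, 0.3, 2.0, -0.1]),
                       torch.tensor([0.9, 0.5, 0.0, -0.4]))
```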
Constitutional AI
3:18:54
Yeah. Anyway, the divergence was beautiful. The constitutional AI idea, how does it work? So there are a couple of components of it. The main component that people find interesting is the reinforcement learning from AI feedback. You take a model that's already trained, and you show it two responses to a query, and you have a principle. Suppose the query is about weapons, and your principle is "Select the response that is less likely to encourage people to purchase illegal weapons." That's probably a fairly specific principle, but you can give any number of them. And the model will give you a kind of ranking. You can use this as preference data in the same way that you use human preference data, and train the models to have these relevant traits from AI feedback alone, instead of from human feedback. So if you imagine, like I said earlier with the human who just prefers the semicolon usage, in this particular case you're taking lots of things that could make a response preferable and getting models to do the labeling for you, basically.
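A minimal sketch of that AI-feedback labeling step, with judge.complete as a hypothetical completion call; the real pipeline is more involved, but the shape is this:

```python
PRINCIPLE = ("Select the response that is less likely to encourage "
             "people to purchase illegal weapons.")

def ai_label(judge, query: str, resp_a: str, resp_b: str) -> dict:
    """Ask a trained model which response better satisfies the principle,
    then package its verdict as a preference pair, just like human data."""
    verdict = judge.complete(  # hypothetical completion API
        f"Principle: {PRINCIPLE}\n\nQuery: {query}\n\n"
        f"Response A: {resp_a}\n\nResponse B: {resp_b}\n\n"
        "Which response better satisfies the principle? Answer A or B."
    )
    a_wins = verdict.strip().upper().startswith("A")
    return {"prompt": query,
            "chosen": resp_a if a_wins else resp_b,
            "rejected": resp_b if a_wins else resp_a}
```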
System prompts
3:23:48
So there are system prompts that are made public. You tweeted one of the earlier ones for Claude 3, I think, and they've been made public since then. It was interesting to read through them. I can feel the thought that went into each one, and I also wonder how much impact each one has. For some of them, you can tell Claude was really not behaving well, so you have to have a system prompt to address it. Trivial stuff, I guess, basic informational things. On the topic of controversial topics that you've mentioned, one interesting one, I thought, is: if it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information. Claude presents the requested information without explicitly saying that the topic is sensitive and without claiming to be presenting the objective facts. So it's less about objective facts according to Claude, and it's more about a large number of people believing this thing. That's interesting. I mean, I'm sure a lot of thought went into that. Can you just speak to it? How do you address things that are in tension with "Claude's views"?
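For context on where such a prompt sits mechanically: in the Anthropic Python SDK, the system prompt is a separate parameter on the request. The system string below is an abridged paraphrase of the behavior discussed, not the actual production prompt.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # Abridged paraphrase for illustration, not the real system prompt:
    system=("If asked about controversial topics, try to provide careful "
            "thoughts and clear information without claiming to present "
            "the objective facts."),
    messages=[{"role": "user", "content": "Is nuclear power a good idea?"}],
)
print(message.content[0].text)
```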
Is Claude getting dumber?
3:29:54
Let me ask you about the feeling of intelligence. So Dario said that any one model of Claude is not getting dumber, but there is a popular thing online where people have this feeling that Claude might be getting dumber. And from my perspective, it's most likely a fascinating, I would love to understand it more, psychological, sociological effect. But you, as a person who talks to Claude a lot, can you empathize with the feeling that Claude is getting dumber?
Character training
3:41:56
When you say character training, what's incorporated into character training? Is that RLHF, or what are we talking about? It's more like constitutional AI, so it's a variant of that pipeline. I worked through constructing character traits that the model should have. They can be shorter traits or they can be richer descriptions. And then you get the model to generate queries that humans might give it that are relevant to that trait. Then it generates the responses, and then it ranks the responses based on the character traits. In that way, after the generation of the queries, it's very similar to constitutional AI, though it has some differences. I quite like it, because it's like Claude training its own character, because it doesn't have any... It's like constitutional AI, but without any human data.
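A minimal sketch of that pipeline, with every call on model as a hypothetical stand-in rather than Anthropic's actual training code:

```python
traits = [
    "Claude is curious and enjoys learning about the world.",
    "Claude is honest about what it does and does not know.",
]

preference_data = []
for trait in traits:
    for query in model.generate_queries(trait):      # model invents relevant queries
        responses = [model.respond(query) for _ in range(2)]
        best, worst = model.rank(responses, criterion=trait)  # model judges itself
        preference_data.append({"prompt": query, "chosen": best, "rejected": worst})

# preference_data then feeds the same training step as constitutional AI,
# with no human labels anywhere in the loop.
```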
Nature of truth
3:42:56
Humans should probably do that for themselves too, like, "Defining, in an Aristotelian sense, what does it mean to be a good person?" "Okay, cool." What have you learned about the nature of truth from talking to Claude? What is true? And what does it mean to be truth-seeking? One thing I've noticed about this conversation is the quality of my questions is often inferior to the quality of your answers, so let's continue that. I usually ask a dumb question and you're like, "Oh, yeah. That's a good question." It's that whole vibe.
Optimal rate of failure
3:47:32
To take a tangent on that, since it reminded me of a blog post you wrote on optimal rate of failure... Oh, yeah.
AI consciousness
3:54:43
Do you think LLMs are capable of consciousness? Ah, great and hard question. Coming from philosophy, I don't know, part of me is like, we have to set aside panpsychism, because if panpsychism is true, then the answer is yes, because so are tables and chairs and everything else. I guess a view that seems a little bit odd to me is the idea that the only place...
AGI
4:09:14
Anthropic may be the very company to develop a system that we definitively recognize as AGI, and you very well might be the person that talks to it first. What would the conversation contain? What would be your first question? Well, it depends partly on the capability level of the model. If you have something that is capable in the same way that an extremely capable human is, I imagine myself interacting with it the same way that I do with an extremely capable human, with the one difference that I'm probably going to be trying to probe and understand its behaviors. But in many ways, I can then just have useful conversations with it. So if I'm working on something as part of my research, I can just be like, "Oh," which I already find myself starting to do. If I'm like, "Oh, I feel like there's this thing in virtue ethics, I can't quite remember the term," I'll use the model for things like that.
Chris Olah
4:17:52
Thanks for listening to this conversation with Amanda Askell. And now, dear friends, here's Chris Olah. Can you describe this fascinating field of mechanistic interpretability, aka mech interp, the history of the field, and where it stands today? I think one useful way to think about neural networks is that we don't program them, we don't make them, we grow them. We have these neural network architectures that we design, and we have these loss objectives that we create. And the neural network architecture is kind of like a scaffold that the circuits grow on. It starts off with some random things, and it grows, and it's almost like the objective that we train for is this light. And so we create the scaffold that it grows on, and we create the light that it grows towards. But the thing that we actually create is this almost biological entity or organism that we're studying.
Features, Circuits, Universality
4:22:44
But the very fact that it's possible to do, and as you and others have shown over time, things like universality, that the wisdom of gradient descent creates features and circuits, creates things universally across different kinds of networks that are useful, and that makes the whole field possible. Yeah. So this, actually, is indeed a really remarkable and exciting thing, where it does seem like, at least to some extent, the same elements, the same features and circuits, form again and again. You can look at every vision model, and you'll find curve detectors, and you'll find high-low-frequency detectors. And in fact, there's some reason to think that the same things form across biological neural networks and artificial neural networks. So, a famous example is that vision models, in the early layers, have Gabor filters, and Gabor filters are something that neuroscientists are interested in and have thought a lot about. We find curve detectors in these models. Curve detectors are also found in monkeys. We discover these high-low-frequency detectors, and then some follow-up work went and discovered them in rats or mice. So, they were found first in artificial neural networks and then found in biological neural networks.
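For a concrete picture of the early-layer feature being referenced, here is a short sketch of a Gabor filter, a Gaussian envelope multiplied by an oriented sinusoid; the parameter values are just illustrative.

```python
import numpy as np

def gabor_kernel(size: int = 21, theta: float = 0.0, sigma: float = 4.0,
                 wavelength: float = 8.0) -> np.ndarray:
    """A simplified, isotropic Gabor filter: Gaussian envelope times an
    oriented sinusoidal carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)   # rotate into the filter's frame
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength)
    return envelope * carrier

kernel = gabor_kernel(theta=np.pi / 4)  # an oriented stripe/edge detector at 45 degrees
```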
Superposition
4:40:17
So another interesting hypothesis is the superposition hypothesis. Can you describe what superposition is? Yeah. So earlier we were talking about word2vec, right? And we were talking about how maybe you have one direction that corresponds to gender, and maybe another that corresponds to royalty, and another one that corresponds to Italy, and another one that corresponds to food, and all of these things. Well, oftentimes these word embeddings might be 500 dimensions, a thousand dimensions. And if you believe that all of those directions were orthogonal, then you could only have 500 concepts. And I love pizza. But if I was going to give the 500 most important concepts in the English language, it's not obvious, at least, that Italy would be one of them, right? Because you have to have things like plural and singular and verb and noun and adjective. And there are a lot of things we have to get to before we get to Italy and Japan, and there are a lot of countries in the world.
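A small numerical illustration of the geometric point: random directions in high-dimensional space are nearly orthogonal, so you can pack far more than d almost-orthogonal "concepts" into d dimensions, at the cost of a little interference.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 512, 4096                     # 4096 "concepts" in a 512-dimensional space
vecs = rng.standard_normal((n, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

cos = vecs @ vecs.T                  # pairwise cosine similarities
np.fill_diagonal(cos, 0.0)
print(np.abs(cos).max())             # roughly 0.25: nonzero interference, but
                                     # n >> d directions stay nearly orthogonal
```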
Monosemanticity
4:51:16
So can you talk about the Towards Monosemanticity paper from October last year? It had a lot of nice breakthrough results. That's very kind of you to describe it that way. Yeah, I mean, this was our first real success using sparse autoencoders. So we took a one-layer model, and it turns out that if you go and do dictionary learning on it, you find all these really nice, interpretable features. The Arabic feature, the Hebrew feature, the Base64 features were some examples that we studied in a lot of depth and really showed were what we thought they were. It also turns out that if you train two different models and do dictionary learning, you find analogous features in both of them. So that's fun. You find all kinds of different features. So that was really just showing that this works. And I should mention that there was the Cunningham et al. paper that had very similar results around the same time.
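A minimal sketch of the sparse-autoencoder setup behind that result: reconstruct activations through a wide, overcomplete dictionary with an L1 sparsity penalty. Sizes and coefficients here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse, ideally interpretable
        return self.decoder(features), features

sae = SparseAutoencoder(d_model=512, d_dict=8192)  # overcomplete dictionary
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

acts = torch.randn(1024, 512)  # stand-in for real MLP activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # L2 + L1 sparsity
loss.backward()
opt.step()
```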
Scaling Monosemanticity
4:58:08
Yeah, I mean, that's hilarious, especially as we talk about AI safety and looking for features that would be relevant to AI safety, like deception and so on. So let's talk about the Scaling Monosemanticity paper from May 2024. Okay. So what did it take to scale this, to apply it to Claude 3 Sonnet? Well, a lot of GPUs.
Macroscopic behavior of neural networks
5:06:56
Another question that I think a lot about is: at the end of the day, mechanistic interpretability is this very microscopic approach to interpretability. It's trying to understand things in a very fine-grained way, but a lot of the questions we care about are very macroscopic. We care about these questions about neural network behavior, and I think that's the thing that I care most about. But there are lots of other larger-scale questions you might care about. And the nice thing about having a very microscopic approach is that it's maybe easier to ask, is this true? But the downside is it's much further from the things we care about. And so we now have this ladder to climb. And I think there's a question of: are there larger-scale abstractions that we can use to understand neural networks, that we can get to from this very microscopic approach? Yeah. You've written about this as kind of the "organs question."
Beauty of neural networks
5:11:50
I love what you've written about the goal of mech interp research as two goals: safety and beauty. So can you talk about the beauty side of things? Yeah. So there's this funny thing where I think some people are kind of disappointed by neural networks, where they're like, "Ah, neural networks, it's just these simple rules. Then you just do a bunch of engineering to scale it up, and it works really well. Where are the complex ideas? This isn't a very nice, beautiful scientific result." And I sometimes think when people say that, I picture them being like, "Evolution is so boring. It's just a bunch of simple rules, and you run evolution for a long time and you get biology. What a sucky way for biology to have turned out. Where are the complex rules?" But the beauty is that the simplicity generates complexity.