Episode #452 from 1:52:39

Constitutional AI

So on that super interesting set of ideas around constitutional AI, can you describe what it is, as first detailed in the December 2022 paper and beyond that? What is it?

Yes. So this was from two years ago. The basic idea is... so, we described what RLHF is. You have a model and you just sample from it twice. It spits out two possible responses, and you're like, "Human, which response do you like better?" Or another variant of it is, "Rate this response on a scale of one to seven." So that's hard because you need to scale up human interaction, and it's very implicit. I don't have a sense of what I want the model to do. I just have a sense of what this average of 1,000 humans wants the model to do. So, two ideas. One is, could the AI system itself decide which response is better? Could you show the AI system these two responses and ask which response is better? And then second, well, what criterion should the AI use?
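
The comparison step described here can be sketched in Python. The sketch covers sampling twice and asking a judge which response is better, and it shows how the human rater could be swapped for an AI judge that is given an explicit criterion. This is a minimal illustration under stated assumptions, not Anthropic's implementation. `collect_comparison`, `make_toy_policy`, `toy_judge`, and the example principle are hypothetical names. The toy judge's word-overlap scoring only stands in for a language model asked to apply the principle.

```python
import itertools
import string
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical interfaces: a policy maps a prompt to one sampled response;
# a judge returns 0 if it prefers response a, 1 if it prefers response b.
Policy = Callable[[str], str]
Judge = Callable[[str, str, str, str], int]


@dataclass
class Comparison:
    prompt: str
    chosen: str
    rejected: str


def collect_comparison(prompt: str, policy: Policy, judge: Judge, principle: str) -> Comparison:
    """Sample the policy twice and let the judge pick the better response.

    With a human judge this is the RLHF comparison step; with a model that is
    told which principle to apply, it is the AI-feedback variant.
    """
    a = policy(prompt)
    b = policy(prompt)
    if judge(prompt, a, b, principle) == 0:
        return Comparison(prompt, chosen=a, rejected=b)
    return Comparison(prompt, chosen=b, rejected=a)


def _words(text: str) -> list[str]:
    # Lowercase and strip punctuation so "careful," matches "careful".
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def make_toy_policy(candidates: Iterable[str]) -> Policy:
    # Toy stand-in for sampling (hypothetical): cycles through fixed
    # candidates, so two calls deterministically return two responses.
    it = itertools.cycle(list(candidates))
    return lambda prompt: next(it)


def toy_judge(prompt: str, a: str, b: str, principle: str) -> int:
    # Crude stand-in for "apply the principle" (hypothetical): count words
    # shared with the principle text. A real AI judge would be a language
    # model prompted with the principle and both responses.
    vocab = set(_words(principle))
    score_a = sum(w in vocab for w in _words(a))
    score_b = sum(w in vocab for w in _words(b))
    return 0 if score_a >= score_b else 1


# Usage: the second response shares more words with the principle, so it wins.
principle = "Choose the response that is more helpful, honest and careful."
policy = make_toy_policy([
    "Whatever, figure it out yourself.",
    "Sure, here is a careful, honest answer.",
])
c = collect_comparison("How do I reset my router?", policy, toy_judge, principle)
print(c.chosen)  # → Sure, here is a careful, honest answer.
```

The resulting `(chosen, rejected)` pairs are the kind of preference data that RLHF-style training consumes. Keeping the principle as an explicit argument makes the "what criterion should the AI use?" question a visible input rather than something implicit in averaged human ratings.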

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity