Episode #452 from 1:52:39

Constitutional AI

So on that super interesting set of ideas around constitutional AI, can you describe what it is, as first detailed in the December 2022 paper and beyond that? What is it?

Yes. So this was from two years ago. The basic idea is, so we describe what RLHF is. You have a model and you just sample from it twice. It spits out two possible responses, and you're like, "Human, which response do you like better?" Or another variant of it is, "Rate this response on a scale of one to seven." So that's hard because you need to scale up human interaction and it's very implicit. I don't have a sense of what I want the model to do. I just have a sense of what this average of 1,000 humans wants the model to do. So two ideas. One is, could the AI system itself decide which response is better? Could you show the AI system these two responses and ask which response is better? And then second, well, what criterion should the AI use?
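The two ideas in this passage (letting an AI system pick the better of two sampled responses, and giving it an explicit criterion to judge by) can be illustrated with a minimal sketch. This is not the method from the paper: the names `PreferencePair`, `label_with_ai_feedback`, and `toy_judge` are hypothetical, and the judge here is a trivial word-counting stand-in for what would in practice be a language model prompted with the criterion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    """One preference label: which of two responses was preferred."""
    prompt: str
    chosen: str
    rejected: str


# Hypothetical judge signature (an assumption for this sketch):
# (criterion, prompt, response_a, response_b) -> "A" or "B"
Judge = Callable[[str, str, str, str], str]


def label_with_ai_feedback(prompt: str, response_a: str, response_b: str,
                           criterion: str, judge: Judge) -> PreferencePair:
    """Ask the judge, rather than a human, which response better fits the criterion."""
    verdict = judge(criterion, prompt, response_a, response_b)
    if verdict == "A":
        return PreferencePair(prompt, chosen=response_a, rejected=response_b)
    if verdict == "B":
        return PreferencePair(prompt, chosen=response_b, rejected=response_a)
    raise ValueError(f"judge returned {verdict!r}, expected 'A' or 'B'")


def toy_judge(criterion: str, prompt: str, response_a: str, response_b: str) -> str:
    """Stand-in for a model: prefers the response with fewer flagged words.

    A real judge would be a model that reads the criterion; this toy ignores it.
    """
    flagged = {"stupid", "idiot"}

    def rudeness(text: str) -> int:
        return sum(word.strip(".,!?").lower() in flagged for word in text.split())

    return "A" if rudeness(response_a) <= rudeness(response_b) else "B"


pair = label_with_ai_feedback(
    prompt="How do I fix this bug?",
    response_a="You idiot, read the docs.",
    response_b="Try checking the stack trace first.",
    criterion="Choose the response that is more polite.",
    judge=toy_judge,
)
print(pair.chosen)
```

The resulting pairs have the same shape as human comparison labels, so they could feed the same kind of preference training described for RLHF; the difference is that the criterion is written down instead of being implicit in an average of human raters.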

November 11, 2024 · 40 chapters · Lex Fridman

People

Dario Amodei, Amanda Askell, Chris Olah

Topics

Artificial Intelligence, AGI, Programming
