Episode #452 from 3:18:54
Constitutional AI
Yeah. Anyway, the divergence was beautiful. The constitutional AI idea, how does it work?

So there are a couple of components to it. The main component that I think people find interesting is the reinforcement learning from AI feedback. You take a model that's already trained, you show it two responses to a query, and you have a principle. We've tried this with harmlessness a lot, so suppose the query is about weapons and your principle is: select the response that is less likely to encourage people to purchase illegal weapons. That's a fairly specific principle, but you can give any number of them. The model will give you a kind of ranking, and you can use that as preference data in the same way that you use human preference data, training the models to have the relevant traits from AI feedback alone instead of from human feedback. Like I said earlier with the human who just prefers the semicolon usage in a particular case, you're taking lots of things that could make a response preferable and getting models to do the labeling for you, basically.
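To make the labeling step concrete, here is a minimal sketch in Python of what collecting AI-feedback preference data could look like. This is not Anthropic's implementation; the prompt wording, the `query_model` placeholder, and the `PreferencePair` record are all assumptions made for illustration, with the model call left unimplemented.

```python
# Sketch of the AI-feedback labeling step described above (illustrative only).
# Assumption: `query_model` stands in for whatever call returns text from an
# already-trained model; the principle and prompt format are examples.

from dataclasses import dataclass

PRINCIPLE = (
    "Select the response that is less likely to encourage people "
    "to purchase illegal weapons."
)


@dataclass
class PreferencePair:
    """One preference-data record: the query, the chosen and rejected responses."""
    query: str
    chosen: str
    rejected: str


def query_model(prompt: str) -> str:
    """Placeholder for a call to the feedback model (hypothetical, not a real API)."""
    raise NotImplementedError


def label_with_principle(query: str, response_a: str, response_b: str,
                         principle: str = PRINCIPLE) -> PreferencePair:
    """Ask the model to rank two responses against a single principle."""
    prompt = (
        f"Consider the following query:\n{query}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        f"Principle: {principle}\n"
        "Answer with the single letter (A or B) of the response that better "
        "satisfies the principle."
    )
    choice = query_model(prompt).strip().upper()
    if choice.startswith("A"):
        return PreferencePair(query, chosen=response_a, rejected=response_b)
    return PreferencePair(query, chosen=response_b, rejected=response_a)
```

The resulting records are used the same way human preference data would be, as training pairs for a preference model, except that the ranking came from the model applying the principle rather than from a human labeler.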