Post-training

To jump into the technical for a little bit: the magic of post-training. Why do you think RLHF works so well to make the model seem smarter, to make it more interesting and useful to talk to, and so on?

I think there's just a huge amount of information in the data that humans provide when we provide preferences, especially because different people are going to pick up on really subtle and small things. So I've thought about this before, where you probably have some people who just really care about good grammar use for models. Was a semicolon used correctly, or something? And so you probably end up with a bunch of data in there that you, as a human, if you're looking at that data, wouldn't even see. You'd be like, "Why did they prefer this response to that one? I don't get it." And then the reason is you don't care about semicolon usage, but that person does. And so there are all these single data points, and the model just has so many of them, and it has to try and figure out what it is that humans want in this really complex way, across all domains. They're going to be seeing this across many contexts.

November 11, 2024 | Lex Fridman | Amanda Askell

People

Dario Amodei Amanda Askell Chris Olah

Topics

Artificial Intelligence AGI Programming

Starts at 3:14:15
