Episode #452 from 3:14:15

Post-training

To jump into the technical for a little bit: the magic of post-training. Why do you think RLHF works so well to make the model seem smarter, to make it more interesting and useful to talk to, and so on?

I think there's just a huge amount of information in the data that humans provide when we provide preferences, especially because different people are going to pick up on really subtle and small things. So I've thought about this before, where you probably have some people who just really care about good grammar use for models. Was a semicolon used correctly, or something? And so you probably end up with a bunch of data in there that you as a human, if you're looking at that data, you wouldn't even see that. You'd be like, "Why did they prefer this response to that one? I don't get it." And then the reason is you don't care about semicolon usage, but that person does. And so each of these single data points, and this model just has so many of those, it has to try and figure out what it is that humans want in this really complex way, across all domains. They're going to be seeing this across many contexts.
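Editor's illustration: the point about many narrow preferences adding up can be sketched with a toy reward model. Everything here is an assumption made for illustration and not something described in the conversation: the two binary features (correct grammar, helpfulness), the simulated annotators who each care about only one feature, and the choice of a linear reward fit with a Bradley-Terry (pairwise logistic) objective.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_comparisons(n_pairs, seed=0):
    """Simulate pairwise preferences from annotators with narrow tastes.

    Hypothetical setup: each response is [grammar_ok, helpful] with 0/1
    entries, and each annotator judges on exactly one of those features.
    """
    rng = random.Random(seed)
    comparisons = []
    while len(comparisons) < n_pairs:
        a = [rng.randint(0, 1), rng.randint(0, 1)]
        b = [rng.randint(0, 1), rng.randint(0, 1)]
        cared = rng.choice([0, 1])  # the one feature this annotator notices
        if a[cared] == b[cared]:
            continue  # this annotator sees no difference, so no label
        chosen, rejected = (a, b) if a[cared] > b[cared] else (b, a)
        comparisons.append((chosen, rejected))
    return comparisons


def fit_reward(comparisons, n_features=2, lr=0.5, epochs=200):
    """Fit linear reward weights by gradient ascent on the
    Bradley-Terry log-likelihood: log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for chosen, rejected in comparisons:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            coeff = 1.0 - sigmoid(margin)  # d/dmargin of log sigmoid(margin)
            for i in range(n_features):
                grad[i] += coeff * diff[i]
        w = [wi + lr * gi / len(comparisons) for wi, gi in zip(w, grad)]
    return w


comparisons = make_comparisons(400)
weights = fit_reward(comparisons)
print("learned weights [grammar_ok, helpful]:", weights)
```

In this toy setup no single annotator rewards both features, yet the fitted reward should put positive weight on each, which is one simple way to picture a model picking up signals that any one reader of the data would miss.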

Post-training chapter timestamp | Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | EpisodeIndex