Episode #490 from 1:37:18
Post-training explained: Exciting new research directions in LLMs
Yeah, there is a sense that we, together as a civilization and each individually, have to find that Goldilocks zone, and, in the programming context, as developers.

Now, we've had this fascinating conversation that started with pre-training and mid-training. Let's get to post-training. There's a lot of fun stuff in post-training. So, what are some of the interesting ideas in post-training?

The biggest one from 2025 is reinforcement learning with verifiable rewards, RLVR. You can scale up the training there, which means running a lot of this kind of iterative generate-grade loop, and that lets the models learn interesting behaviors on both the tool-use and software side. This could be searching, or running commands on their own and seeing the outputs. That training also enables inference-time scaling very nicely. It just turned out that this paradigm was nicely linked: this kind of RL training enables inference-time scaling, but inference-time scaling could have been found in different ways. So it was kind of a perfect storm where the models changed a lot, and the way they're trained is a major factor in that.
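To make the "generate-grade loop" concrete, here is a minimal sketch of one RLVR iteration. Everything in it is illustrative: the `generate`, `verifiable_reward`, and `rlvr_step` names and the toy arithmetic task are hypothetical stand-ins for a real policy (the LLM) and a real programmatic grader (unit tests, a command's exit code, an answer checker), not any lab's actual training code.

```python
# A minimal sketch of one RLVR generate-grade iteration (illustrative only).
import random

def verifiable_reward(prompt: str, completion: str) -> float:
    """Grade a completion with a programmatic check (here: exact arithmetic).
    Because the check is automatic, the loop can be scaled up cheaply."""
    a, b = map(int, prompt.split("+"))
    try:
        return 1.0 if int(completion.strip()) == a + b else 0.0
    except ValueError:
        return 0.0  # unparseable answers get zero reward

def generate(prompt: str, n_samples: int = 4) -> list[str]:
    """Stand-in for sampling n completions from the current policy (the LLM).
    Here we just perturb the true answer to simulate a sometimes-wrong model."""
    a, b = map(int, prompt.split("+"))
    return [str(a + b + random.choice([-1, 0, 0, 1])) for _ in range(n_samples)]

def rlvr_step(prompts: list[str]) -> list[tuple[str, str, float]]:
    """One iteration: generate candidates, grade each one, and collect
    (prompt, completion, reward) tuples. A real system would feed these
    into a policy-gradient update (e.g., PPO- or GRPO-style) and repeat."""
    batch = []
    for p in prompts:
        for completion in generate(p):
            batch.append((p, completion, verifiable_reward(p, completion)))
    return batch

for prompt, completion, reward in rlvr_step(["2+2", "13+8"]):
    print(f"{prompt} -> {completion!r}: reward={reward}")
```

The point of the sketch is the shape of the loop: because the grader is a program rather than a human, you can run this generate-grade cycle at very large scale, which is what lets the model pick up tool-use behaviors and longer reasoning traces.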