Episode #459 from 2:55:23
OpenAI o3-mini vs DeepSeek r1
All right. So, we have fun things happening in real time. This is a good opportunity to talk about the other reasoning models, o1 and o3. Just now, as perhaps expected, OpenAI released o3-mini. What are we expecting from the different flavors? Can you lay out the different flavors of the o models, and the reasoning model from Gemini?

Something I would say about these reasoning models is that we talked a lot about reasoning training on math and code. What is done is that you take the base model, the one we've talked about a lot that is trained on the internet, and you do this large-scale reasoning training with reinforcement learning. Then, and this is what the DeepSeek paper detailed in the R1 paper, which for me answered one of the big open questions, how you actually do this, they did reasoning-heavy but otherwise very standard post-training techniques after the large-scale reasoning RL. So they did the same things: a form of instruction tuning through rejection sampling, which is essentially heavily filtered instruction tuning with some reward models. And then they did RLHF, but they made it math-heavy.
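To make the rejection sampling step concrete, here is a minimal Python sketch of filtered instruction tuning data collection. The `generate` and `reward` callables are hypothetical stand-ins for the policy model and the reward model; this illustrates the general technique, not DeepSeek's actual pipeline.

```python
# Hypothetical sketch of rejection sampling for post-training data curation
# (not DeepSeek's actual code): sample several completions per prompt, score
# them with a reward model, and keep only the best for instruction tuning.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str
    completion: str

def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # policy model: prompt, n -> n samples
    reward: Callable[[str, str], float],        # reward model: (prompt, completion) -> score
    n_samples: int = 16,
    threshold: float = 0.0,
) -> List[Example]:
    """Keep the highest-reward completion per prompt, if it clears a threshold."""
    kept: List[Example] = []
    for prompt in prompts:
        candidates = generate(prompt, n_samples)
        scored = [(reward(prompt, c), c) for c in candidates]
        best_score, best = max(scored)
        if best_score >= threshold:
            kept.append(Example(prompt, best))
    return kept

# The surviving (prompt, completion) pairs become supervised fine-tuning data,
# i.e. "heavily filtered instruction tuning" guided by the reward model.
```

The point of the filter is that the model only ever imitates its own highest-reward outputs, which is why this amounts to standard instruction tuning with the reward model acting as the gatekeeper; the math-heavy RLHF stage then runs on top of the model tuned on this data.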