Episode #490 from 1:04:12

How AI is trained: Pre-training, Mid-training, and Post-training

I think this might be a good place to define pre-training, mid-training, and post-training. So, pre-training is the classic training, one next-token prediction at a time: you have a big corpus of data. Nathan probably also has very interesting insights there because of OLMo 3; a big portion of that paper focuses on getting the data mix right. So, pre-training is essentially training with a cross-entropy loss on next-token prediction over a vast corpus of internet data, books, papers, and so forth. It has changed a little bit over the years: people used to throw in everything they could, whereas now it's not just raw data, it's also synthetic data, where people rephrase certain things. So synthetic data doesn't necessarily mean purely AI-made-up data.
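To make the objective concrete, here is a minimal sketch of next-token prediction with a cross-entropy loss, as described above. The model, sizes, and data are toy placeholders (a plain embedding plus linear head standing in for a real Transformer), not the setup used by OLMo 3 or any production system:

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8

# Toy "language model": embedding + linear head.
# Real pre-training uses a Transformer that attends over the full context.
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)
opt = torch.optim.AdamW(
    list(embed.parameters()) + list(head.parameters()), lr=1e-3
)

# Stand-in for one batch of tokenized corpus text (random ids here;
# in practice these come from web pages, books, papers, etc.).
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Shift by one position: the model predicts token t+1 from token t.
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = head(embed(inputs))          # (batch, seq_len - 1, vocab_size)
loss = F.cross_entropy(               # cross-entropy over next tokens
    logits.reshape(-1, vocab_size),
    targets.reshape(-1),
)

opt.zero_grad()
loss.backward()
opt.step()
```

Everything in pre-training reduces to repeating this step, trillions of tokens at a time; the data-mix questions the speakers mention are about what goes into that `tokens` batch.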
