Episode #459 from 3:25:36
DeepSeek training on OpenAI data
Yeah. I mean, that's incredibly easy, right? OpenAI has publicly stated that DeepSeek uses their API, and they say they have evidence, right? And this is another element of the training regime: people at OpenAI have claimed that it's a distilled model, i.e., you take OpenAI's model, you generate a lot of output, and then you train on that output in your own model. And even if that's the case, what DeepSeek did is still amazing, by the way, efficiency-wise. Distillation is standard practice in industry. If you're at a closed lab where you care closely about terms of service and IP, you distill from your own models. If you are a researcher and you're not building any products, you distill from the OpenAI models-