Episode #459 from 3:33
DeepSeek-R1 and DeepSeek-V3
A lot of people are curious to understand China's DeepSeek AI models, so let's lay it out. Nathan, can you describe what DeepSeek-V3 and DeepSeek-R1 are, how they work, how they're trained? Let's look at the big picture and then we'll zoom in on the details. DeepSeek-V3 is a new mixture of experts, transformer language model from DeepSeek who is based in China. They have some new specifics in the model that we'll get into. Largely this is a open weight model and it's a instruction model like what you would use in ChatGPT. They also released what is called the base model, which is before these techniques of post-training. Most people use instruction models today, and those are what's served in all sorts of applications. This was released on, I believe, December 26th or that week. And then weeks later on January 20th, DeepSeek released DeepSeek-R1, which is a reasoning model, which really accelerated a lot of this discussion.
Why this moment matters
A lot of people are curious to understand China's DeepSeek AI models, so let's lay it out. Nathan, can you describe what DeepSeek-V3 and DeepSeek-R1 are, how they work, how they're trained? Let's look at the big picture and then we'll zoom in on the details. DeepSeek-V3 is a new mixture of experts, transformer language model from DeepSeek who is based in China. They have some new specifics in the model that we'll get into. Largely this is a open weight model and it's a instruction model like what you would use in ChatGPT. They also released what is called the base model, which is before these techniques of post-training. Most people use instruction models today, and those are what's served in all sorts of applications. This was released on, I believe, December 26th or that week. And then weeks later on January 20th, DeepSeek released DeepSeek-R1, which is a reasoning model, which really accelerated a lot of this discussion.