Episode #490 from 2:44:06
Long context
We haven't really mentioned it much, but implied in this discussion is context length as well. Is there a lot of innovation possible there? I think the colloquially accepted view is that it's a compute and data problem. Sometimes there are small architectural things, like attention variants. We talked about hybrid attention models, which essentially means having what looks like a state space model inside your transformer. Those are better suited because you spend less compute modeling the tokens furthest back in the context. But they aren't free, because they have to be accompanied by a lot of compute or the right data. How many sequences of 100,000 tokens do you have in the world, and where do you get them? It just ends up being pretty expensive to scale them.
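A minimal sketch of the compute argument behind that point, not tied to any particular model: full attention makes each new token look back over the entire context, so per-token cost grows with length, while a state-space-style layer updates a fixed-size state, so its per-token cost stays flat. The widths, state size, and function names here are illustrative assumptions.

```python
# Illustrative only: contrasts per-token cost of full attention with a
# fixed-size state-space / linear-recurrence layer, to show why hybrid
# stacks spend less compute on tokens far back in the context.

d = 64       # model width (assumed for illustration)
state = 16   # state size of the recurrent layer (assumed)

def attention_flops_per_token(context_len, d=d):
    # Each new token attends over the whole context, so cost grows with length.
    return 2 * context_len * d

def ssm_flops_per_token(d=d, state=state):
    # A state-space layer updates a fixed-size state, so cost per token is
    # constant no matter how long the context gets.
    return 2 * d * state

for n in (1_000, 100_000, 1_000_000):
    print(f"context={n:>9,}: attention={attention_flops_per_token(n):>13,} "
          f"flops/token  ssm={ssm_flops_per_token():>6,} flops/token")
```

The gap is the point: at 100,000 tokens the attention layer is doing orders of magnitude more work per new token than the recurrent one, which is why hybrids help, but the recurrent layers still need the right long-sequence data and compute to actually learn long-range behavior.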