V-JEPA

So that's I-JEPA. It doesn't need to know that it's an image, for example, because the only thing it needs to know is how to do this masking. Whereas with DINO, you need to know it's an image, because you need to do things like geometric transformations and blurring, things that are really image-specific. A more recent version of this that we have is called V-JEPA. It's basically the same idea as I-JEPA, except it's applied to video. So now you take a whole video and you mask a whole chunk of it. And what we mask is actually kind of a temporal tube: a whole segment of each frame in the video, over the entire video. And that tube is statically positioned throughout the frames; it's literally a straight tube.
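To make the "straight tube" idea concrete, here is a minimal sketch, assuming the video has already been split into a grid of patches per frame. The function name `tube_mask`, the grid and block sizes, and the single rectangular block are illustrative assumptions, not details given in the conversation; the only property taken from the description is that the same spatial region is masked in every frame.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, block_h, block_w, rng):
    """Build a straight temporal tube mask (hypothetical helper).

    Returns a nested list indexed as mask[frame][row][col], where True
    means the patch is masked. One rectangular block is sampled once and
    then repeated, unchanged, on every frame.
    """
    # Sample the block's top-left corner once (randint is inclusive on both ends).
    top = rng.randint(0, grid_h - block_h)
    left = rng.randint(0, grid_w - block_w)

    # Spatial mask for a single frame.
    frame_mask = [
        [top <= r < top + block_h and left <= c < left + block_w
         for c in range(grid_w)]
        for r in range(grid_h)
    ]

    # Copy the same spatial mask to every frame: the tube does not move.
    return [[row[:] for row in frame_mask] for _ in range(num_frames)]


rng = random.Random(0)
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, block_h=6, block_w=6, rng=rng)
masked = sum(cell for frame in mask for row in frame for cell in row)
print(len(mask), len(mask[0]), len(mask[0][0]), masked)  # prints "8 14 14 288"
```

Because the block position is sampled once rather than per frame, the masked region is identical across time, which is what makes it a straight tube rather than a moving one.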

March 7, 2024 · 23 chapters · Yann LeCun

People

Yann LeCun

Topics

Artificial Intelligence, AGI, Robotics

Lex Fridman Podcast · Starts at 38:51
