Episode #459 from 2:44:52
Andrej Karpathy and magic of RL
This might be a good place to mention the eloquent and insightful tweet of the great and powerful Andrej Karpathy. I think he had a bunch of thoughts, but one of them was: "Last thought. Not sure if this is obvious." You know something profound is coming when you're saying you're not sure if it's obvious. "There are two major types of learning, in both children and in deep learning. There's one, imitation learning (watch and repeat, i.e. pre-training, supervised fine-tuning), and two, trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo. One is learning by imitating expert players. Two is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all magic, is always two."
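To make the two types of learning in the tweet concrete, here is a minimal sketch in Python. It is an illustration only, not anything from the episode or from AlphaGo: the three-action setup, the fixed `REWARDS` values, the always-play-action-1 expert, and the function names `imitation_policy` and `trial_and_error_policy` are all assumptions made up for this example. The imitation learner can only copy what the expert did, while the trial-and-error learner (a simple epsilon-greedy rule) can find an action the expert never showed it.

```python
import random
from collections import Counter

# Hypothetical toy setup (an assumption, not from the episode):
# three actions, each paying a fixed, noise-free reward.
REWARDS = [0.2, 0.5, 0.8]


def imitation_policy(expert_actions):
    """Type one, imitation: copy the action the expert chose most often."""
    return Counter(expert_actions).most_common(1)[0][0]


def trial_and_error_policy(steps=2000, epsilon=0.1, seed=0):
    """Type two, trial and error: epsilon-greedy learning from reward alone."""
    rng = random.Random(seed)
    n_actions = len(REWARDS)
    value = [0.0] * n_actions  # running average reward per action
    count = [0] * n_actions    # how many times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=value.__getitem__)  # exploit
        reward = REWARDS[action]
        count[action] += 1
        # Incremental update of the average reward seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return max(range(n_actions), key=value.__getitem__)


# A decent but not optimal expert (hypothetical): always plays action 1.
expert_demos = [1] * 50

imitated = imitation_policy(expert_demos)
learned = trial_and_error_policy()

print("imitation picks action", imitated, "with reward", REWARDS[imitated])
print("trial and error picks action", learned, "with reward", REWARDS[learned])
```

In this toy, imitation can only ever be as good as the demonstrations it copies, while trial and error is limited only by what the reward signal lets it discover. That is one reading of why the tweet credits the second type of learning with the surprising results.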