Episode #452 from 4:17:52
Lex Fridman
Thanks for listening to this conversation with Amanda Askell. And now, dear friends, here's Chris Olah. Can you describe this fascinating field of mechanistic interpretability, aka mech interp, the history of the field, and where it stands today?

Chris Olah
I think one useful way to think about neural networks is that we don't program them, we don't make them, we grow them. We have these neural network architectures that we design and these loss objectives that we create. The neural network architecture is kind of like a scaffold that the circuits grow on. It starts off with some random things, and it grows, and it's almost like the objective that we train for is this light. So we create the scaffold that it grows on, and we create the light that it grows towards. But the thing that we actually create is this almost biological entity or organism that we're studying.