Episode #447 from 36:54
GPT vs Claude
Well, let me ask the ridiculous question of which LLM is better at coding? GPT, Claude, who wins in the context of programming? And I'm sure the answer is much more nuanced, because it sounds like every single part of this involves a different model.

I think there's no model that Pareto-dominates the others, meaning it is better in all the categories we think matter, the categories being speed, ability to edit code, ability to process lots of code, long context, a couple of other things, and coding capabilities. The one I'd say is just net best right now is Sonnet. I think this is a consensus opinion. o1 is really interesting and it's really good at reasoning, so if you give it really hard, interview-style programming problems or LeetCode problems, it can do quite well on them, but it doesn't feel like it understands your rough intent as well as Sonnet does. With a lot of the other frontier models, one qualm I have is that, and I'm not saying they train on benchmarks, they perform really well on benchmarks relative to everything in between. So if you try them on all these benchmarks, on problems in the distribution they're evaluated on, they'll do really well. But when you push them a little bit outside of that, Sonnet is, I think, the one that best maintains that same capability: it has roughly the same capability on the benchmark as when you instruct it to do anything with coding.
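To make the Pareto-dominance idea concrete, here is a minimal sketch. The categories come from the conversation, but the scores and the dominance check are entirely hypothetical, for illustration only, not real benchmark results:

```python
# Sketch of Pareto dominance between models: model A dominates model B only
# if A is at least as good in every category and strictly better in at least
# one. All scores below are made up purely to illustrate the definition.

CATEGORIES = ["speed", "code_editing", "long_context", "hard_reasoning"]

def pareto_dominates(a: dict, b: dict) -> bool:
    """True if `a` is >= `b` in every category and > `b` in at least one."""
    at_least_as_good = all(a[c] >= b[c] for c in CATEGORIES)
    strictly_better = any(a[c] > b[c] for c in CATEGORIES)
    return at_least_as_good and strictly_better

# Hypothetical 0-10 scores -- NOT real measurements of either model.
scores = {
    "sonnet": {"speed": 8, "code_editing": 9, "long_context": 8, "hard_reasoning": 7},
    "o1":     {"speed": 4, "code_editing": 6, "long_context": 7, "hard_reasoning": 9},
}

for a in scores:
    for b in scores:
        if a != b and pareto_dominates(scores[a], scores[b]):
            print(f"{a} Pareto-dominates {b}")
# With these made-up numbers nothing is printed: each model wins somewhere,
# so neither dominates the other -- which is the speaker's point.
```

The takeaway is that "no Pareto dominator" just means every model loses in at least one category to some other model, so "which is best" only has an answer once you weight the categories.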