Episode #447 from 2:03:48
RLHF vs RLAIF
What about RL from feedback, i.e. RLHF versus RLAIF? What's the role of that in getting better performance out of the models? Yeah. So RLHF is when the reward model you use is trained on labels you've collected from humans giving feedback. I think this works well if you have the ability to get a ton of human feedback for the kind of task you care about.
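The mechanism described here, a reward model fit on human preference labels, can be sketched in a toy form. This is a hypothetical illustration, not anything from the episode: a linear reward model trained on pairwise (chosen, rejected) comparisons with a Bradley-Terry style loss, the standard formulation for preference-based reward modeling. All names and the synthetic data are invented for the sketch.

```python
import math
import random

# Toy RLHF reward-model sketch (hypothetical, not from the episode).
# Each training example is (chosen, rejected): two feature vectors
# where a human labeler preferred the first response over the second.

def reward(w, x):
    """Linear reward model: scalar score of a response with features x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(pairs, dim, lr=0.1, steps=500):
    """Minimize the Bradley-Terry loss -log sigmoid(r_chosen - r_rejected)
    with plain stochastic gradient descent."""
    w = [0.0] * dim
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            grad_scale = sigmoid(margin) - 1.0  # d(loss)/d(margin)
            for i in range(dim):
                w[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Synthetic preference data: the "human" always prefers the response
# with the larger first feature.
random.seed(0)
pairs = []
for _ in range(50):
    a = [random.random() for _ in range(3)]
    b = [random.random() for _ in range(3)]
    chosen, rejected = (a, b) if a[0] > b[0] else (b, a)
    pairs.append((chosen, rejected))

w = train_reward_model(pairs, dim=3)
accuracy = sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)
print(f"training preference accuracy: {accuracy:.2f}")
```

In RLAIF the same pipeline applies, but the (chosen, rejected) labels would come from an AI judge rather than human raters; the reward-model training step itself is unchanged.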