Episode #447 from 2:03:48
RLHF vs RLAIF
What about RL from feedback, i.e. RLHF versus RLAIF? What's the role of that in getting better performance out of the models? Yeah. So RLHF is when the reward model you use is trained on labels you've collected from humans giving feedback. I think this works well if you have the ability to get a ton of human feedback for the kind of task you care about.
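The mechanism described here, a reward model fit on human preference labels, can be sketched in a toy form. This is a hypothetical illustration, not anything from the episode: a linear reward model trained on pairwise (chosen, rejected) comparisons with a Bradley-Terry style loss, the standard formulation for preference-based reward modeling. All names and the synthetic data are invented for the sketch.

```python
import math
import random

# Toy RLHF reward-model sketch (hypothetical, not from the episode).
# Each training example is (chosen, rejected): two feature vectors
# where a human labeler preferred the first response over the second.

def reward(w, x):
    """Linear reward model: scalar score of a response with features x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(pairs, dim, lr=0.1, steps=500):
    """Minimize the Bradley-Terry loss -log sigmoid(r_chosen - r_rejected)
    with plain stochastic gradient descent."""
    w = [0.0] * dim
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            grad_scale = sigmoid(margin) - 1.0  # d(loss)/d(margin)
            for i in range(dim):
                w[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Synthetic preference data: the "human" always prefers the response
# with the larger first feature.
random.seed(0)
pairs = []
for _ in range(50):
    a = [random.random() for _ in range(3)]
    b = [random.random() for _ in range(3)]
    chosen, rejected = (a, b) if a[0] > b[0] else (b, a)
    pairs.append((chosen, rejected))

w = train_reward_model(pairs, dim=3)
accuracy = sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)
print(f"training preference accuracy: {accuracy:.2f}")
```

In RLAIF the same pipeline applies, but the (chosen, rejected) labels would come from an AI judge rather than human raters; the reward-model training step itself is unchanged.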