https://huyenchip.com/2023/05/02/rlhf.html
RLHF: Reinforcement Learning from Human Feedback
Chip Huyen
May 2, 2023
“…making models like ChatGPT work. One such cool idea is RLHF (Reinforcement Learning from Human Feedback)…. So, how exactly does RLHF work?”