https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81
Reinforcement Learning for Language Models
Yoav Goldberg, April 2023.
“With the release of the ChatGPT model… there was a lot of discussion of the importance of ‘RLHF training’, that is, ‘reinforcement learning from human feedback’. I was puzzled for a while as to why RL… is better… Shouldn’t learning from demonstrations… be sufficient?”