What I Read: Reinforcement Learning, Language Models

https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81

Reinforcement Learning for Language Models
Yoav Goldberg, April 2023.

“With the release of the ChatGPT model… there was a lot of discussion of the importance of ‘RLHF training’, that is, ‘reinforcement learning from human feedback’. I was puzzled for a while as to why RL… is better… Shouldn’t learning from demonstrations… be sufficient?”