What I Read: Reinforcement Learning, Language Models


Reinforcement Learning for Language Models
Yoav Goldberg, April 2023.

“With the release of the ChatGPT model… there was a lot of discussion of the importance of ‘RLHF training’, that is, ‘reinforcement learning from human feedback’. I was puzzled for a while as to why RL… is better… Shouldn’t learning from demonstrations… be sufficient?”