What I Read: Reinforcement Learning

The Interface Between Reinforcement Learning Theory and Language Model Post-Training


Akshay Krishnamurthy, Audrey Huang
March 5, 2025


“Even though existing RLHF methods… employ KL-regularization to prevent deviating from the data collection policy \pi_{\mathrm{ref}}, the fact that these methods overfit suggests that they are not adequately regularized….”
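To make the quoted term concrete, here is a minimal sketch of what a KL-regularized objective usually looks like: expected reward under the policy minus a penalty proportional to KL(policy || reference). This is the textbook form, not necessarily the notation used in the post. The function names, the `beta` parameter, and the toy setup (a finite list of candidate responses with known rewards, and a reference policy with full support) are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0 (the reference has full support).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_regularized_objective(policy, ref_policy, rewards, beta):
    """Expected reward under `policy` minus beta * KL(policy || ref_policy).

    `beta` is a hypothetical name for the regularization strength.
    """
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward - beta * kl_divergence(policy, ref_policy)


# Toy example: two candidate responses, and the reward model prefers the first.
ref_policy = [0.5, 0.5]
rewards = [1.0, 0.0]

stay_at_ref = kl_regularized_objective(ref_policy, ref_policy, rewards, beta=0.1)
go_greedy = kl_regularized_objective([1.0, 0.0], ref_policy, rewards, beta=0.1)
print(stay_at_ref, go_greedy)
```

In this toy case the greedy policy pays a penalty of beta * log(2) for leaving the reference, which is small next to its reward gain. That is one way to read the quoted concern: if the reward values are themselves unreliable, a penalty of this size may not be enough to stop the policy from chasing them.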