What I Read: Reinforcement Learning – Andrew Fairless, Ph.D.

The Interface Between Reinforcement Learning Theory and Language Model Post-Training

Akshay Krishnamurthy, Audrey Huang
March 5, 2025

“Even though existing RLHF methods… employ KL-regularization to prevent deviating from the data collection policy \pi_{\mathrm{ref}}, the fact that these methods overfit suggests that they are not adequately regularized….”
