https://lilianweng.github.io/posts/2024-11-28-reward-hacking
Reward Hacking in Reinforcement Learning
Lilian Weng
November 28, 2024

“Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards,