https://www.cs.princeton.edu/~smalladi/blog/2024/07/09/dpo-infinity

The Hidden Infinity in Preference Learning
Sadhika Malladi
July 09, 2024

"I demonstrate from first principles how offline preference learning algorithms (e.g., SimPO) can benefit from length normalization, especially when training …"