What I Read: Hidden Infinity, Preference Learning

https://www.cs.princeton.edu/~smalladi/blog/2024/07/09/dpo-infinity

The Hidden Infinity in Preference Learning
Sadhika Malladi
July 9, 2024


“I demonstrate from first principles how offline preference learning algorithms (e.g., SimPO) can benefit from length normalization, especially when training on model-annotated preference data…”
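The excerpt mentions length normalization as used in SimPO. As context for that idea (not the post's own derivation), here is a minimal sketch of the SimPO-style objective: the implicit reward is the *average* per-token log-probability scaled by `beta`, and the loss penalizes the chosen-vs-rejected reward margin against a target margin `gamma`. The values of `beta` and `gamma` below are illustrative, not the paper's tuned settings.

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Sketch of a SimPO-style length-normalized preference loss.

    Each response's implicit reward is its total log-probability divided
    by its token length (then scaled by beta), so long responses are not
    favored just for accumulating more log-probability mass. The loss is
    -log sigmoid(reward margin minus target margin gamma).
    """
    r_w = beta * logp_chosen / len_chosen        # length-normalized reward, chosen
    r_l = beta * logp_rejected / len_rejected    # length-normalized reward, rejected
    margin = r_w - r_l - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# A rejected response with the same total log-prob but twice the length
# gets a higher per-token reward than a short one with worse per-token
# log-prob, so the margin (and hence the loss) changes accordingly.
easy = simpo_loss(-10.0, 10, -30.0, 10)   # clear per-token gap -> small loss
hard = simpo_loss(-10.0, 10, -12.0, 10)   # narrow gap -> larger loss
```

Dividing by length is exactly the normalization the post discusses: without it, the raw sequence log-probability conflates response quality with response length.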