What I Read: group relative policy optimization

https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885

group relative policy optimization (GRPO)
Apoorv Nandan
Jan 31, 2025


“GRPO became popular primarily due to the success of DeepSeek-R1, which used this algorithm to train reasoning capabilities into their base language model.”