group relative policy optimization (GRPO)
Apoorv Nandan
Jan 31, 2025
“GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their base language model.”
Data, Science, and Tinkering