https://yugeten.github.io/posts/2025/01/ppogrpo
A vision researcher’s guide to some RL stuff: PPO & GRPO
Yuge (Jimmy) Shi
January 31, 2025
“This is a deep dive into Proximal Policy Optimization (PPO), which is one of the most popular algorithms used in RLHF for LLMs, as well as Group Relative Policy Optimization (GRPO)…”