https://yugeten.github.io/posts/2025/01/ppogrpo
A vision researcher’s guide to some RL stuff: PPO & GRPO
Yuge (Jimmy) Shi
January 31, 2025
“This is a deep dive into Proximal Policy Optimization (PPO), which is one of
What I Read: group relative policy optimization
https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885
group relative policy optimization (GRPO)
Apoorv Nandan
Jan 31, 2025
“GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their
What I Read: Learning to Imitate
https://ai.stanford.edu/blog/learning-to-imitate/
Learning to Imitate
Divyansh Garg
November 1, 2022
“A key aspect of human learning is imitation…. How can we enable our artificial agents to similarly acquire such fast learning ability?”