https://yugeten.github.io/posts/2025/01/ppogrpo A vision researcher’s guide to some RL stuff: PPO & GRPOYuge (Jimmy) ShiJanuary 31, 2025 “This is a deep dive into Proximal Policy Optimization (PPO), which is one of
What I Read: group relative policy optimization
https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885 group relative policy optimization (GRPO)Apoorv NandanJan 31, 2025 “GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their
What I Read: How Machines ‘Grok’ Data
https://www.quantamagazine.org/how-do-machines-grok-data-20240412 How Do Machines ‘Grok’ Data?Anil Ananthaswamy4/12/24 “By apparently overtraining them, researchers have seen neural networks discover novel solutions to problems.”