policy – Andrew Fairless, Ph.D.

What I Read: RL, PPO, GRPO

By Andrew Fairless on May 26, 2025; February 22, 2025

https://yugeten.github.io/posts/2025/01/ppogrpo
A vision researcher’s guide to some RL stuff: PPO & GRPO, Yuge (Jimmy) Shi, January 31, 2025
“This is a deep dive into Proximal Policy Optimization (PPO), which is one of…

What I Read: group relative policy optimization

By Andrew Fairless on May 22, 2025; February 22, 2025

https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885
group relative policy optimization (GRPO), Apoorv Nandan, Jan 31, 2025
“GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their…

What I Read: Learning to Imitate

By Andrew Fairless on January 17, 2023; December 4, 2022

https://ai.stanford.edu/blog/learning-to-imitate/
Learning to Imitate, Divyansh Garg, November 1, 2022
“A key aspect of human learning is imitation…. How can we enable our artificial agents to similarly acquire such fast learning ability?”

Tag: policy