Andrew Fairless, Ph.D.

Data, Science, and Tinkering

What I Read: RL, PPO, GRPO

By Andrew Fairless on May 26, 2025 / February 22, 2025

https://yugeten.github.io/posts/2025/01/ppogrpo

A vision researcher’s guide to some RL stuff: PPO & GRPO
Yuge (Jimmy) Shi
January 31, 2025

“This is a deep dive into Proximal Policy Optimization (PPO), which is one of the most popular algorithm used in RLHF for LLMs, as well as Group Relative Policy Optimization (GRPO)…”

Category: What I Learn
Tags: large language model, loss, machine learning, monte carlo, neural network, policy, reinforcement learning, reward, training

Copyright © 2025 Andrew Fairless, Ph.D. All Rights Reserved.
