reward – Andrew Fairless, Ph.D.

What I Read: RL, PPO, GRPO

By Andrew Fairless on May 26, 2025

https://yugeten.github.io/posts/2025/01/ppogrpo
A vision researcher’s guide to some RL stuff: PPO & GRPO
Yuge (Jimmy) Shi, January 31, 2025
“This is a deep dive into Proximal Policy Optimization (PPO), which is one of …”

What I Read: group relative policy optimization

By Andrew Fairless on May 22, 2025

https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885
group relative policy optimization (GRPO)
Apoorv Nandan, Jan 31, 2025
“GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their …”
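The “group relative” in the name refers to how GRPO scores each sampled completion against the other completions drawn for the same prompt, rather than against a learned value function as PPO does. Below is a minimal sketch of that advantage computation, assuming the common formulation (reward minus the group mean, divided by the group standard deviation). The function name, the use of the population standard deviation, and the `eps` guard are illustrative choices, not taken from the linked post.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each completion relative to its own sampling group.

    rewards: scalar rewards for the G completions sampled for ONE prompt.
    Uses the population standard deviation (an assumption; implementations
    vary). eps is an illustrative guard against division by zero when every
    completion in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: 4 completions for one prompt, scored 1 if correct else 0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # approximately [1, -1, -1, 1]
```

Correct completions get positive advantages and incorrect ones negative, and a group in which every completion scores the same contributes zero advantage, so it provides no learning signal for that prompt.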

What I Read: Reward Hacking

By Andrew Fairless on April 1, 2025

https://lilianweng.github.io/posts/2024-11-28-reward-hacking
Reward Hacking in Reinforcement Learning
Lilian Weng, November 28, 2024
“Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, …”

What I Read: Hidden Infinity, Preference Learning

By Andrew Fairless on October 10, 2024

https://www.cs.princeton.edu/~smalladi/blog/2024/07/09/dpo-infinity
The Hidden Infinity in Preference Learning
Sadhika Malladi, July 9, 2024
“I demonstrate from first principles how offline preference learning algorithms (e.g., SimPO) can benefit from length normalization, especially when training …”
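To make the length issue concrete: a response’s summed log-probability grows in magnitude with every token, so comparing summed scores across responses of different lengths partly compares their lengths. SimPO instead uses the average per-token log-probability, scaled by β, as its implicit reward and minimizes −log σ(r_chosen − r_rejected − γ). The sketch below assumes that formulation; the helper names and the default β and γ values are illustrative, and it is not the derivation from the linked post.

```python
import math


def sequence_logprob(token_logprobs, length_normalize):
    """Score a response from its per-token log-probabilities.

    Summing lets the score drift with response length; averaging
    (length normalization) puts short and long responses on one scale.
    """
    total = sum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def simpo_style_loss(chosen_logprobs, rejected_logprobs, beta=2.0, gamma=0.5):
    """SimPO-style preference loss for one (chosen, rejected) pair.

    beta and gamma defaults are illustrative, not tuned values.
    """
    # Implicit rewards: beta times the length-normalized log-probability.
    r_chosen = beta * sequence_logprob(chosen_logprobs, length_normalize=True)
    r_rejected = beta * sequence_logprob(rejected_logprobs, length_normalize=True)
    # Loss shrinks as the chosen reward exceeds the rejected reward by more than gamma.
    return neg_log_sigmoid(r_chosen - r_rejected - gamma)


# Hypothetical per-token log-probs: a long chosen response, a short rejected one.
chosen = [-0.5] * 40
rejected = [-1.0] * 5
print(sequence_logprob(chosen, False), sequence_logprob(rejected, False))  # -20.0 -5.0
print(sequence_logprob(chosen, True), sequence_logprob(rejected, True))    # -0.5 -1.0
print(simpo_style_loss(chosen, rejected))
```

With these hypothetical numbers, the summed scores prefer the short rejected response (−5 vs. −20) even though each of the chosen response’s tokens is more likely; the averaged scores prefer the chosen response (−0.5 vs. −1.0).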

What I Read: LLM Training, RLHF

By Andrew Fairless on October 30, 2023

https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives
LLM Training: RLHF and Its Alternatives
Sebastian Raschka, PhD, Sep 10, 2023
“RLHF is an integral part of the modern LLM training pipeline due to its ability to incorporate human preferences …”

What I Read: LLMs

By Andrew Fairless on September 7, 2023

https://willthompson.name/what-we-know-about-llms-primer
What We Know About LLMs (Primer)
Will Thompson (Twitter), July 23, 2023
“…it is worth reflecting on what we concretely know about LLMs at this point in time and how these …”

Tag: reward