reinforcement learning – Andrew Fairless, Ph.D.

What I Read: Reinforcement Learning

By Andrew Fairless on June 12, 2025March 10, 2025

The Interface Between Reinforcement Learning Theory and Language Model Post-Training The Interface Between Reinforcement Learning Theory and Language Model Post-TrainingAkshay Krishnamurthy, Audrey HuangMarch 5, 2025 “Even though existing RLHF methods…Continue readingWhat I Read: Reinforcement Learning

What I Read: Model, Product

By Andrew Fairless on June 3, 2025March 10, 2025

https://vintagedata.org/blog/posts/model-is-the-product The Model is the ProductAlexander Doria“There were a lot of speculation over the past years about what the next cycle of AI development could be. Agents? Reasoners? Actual multimodality?Continue readingWhat I Read: Model, Product

What I Read: RL, PPO, GRPO

By Andrew Fairless on May 26, 2025February 22, 2025

https://yugeten.github.io/posts/2025/01/ppogrpo A vision researcher’s guide to some RL stuff: PPO & GRPOYuge (Jimmy) ShiJanuary 31, 2025 “This is a deep dive into Proximal Policy Optimization (PPO), which is one ofContinue readingWhat I Read: RL, PPO, GRPO

What I Read: group relative policy optimization

By Andrew Fairless on May 22, 2025February 22, 2025

https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885 group relative policy optimization (GRPO)Apoorv NandanJan 31, 2025 “GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into theirContinue readingWhat I Read: group relative policy optimization

What I Read: Reasoning LLMs

By Andrew Fairless on May 21, 2025February 22, 2025

https://magazine.sebastianraschka.com/p/understanding-reasoning-llms Understanding Reasoning LLMsSebastian Raschka, PhDFeb 05, 2025 “This article describes the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities.”

What I Read: next weak learners

By Andrew Fairless on May 15, 2025February 2, 2025

https://www.pelayoarbues.com/notes/The-next-generation-of-Weak-Learners The next generation of weak learnersPelayo ArbuésJan 28, 2025 “Large Language Models (LLMs) have often been marketed as our path to Artificial General Intelligence, but I see them asContinue readingWhat I Read: next weak learners

What I Read: Adaptive LLMs

By Andrew Fairless on April 30, 2025January 30, 2025

https://sakana.ai/transformer-squared Transformer²: Self-Adaptive LLMssakana.aiJanuary 15, 2025 “Imagine a machine learning system that could adjust its own weights dynamically to thrive in unfamiliar settings, essentially illustrating a system that evolves asContinue readingWhat I Read: Adaptive LLMs

What I Read: Reward Hacking

By Andrew Fairless on April 1, 2025December 21, 2024

https://lilianweng.github.io/posts/2024-11-28-reward-hacking Reward Hacking in Reinforcement LearningLilian WengNovember 28, 2024 “Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards,Continue readingWhat I Read: Reward Hacking

What I Read: passively learned, causality

By Andrew Fairless on December 4, 2024August 26, 2024

What can be passively learned about causality?Simons InstituteAndrew Lampinen (Google DeepMind)Jun 25, 2024 “What could language models learn about causality and experimentation from their passive training?”

What I Read: LLM Pre-training Post-training

By Andrew Fairless on November 18, 2024August 26, 2024

https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training New LLM Pre-training and Post-training ParadigmsA Look at How Modern LLMs Are TrainedSebastian Raschka, PhDAug 17, 2024 “Initially, the LLM training process focused solely on pre-training, but it hasContinue readingWhat I Read: LLM Pre-training Post-training

Tag: reinforcement learning