What I Read: The Interface Between Reinforcement Learning Theory and Language Model Post-Training
The Interface Between Reinforcement Learning Theory and Language Model Post-Training, Akshay Krishnamurthy, Audrey Huang, March 5, 2025
"Even though existing RLHF methods…"
What I Read: group relative policy optimization
https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885
group relative policy optimization (GRPO), Apoorv Nandan, Jan 31, 2025
"GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their…"
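For context (this sketch is mine, not from the linked post): the core trick in GRPO is to replace a learned value baseline with a group-relative one, normalizing the rewards of several completions sampled for the same prompt. A minimal Python sketch of that advantage computation, with an illustrative function name:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Turn a group of per-completion rewards into advantages.

    GRPO samples several completions for one prompt, scores each with a
    reward model, and uses the group's mean and std as the baseline
    instead of a separate value network.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions sampled for one prompt, scored 1.0, 0.0, 0.5, 0.0.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))
```

Completions scoring above the group mean get positive advantages and are upweighted in the policy-gradient update; below-average ones are pushed down.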
What I Read: Reasoning LLMs
https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
Understanding Reasoning LLMs, Sebastian Raschka, PhD, Feb 05, 2025
"This article describes the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities."
What I Read: passively learned, causality
What can be passively learned about causality? Simons Institute, Andrew Lampinen (Google DeepMind), Jun 25, 2024
"What could language models learn about causality and experimentation from their passive training?"