alignment – Andrew Fairless, Ph.D.

What I Read: Age of Data

By Andrew Fairless on April 14, 2025; January 5, 2025

https://amatria.in/blog/ageofdata
The end of the “Age of Data”? Enter the age of superhuman data and AI
Xavier Amatriain
December 24, 2024
“This post will argue that the age of data is far…

What I Read: Reward Hacking

By Andrew Fairless on April 1, 2025; December 21, 2024

https://lilianweng.github.io/posts/2024-11-28-reward-hacking
Reward Hacking in Reinforcement Learning
Lilian Weng
November 28, 2024
“Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards,…

What I Read: Debate, AI

By Andrew Fairless on March 3, 2025; November 16, 2024

Debate May Help AI Models Converge on Truth
Stephen Ornes
November 8, 2024
“Letting AI systems argue with each other may help expose when…

What I Read: LLM Pre-training Post-training

By Andrew Fairless on November 18, 2024; August 26, 2024

https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training
New LLM Pre-training and Post-training Paradigms: A Look at How Modern LLMs Are Trained
Sebastian Raschka, PhD
Aug 17, 2024
“Initially, the LLM training process focused solely on pre-training, but it has…

What I Read: evaluating AI systems

By Andrew Fairless on December 5, 2023; October 12, 2023

https://www.anthropic.com/index/evaluating-ai-systems
Challenges in evaluating AI systems
Oct 4, 2023
“…what many people working inside and outside of AI don’t fully appreciate is how difficult it is to build robust and reliable…

Tag: alignment