What I Read: RLHF and Its Alternatives
https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives LLM Training: RLHF and Its Alternatives. Sebastian Raschka, PhD. Sep 10, 2023. “RLHF is an integral part of the modern LLM training pipeline due to its ability to incorporate human preferences …”
What I Read: AIs producing own training data
https://thegradient.pub/software2-a-new-generation-of-ais-that-become-increasingly-general-by-producing-their-own-training-data/ Software²: A new generation of AIs that become increasingly general by producing their own training data. Minqi Jiang. Apr 22, 2023. “We are at the cusp of transitioning from ‘learning from data’ to …”
What I Read: human touch, LLMs
https://mewelch.substack.com/p/putting-the-human-touch-on-llms Putting the human touch on LLMs. Molly Welch. Mar 30. “Techniques like RLHF help align large language models with people’s values and preferences. Is that a good thing?”
What I Read: Teach Computers Math
https://www.quantamagazine.org/to-teach-computers-math-researchers-merge-ai-approaches-20230215/ To Teach Computers Math, Researchers Merge AI Approaches. Kevin Hartnett. February 15, 2023. “Large language models still struggle with basic reasoning tasks. Two new papers that apply machine learning to math …”
What I Read: Machines Learn, Teach Basics
https://www.quantamagazine.org/machines-learn-better-if-we-teach-them-the-basics-20230201/ Machines Learn Better if We Teach Them the Basics. Max G. Levy. February 1, 2023. “A wave of research improves reinforcement learning algorithms by pre-training them as if they were human.”