chatbot – Andrew Fairless, Ph.D.

What I Read: Chatbot Limitations

By Andrew Fairless on May 19, 2025 / February 22, 2025

Chatbot Software Begins to Face Fundamental Limitations
Anil Ananthaswamy, January 31, 2025
“Recent results show that large language models struggle with compositional tasks, suggesting a…”

What I Read: LLM judge

By Andrew Fairless on February 25, 2025 / November 9, 2024

https://hamel.dev/blog/posts/llm-judge
Creating a LLM-as-a-Judge That Drives Business Results
Hamel Husain, October 29, 2024
“Earlier this year, I wrote Your AI product needs evals. Many of you asked, “How do I get started…”

What I Read: Evaluating LLM-Evaluators

By Andrew Fairless on December 3, 2024 / August 26, 2024

https://eugeneyan.com/writing/llm-evaluators
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
Eugene Yan
“After reading this, you’ll gain an intuition on how to apply, evaluate, and operate LLM-evaluators. We’ll learn when to apply (i)…”

What I Read: Turing Test, intelligence

By Andrew Fairless on November 12, 2024 / August 25, 2024

https://www.science.org/doi/10.1126/science.adq9356
The Turing Test and our shifting conceptions of intelligence
Melanie Mitchell
Science, Vol 385, Issue 6710, DOI: 10.1126/science.adq9356, 15 Aug 2024
“It’s likely that the Turing Test will become yet another casualty of our…”

What I Read: LLM evaluation

By Andrew Fairless on October 21, 2024 / July 22, 2024

https://hamel.dev/blog/posts/evals
Your AI Product Needs Evals
How to construct domain-specific LLM evaluation systems.
Hamel Husain, March 29, 2024
“…I’ve seen many successful and unsuccessful approaches to building LLM products. I’ve found that unsuccessful…”

What I Read: Data Flywheels, LLM

By Andrew Fairless on October 17, 2024 / July 20, 2024

https://www.sh-reya.com/blog/ai-engineering-flywheel
Data Flywheels for LLM Applications
Shreya Shankar, Jul 1, 2024
“This diagram illustrates my (idealized) architecture of an LLM pipeline, from input processing through evaluation and logging. It showcases ideas I’ll…”

What I Read: Extrinsic Hallucinations, LLMs

By Andrew Fairless on October 7, 2024 / July 14, 2024

https://lilianweng.github.io/posts/2024-07-07-hallucination
Extrinsic Hallucinations in LLMs
Lilian Weng, July 7, 2024
“This post focuses on extrinsic hallucination. To avoid hallucination, LLMs need to be (1) factual and (2) acknowledge not knowing the answer…”

What I Read: Detecting hallucinations, LLMs, semantic entropy

By Andrew Fairless on September 23, 2024 / July 8, 2024

https://oatml.cs.ox.ac.uk/blog/2024/06/19/detecting_hallucinations_2024.html
Detecting hallucinations in large language models using semantic entropy
Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, Yarin Gal, 19 Jun 2024
“We show how one can use uncertainty to detect confabulations.”

What I Read: Summarization, LLMs

By Andrew Fairless on September 3, 2024 / June 15, 2024

https://cameronrwolfe.substack.com/p/summarization-and-the-evolution-of
Summarization and the Evolution of LLMs
Cameron R. Wolfe, Ph.D., Jun 03, 2024
“How research on abstractive summarization changed language models forever…”

What I Read: LLM, DSPy Assertions and Suggestions

By Andrew Fairless on August 6, 2024 / May 25, 2024

https://learnbybuilding.ai/tutorials/guiding-llm-output-with-dspy-assertions-and-suggestions
Guiding LLM Output with DSPy Assertions and Suggestions
Bill Chambers
“Assertions in DSPy allow you to define strict rules and constraints that the LLM’s output must (or maybe that you…”

Tag: chatbot