large language model – Page 3 – Andrew Fairless, Ph.D.

What I Watch: How LLMs store facts

By Andrew Fairless on December 16, 2024 | September 3, 2024

How might LLMs store facts | Chapter 7, Deep Learning
3Blue1Brown
Aug 31, 2024
“Unpacking the multilayer perceptrons in a transformer, and how they may store facts”

What I Read: Fine-tuning

By Andrew Fairless on December 11, 2024 | September 3, 2024

https://openpipe.ai/blog/fine-tuning-best-practices-series-introduction-and-chapter-1-training-data
Fine-tuning Best Practices Series Introduction and Chapter 1: Training Data
Reid Mayo
Aug 1, 2024
“We’ll explore how to choose the best data, common methods for collecting it, and common methods…”

What I Read: passively learned, causality

By Andrew Fairless on December 4, 2024 | August 26, 2024

What can be passively learned about causality?
Simons Institute
Andrew Lampinen (Google DeepMind)
Jun 25, 2024
“What could language models learn about causality and experimentation from their passive training?”

What I Read: Evaluating LLM-Evaluators

By Andrew Fairless on December 3, 2024 | August 26, 2024

https://eugeneyan.com/writing/llm-evaluators
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
Eugene Yan
“After reading this, you’ll gain an intuition on how to apply, evaluate, and operate LLM-evaluators. We’ll learn when to apply (i)…”

What I Read: Classifying pdfs

By Andrew Fairless on November 21, 2024 | August 26, 2024

https://snats.xyz/pages/articles/classifying_a_bunch_of_pdfs.html
Classifying all of the pdfs on the internet
Santiago Pedroza
2024-08-18
“How would you classify all the pdfs in the internet? Well, that is what I tried doing this time.”

What I Read: Tool Retrieval, RAG

By Andrew Fairless on November 20, 2024 | August 26, 2024

https://jxnl.co/writing/2024/08/21/trade-off-tool-selection
Optimizing Tool Retrieval in RAG Systems: A Balanced Approach
Jason Liu
2024/08/21
“When it comes to Retrieval-Augmented Generation (RAG) systems, one of the key challenges is deciding how to select and…”

What I Read: LLM Pre-training Post-training

By Andrew Fairless on November 18, 2024 | August 26, 2024

https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training
New LLM Pre-training and Post-training Paradigms: A Look at How Modern LLMs Are Trained
Sebastian Raschka, PhD
Aug 17, 2024
“Initially, the LLM training process focused solely on pre-training, but it has…”

What I Read: Open-endedness, Agentic AI

By Andrew Fairless on November 13, 2024 | August 25, 2024

https://press.airstreet.com/p/open-endedness-is-all-well-need
Open-endedness is all we’ll need: On “Agentic AI”
Air Street Capital, Nathan Benaich, and Alex Chalmers
Aug 15, 2024
“Agentic systems offer the promise of agents – a software system that…”

What I Read: Visual Guide, Quantization

By Andrew Fairless on October 30, 2024 | August 3, 2024

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization
A Visual Guide to Quantization, Demystifying the Compression of Large Language Models
Maarten Grootendorst
Jul 22, 2024
“…I will introduce the field of quantization in the context of language modeling and…”

What I Read: History, Transformer

By Andrew Fairless on October 28, 2024 | July 22, 2024

Shaping the Future of AI from the History of Transformer
Stanford CS25: V4 I Hyung Won Chung of OpenAI
Stanford Online
“I will provide a highly-opinionated view on the early history of…”
