https://transformer-circuits.pub/2022/toy_model/index.html Toy Models of Superposition (Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, et al.)
What I Read: How Machines ‘Grok’ Data
https://www.quantamagazine.org/how-do-machines-grok-data-20240412 How Do Machines ‘Grok’ Data? (Anil Ananthaswamy, 4/12/24): “By apparently overtraining them, researchers have seen neural networks discover novel solutions to problems.”
What I Read: Tiny Language Models
https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/ Tiny Language Models Come of Age (Ben Brubaker, 10/5/23): “To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories.”