What I Read: Autoencoders, Interpretability

https://adamkarvonen.github.io/machine_learning/2024/06/11/sae-intuitions.html

An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
Adam Karvonen
Jun 11, 2024


“Sparse Autoencoders (SAEs) have recently become popular for interpretability of machine learning models…”