What I Read: Attention in Transformers, Visually Explained
Attention in transformers, visually explained | Chapter 6, Deep Learning, by 3Blue1Brown. “Demystifying attention, the key mechanism inside transformers and LLMs.”
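As a minimal sketch of the scaled dot-product attention the video demystifies (single head, no learned projections or masking; the NumPy framing and the toy shapes are my own illustrative assumptions, not the video's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq): how much each query matches each key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution over tokens
    return weights @ V                  # each output is a weighted average of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings (arbitrary sizes).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```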
What I Read: High-Dimensional Variance
https://gregorygundersen.com/blog/2023/12/09/covariance-matrices/ High-Dimensional Variance, by Gregory Gundersen, 09 December 2023. “A useful view of a covariance matrix is that it is a natural generalization of variance to higher dimensions.”
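The quoted claim can be written out in one line. These are the standard definitions (my restatement, not pulled from the post itself), showing how the covariance matrix recovers an ordinary variance along any direction:

```latex
% Scalar case:  Var(X) = E[(X - mu)^2].
% For a random vector X in R^d, the covariance matrix generalizes this:
\Sigma = \operatorname{Cov}(X) = \mathbb{E}\left[(X - \mu)(X - \mu)^{\top}\right],
\qquad \mu = \mathbb{E}[X].
% Projecting onto any unit vector u reduces it back to a scalar variance:
\operatorname{Var}(u^{\top} X) = u^{\top} \Sigma\, u .
```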
What I Read: Differentiable Trees
https://ericmjl.github.io/blog/2023/8/7/journal-club-differentiable-search-of-evolutionary-trees/ Journal Club: Differentiable Search of Evolutionary Trees, by Eric J. Ma, 07 August 2023. “…how the authors take a non-differentiable problem and turn it into a differentiable problem through interconversion between mathematical data structures.”
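The quoted move, turning a non-differentiable problem into a differentiable one, is easiest to see on a toy case. A generic sketch of the idea (a temperature-controlled softmax standing in for a hard argmax; this illustrates the general relaxation technique, not the paper's actual tree parameterization):

```python
import numpy as np

def softmax(x, temperature=1.0):
    z = x / temperature
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

# Hard selection: pick exactly one option. Non-differentiable in `scores`,
# since argmax has zero gradient almost everywhere.
scores = np.array([1.0, 2.5, 0.3])
options = np.array([10.0, 20.0, 30.0])
hard_value = options[np.argmax(scores)]

# Soft relaxation: a weighted mixture of all options. Differentiable in
# `scores`, and it approaches the hard argmax as the temperature goes to 0.
for t in (1.0, 0.1, 0.01):
    soft_value = softmax(scores, temperature=t) @ options
    print(f"temperature={t}: soft={soft_value:.3f} (hard={hard_value})")
```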
What I Read: To Understand Transformers, Focus on Attention
https://drscotthawley.github.io/blog/posts/Transformers1-Attention.html To Understand Transformers, Focus on Attention, by Scott H. Hawley, 21 August 2023.