What I Read: Transformers by Hand
https://towardsdatascience.com/deep-dive-into-transformers-by-hand-%EF%B8%8E-68b8be4bd813?gi=b2b3c1885179 Deep Dive into Transformers by Hand | Srijanie Dey, PhD | Apr 12, 2024 “…the two mechanisms that are truly the force behind the transformers are attention weighting and feed-forward networks (FFN).”
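The second mechanism the quote names, the feed-forward network, is applied to each token position independently. A minimal numpy sketch (the dimensions here are illustrative, not the article's):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 4  # toy sizes for illustration

def ffn(x, W1, b1, W2, b2):
    # Position-wise feed-forward network:
    #   FFN(x) = ReLU(x W1 + b1) W2 + b2
    # Each row (token) of x is transformed independently.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

W1 = rng.normal(size=(d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model))
b2 = np.zeros(d_model)

x = rng.normal(size=(seq_len, d_model))  # one token embedding per row
y = ffn(x, W1, b1, W2, b2)
print(y.shape)  # (4, 8): input and output shapes match
```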
What I Read: Attention, transformers
Attention in transformers, visually explained | Chapter 6, Deep Learning | 3Blue1Brown “Demystifying attention, the key mechanism inside transformers and LLMs.”
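The mechanism the video demystifies is scaled dot-product attention; a self-contained numpy sketch with toy shapes:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
# Each row of w is a probability distribution over the 4 "tokens":
print(w.sum(axis=-1))  # all 1.0
```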
What I Read: Linear Algebra, Random
https://youtu.be/6htbyY3rH1w?si=IXTrcoIReps_ftFq Is the Future of Linear Algebra.. Random? | Mutual Information “Randomization is arguably the most exciting and innovative idea to have hit linear algebra in a long time.”
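A flavor of what "randomized linear algebra" means in practice: the randomized range finder, which approximates a matrix's column space from a small Gaussian sketch instead of a full factorization. A minimal sketch on a synthetic low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10
# A has exact rank k, so a small sketch can capture its whole range.
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

p = 5                                   # oversampling for robustness
Omega = rng.normal(size=(n, k + p))     # random Gaussian test matrix
Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for sampled range
A_approx = Q @ (Q.T @ A)                # project A onto that basis

err = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(err)  # near machine precision, since rank(A) <= sketch width
```

The point of the randomized approach is cost: the expensive work touches only the (k + p)-column sketch rather than all of A.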
What I Read: Mamba Explained
https://thegradient.pub/mamba-explained Mamba Explained | Kola Ayonrinde | 27.Mar.2024 “Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens).”
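The long-sequence feasibility comes from Mamba being built on state-space models, whose recurrence costs linear (not quadratic) time in sequence length. A toy sketch of that linear recurrence, with fixed A, B, C (in Mamba proper they are input-dependent, the "selective" part):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    # Linear state-space recurrence at the core of S4/Mamba-style models:
    #   h_t = A h_{t-1} + B u_t,   y_t = C h_t
    # One pass over the sequence: O(len(u)) in sequence length.
    h = np.zeros(A.shape[0])
    ys = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = A @ h + B * u_t
        ys[t] = C @ h
    return ys

rng = np.random.default_rng(0)
n = 4                        # hidden state size (illustrative)
A = 0.9 * np.eye(n)          # stable toy dynamics, not Mamba's parametrization
B = rng.normal(size=n)
C = rng.normal(size=n)
u = rng.normal(size=1000)    # a long 1-D input sequence
y = ssm_scan(A, B, C, u)
print(y.shape)  # (1000,)
```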
What I Read: High-Dimensional Variance
https://gregorygundersen.com/blog/2023/12/09/covariance-matrices/ High-Dimensional Variance | Gregory Gundersen | 09 December 2023 “A useful view of a covariance matrix is that it is a natural generalization of variance to higher dimensions.”
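Two concrete facts behind that view, checked numerically: the diagonal of a covariance matrix holds the per-coordinate variances, and the variance of any 1-D projection w is the quadratic form wᵀΣw (the data here is random for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data: mix independent normals through a fixed matrix.
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                           [0.5, 1.0, 0.0],
                                           [0.0, 0.0, 0.1]])

Sigma = np.cov(X, rowvar=False)  # 3x3 sample covariance matrix
# Diagonal entries are the per-coordinate variances:
print(np.allclose(np.diag(Sigma), X.var(axis=0, ddof=1)))  # True
# Variance of the projection X @ w equals the quadratic form w' Sigma w:
w = np.array([1.0, -1.0, 0.5])
print(np.isclose((X @ w).var(ddof=1), w @ Sigma @ w))  # True
```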