https://amaarora.github.io/posts/2024-07-04%20SWA.html
Sliding Window Attention, Longformer – The Long-Document Transformer, by Aman Arora, July 4, 2024
"…we will take a deep dive into Sliding Window Attention (SWA) that was introduced as part of…"
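For context, a minimal NumPy sketch (my own toy code, not from the post) of the idea: each token attends only to neighbours within a fixed window, so the attention pattern is a band rather than a full matrix. This dense version is for clarity only; the real savings come from never materialising the out-of-window scores.

    import numpy as np

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        # True where position i may attend to position j, i.e. |i - j| <= window
        idx = np.arange(seq_len)
        return np.abs(idx[:, None] - idx[None, :]) <= window

    def sliding_window_attention(q, k, v, window):
        # Toy single-head attention with a banded (sliding-window) mask; dense for clarity only
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = np.where(sliding_window_mask(len(q), window), scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v  # each output token mixes only nearby tokens

    rng = np.random.default_rng(0)
    q = k = v = rng.normal(size=(8, 4))
    print(sliding_window_attention(q, k, v, window=2).shape)  # (8, 4)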
What I Read: Transformers by Hand
https://towardsdatascience.com/deep-dive-into-transformers-by-hand-%EF%B8%8E-68b8be4bd813?gi=b2b3c1885179
Deep Dive into Transformers by Hand, by Srijanie Dey, PhD, Apr 12, 2024
"…the two mechanisms that are truly the force behind the transformers are attention weighting and feed-forward networks (FFN)."
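As a rough illustration of how those two mechanisms fit together, here is a dimensions-only toy block (my own sketch, not from the article; single head, no layer norm or multi-head splitting): attention weighting mixes information across tokens, then the FFN transforms each token independently.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def transformer_block(x, Wq, Wk, Wv, W1, W2):
        # Attention weighting: every token gathers information from every other token
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
        h = x + attn                           # residual connection
        # Feed-forward network: applied to each token position independently
        ffn = np.maximum(0, h @ W1) @ W2
        return h + ffn

    rng = np.random.default_rng(0)
    d, d_ff, n = 4, 8, 6
    x = rng.normal(size=(n, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    W1, W2 = rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d))
    print(transformer_block(x, Wq, Wk, Wv, W1, W2).shape)  # (6, 4)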
What I Read: Attention, transformers
Attention in transformers, visually explained | Chapter 6, Deep Learning, by 3Blue1Brown
"Demystifying attention, the key mechanism inside transformers and LLMs."
What I Read: Linear Algebra, Random
https://youtu.be/6htbyY3rH1w?si=IXTrcoIReps_ftFq
Is the Future of Linear Algebra.. Random?, by Mutual Information
"Randomization is arguably the most exciting and innovative idea to have hit linear algebra in a long time."
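One of the headline algorithms in this area is the randomized SVD in the Halko/Martinsson/Tropp sketch-and-solve style: project the matrix onto a random low-dimensional subspace, then do the expensive factorization on the small projected matrix. A toy NumPy version of the idea, mine rather than the video's:

    import numpy as np

    def randomized_svd(A, rank, oversample=10, rng=None):
        # Randomized range finder + SVD of the small sketch
        rng = np.random.default_rng(rng)
        m, n = A.shape
        Omega = rng.normal(size=(n, rank + oversample))   # random test matrix
        Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal basis for the approximate range of A
        B = Q.T @ A                                       # small (rank+oversample) x n matrix
        U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

    # Toy usage: approximate an exactly rank-20 matrix
    rng = np.random.default_rng(0)
    A = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 300))
    U, s, Vt = randomized_svd(A, rank=20)
    print(np.allclose(A, (U * s) @ Vt, atol=1e-6))        # True up to numerical error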
What I Read: Mamba Explained
https://thegradient.pub/mamba-explained
Mamba Explained, by Kola Ayonrinde, 27 Mar 2024
"Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens)."
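The long-sequence feasibility comes from Mamba being built on a state-space recurrence: each step updates a fixed-size state, so cost grows linearly with sequence length rather than quadratically as in full attention. The sketch below is not Mamba itself (no selective, input-dependent parameters and no hardware-aware parallel scan), just the plain linear recurrence underneath it:

    import numpy as np

    def ssm_scan(A, B, C, u):
        # Discrete state-space recurrence: x_t = A x_{t-1} + B u_t,  y_t = C x_t
        # One fixed-size state update per step, independent of how long the history is
        x = np.zeros(A.shape[0])
        ys = []
        for u_t in u:
            x = A @ x + B * u_t
            ys.append(C @ x)
        return np.array(ys)

    rng = np.random.default_rng(0)
    d_state = 4
    A = 0.9 * np.eye(d_state)          # stable state transition
    B = rng.normal(size=d_state)
    C = rng.normal(size=d_state)
    u = rng.normal(size=1000)          # scalar input sequence; 1_000_000 steps is just 1_000_000 cheap updates
    print(ssm_scan(A, B, C, u).shape)  # (1000,)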
What I Read: High-Dimensional Variance
https://gregorygundersen.com/blog/2023/12/09/covariance-matrices/
High-Dimensional Variance, by Gregory Gundersen, 09 December 2023
"A useful view of a covariance matrix is that it is a natural generalization of variance to higher dimensions."
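The quoted claim is easy to check numerically: the diagonal of a covariance matrix holds the per-dimension variances, while the off-diagonal entries hold the covariances between dimensions. A quick NumPy illustration (mine, not from the post):

    import numpy as np

    rng = np.random.default_rng(0)
    # Correlated 3-D data built by mixing independent normals
    X = rng.normal(size=(10_000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                 [0.5, 1.0, 0.0],
                                                 [0.0, 0.0, 0.3]])
    Sigma = np.cov(X, rowvar=False)                             # 3 x 3 covariance matrix
    print(np.allclose(np.diag(Sigma), X.var(axis=0, ddof=1)))   # True: diagonal entries are the variances
    print(Sigma)                                                # off-diagonal entries capture co-variation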