https://amaarora.github.io/posts/2024-07-07%20Gemma.html
Gemma 2, Improving Open Language Models at a Practical Size
Aman Arora
July 9, 2024
“…we take a deep dive into the architectural components of Gemma 2 such as Grouped Query
What I Read: Illustrated AlphaFold
https://elanapearl.github.io/blog/2024/the-illustrated-alphafold
The Illustrated AlphaFold
Elana Simon, Jake Silberg
“A visual walkthrough of the AlphaFold3 architecture…”
What I Read: Transformers by Hand
https://towardsdatascience.com/deep-dive-into-transformers-by-hand-%EF%B8%8E-68b8be4bd813?gi=b2b3c1885179
Deep Dive into Transformers by Hand
Srijanie Dey, PhD
Apr 12, 2024
“…the two mechanisms that are truly the force behind the transformers are attention weighting and feed-forward networks (FFN).”
What I Read: Attention, transformers
Attention in transformers, visually explained | Chapter 6, Deep Learning
3Blue1Brown
“Demystifying attention, the key mechanism inside transformers and LLMs.”
What I Read: Mamba Explained
https://thegradient.pub/mamba-explained
Mamba Explained
Kola Ayonrinde
27.Mar.2024
“Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens).”
What I Read: Chain-of-Thought Reasoning
https://www.quantamagazine.org/how-chain-of-thought-reasoning-helps-neural-networks-compute-20240321
How Chain-of-Thought Reasoning Helps Neural Networks Compute
Ben Brubaker
3/21/24 11:15 AM
“Large language models do better at solving problems when they show their work. Researchers are beginning to understand why.”