https://towardsdatascience.com/understanding-llms-from-scratch-using-middle-school-math-e602d27ec876?gi=551c5bfd7f21 Understanding LLMs from Scratch Using Middle School Math, Rohit Patel, Oct 19, 2024 “In this article, we talk about how Large Language Models (LLMs) work, from scratch — assuming only that…”
What I Read: Transformers Inference Optimization
https://astralord.github.io/posts/transformer-inference-optimization-toolset Transformers Inference Optimization Toolset, Aleksandr Samarin, Oct 1, 2024 “Large Language Models are pushing the boundaries of artificial intelligence, but their immense size poses significant computational challenges. As these models grow…”
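The excerpt stops at the motivation, but a staple of transformer inference toolkits is the key-value (KV) cache, which avoids recomputing attention keys and values for the already-generated prefix at every decoding step. Whether this particular article covers it is an assumption on my part; the sketch below (NumPy, identity “projections” to keep it short) only illustrates the idea.

```python
import numpy as np

def attend(q, K, V):
    """Single-query scaled dot-product attention over cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                      # (d,)

d = 8
rng = np.random.default_rng(0)

# Without a cache, step t would recompute keys/values for all t previous tokens.
# With a cache, each new token appends one key/value row and attends over the stored ones.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for t in range(5):                          # pretend we decode 5 tokens
    x = rng.normal(size=d)                  # hidden state of the newest token
    k, v, q = x, x, x                       # identity "projections" keep the sketch short
    K_cache = np.vstack([K_cache, k])       # append instead of recomputing history
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
    print(t, out[:3])
```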
What I Watch: How LLMs store facts
How might LLMs store facts | Chapter 7, Deep Learning, 3Blue1Brown, Aug 31, 2024 “Unpacking the multilayer perceptrons in a transformer, and how they may store facts”
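A toy sketch of the “key-value” reading of a transformer MLP that the video unpacks: rows of the first weight matrix act as keys that detect a feature in the residual stream, and the matching columns of the second matrix write an associated fact direction back. The specific vectors and dimensions below are invented for illustration.

```python
import numpy as np

d_model, d_mlp = 4, 2
# Hypothetical directions in the residual stream (made-up vectors, for illustration only).
topic_dir = np.array([1.0, 0.0, 0.0, 0.0])   # "this token is about topic X"
fact_dir  = np.array([0.0, 0.0, 1.0, 0.0])   # "the fact associated with X"

# Rows of W_in act like keys: neuron 0 fires when the input contains topic_dir.
W_in = np.stack([topic_dir, np.array([0.0, 1.0, 0.0, 0.0])])   # (d_mlp, d_model)
# Columns of W_out act like values: neuron 0 writes fact_dir back into the stream.
W_out = np.stack([fact_dir, np.zeros(4)]).T                    # (d_model, d_mlp)

def mlp(x):
    h = np.maximum(W_in @ x, 0.0)   # ReLU stands in for the usual GELU
    return W_out @ h                # contribution added to the residual stream

print(mlp(topic_dir))                        # ≈ fact_dir: the stored fact is retrieved
print(mlp(np.array([0.0, 0.0, 0.0, 1.0])))   # unrelated input: ~zero contribution
```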
What I Read: Illustrated AlphaFold
https://elanapearl.github.io/blog/2024/the-illustrated-alphafold The Illustrated AlphaFold, Elana Simon and Jake Silberg “A visual walkthrough of the AlphaFold3 architecture…”
What I Read: Transformers by Hand
https://towardsdatascience.com/deep-dive-into-transformers-by-hand-%EF%B8%8E-68b8be4bd813?gi=b2b3c1885179 Deep Dive into Transformers by Hand, Srijanie Dey, PhD, Apr 12, 2024 “…the two mechanisms that are truly the force behind the transformers are attention weighting and feed-forward networks (FFN).”
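Of the two mechanisms, the FFN is the easier one to write down: the same two-layer MLP is applied to every token position independently, so tokens never mix here (that is attention’s job, sketched under the next entry). A minimal NumPy sketch with illustrative sizes:

```python
import numpy as np

def ffn(X, W1, b1, W2, b2):
    """Position-wise feed-forward: the same two-layer MLP applied to every token independently."""
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2   # ReLU, as in the original Transformer

seq_len, d_model, d_ff = 3, 8, 32                   # illustrative sizes
rng = np.random.default_rng(0)
X  = rng.normal(size=(seq_len, d_model))            # one row per token
W1 = rng.normal(size=(d_model, d_ff)) * 0.1
b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model)) * 0.1
b2 = np.zeros(d_model)

Y = ffn(X, W1, b1, W2, b2)
print(Y.shape)                                      # (3, 8): shape preserved, tokens never interact
```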
What I Watch: Attention, transformers
Attention in transformers, visually explained | Chapter 6, Deep Learning, 3Blue1Brown “Demystifying attention, the key mechanism inside transformers and LLMs.”
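A minimal NumPy sketch of the scaled dot-product attention the video demystifies (single head, no masking, random placeholder projection matrices):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each token attends to each other token
    return softmax(scores) @ V                # weighted mix of value vectors

seq_len, d_model, d_head = 4, 8, 8            # illustrative sizes
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)         # (4, 8)
```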
What I Read: Mamba Explained
https://thegradient.pub/mamba-explained Mamba Explained, Kola Ayonrinde, Mar 27, 2024 “Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens).”
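The long-context feasibility comes from swapping attention’s all-pairs token comparison (quadratic in sequence length) for a state-space recurrence whose per-token cost is constant. Real Mamba makes the parameters input-dependent (“selective”) and computes the recurrence with a parallel scan; the heavily simplified sketch below only illustrates the linear-in-sequence-length scaling.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Simplified state-space recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    Cost grows linearly with sequence length; the fixed-size state h summarizes the past,
    instead of comparing every pair of tokens as attention does."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                      # one constant-cost state update per token
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

seq_len, d_state = 1000, 16            # illustrative sizes
rng = np.random.default_rng(0)
x = rng.normal(size=seq_len)           # a single input channel, for simplicity
A = np.diag(np.exp(-rng.uniform(0.1, 1.0, d_state)))   # stable, decaying dynamics
B = rng.normal(size=d_state) * 0.1
C = rng.normal(size=d_state) * 0.1

y = ssm_scan(x, A, B, C)
print(y.shape)                         # (1000,): long sequence, constant-size state
```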