attention – Andrew Fairless, Ph.D.

What I Read: Tensor Dimensions, Transformers

By Andrew Fairless on April 29, 2025 / January 28, 2025

https://huggingface.co/blog/not-lain/tensor-dims
Mastering Tensor Dimensions in Transformers
Hafedh Hichri, January 12, 2025
“Most generative AI models are built using a decoder-only architecture. In this blog post, we’ll explore a simple text generation model, …”

What I Read: LLMs, School Math

By Andrew Fairless on March 5, 2025 / November 16, 2024

https://towardsdatascience.com/understanding-llms-from-scratch-using-middle-school-math-e602d27ec876?gi=551c5bfd7f21
Understanding LLMs from Scratch Using Middle School Math
Rohit Patel, Oct 19, 2024
“In this article, we talk about how Large Language Models (LLMs) work, from scratch — assuming only that …”

What I Read: Transformers Inference Optimization

By Andrew Fairless on January 27, 2025 / October 19, 2024

https://astralord.github.io/posts/transformer-inference-optimization-toolset
Transformers Inference Optimization Toolset
Aleksandr Samarin, Oct 1, 2024
“Large Language Models are pushing the boundaries of artificial intelligence, but their immense size poses significant computational challenges. As these models grow, …”

What I Watch: How LLMs store facts

By Andrew Fairless on December 16, 2024 / September 3, 2024

How might LLMs store facts | Chapter 7, Deep Learning
3Blue1Brown, Aug 31, 2024
“Unpacking the multilayer perceptrons in a transformer, and how they may store facts”

What I Read: Improving Language Models, Practical Size

By Andrew Fairless on October 15, 2024 / July 14, 2024

https://amaarora.github.io/posts/2024-07-07%20Gemma.html
Gemma 2, Improving Open Language Models at a Practical Size
Aman Arora, July 9, 2024
“…we take a deep dive into the architectural components of Gemma 2 such as Grouped Query …”

What I Read: Illustrated AlphaFold

By Andrew Fairless on October 9, 2024 / July 14, 2024

https://elanapearl.github.io/blog/2024/the-illustrated-alphafold
The Illustrated AlphaFold
Elana Simon, Jake Silberg
“A visual walkthrough of the AlphaFold3 architecture…”

What I Read: Sliding Window Attention

By Andrew Fairless on September 30, 2024 / July 14, 2024

https://amaarora.github.io/posts/2024-07-04%20SWA.html
Sliding Window Attention, Longformer – The Long-Document Transformer
Aman Arora, July 4, 2024
“…we will look take a deep dive into Sliding Window Attention (SWA) that was introduced as part of …”

What I Read: Transformers by Hand

By Andrew Fairless on August 14, 2024 / May 25, 2024

https://towardsdatascience.com/deep-dive-into-transformers-by-hand-%EF%B8%8E-68b8be4bd813?gi=b2b3c1885179
Deep Dive into Transformers by Hand
Srijanie Dey, PhD, Apr 12, 2024
“…the two mechanisms that are truly the force behind the transformers are attention weighting and feed-forward networks (FFN).”

What I Read: Ring Attention

By Andrew Fairless on July 8, 2024 / April 23, 2024

https://coconut-mode.com/posts/ring-attention
Ring Attention Explained
Kilian Haefeli, Simon Zirui Guo, Bonnie Li, 10 Apr 2024
“Context length in Large Language Models has expanded rapidly…. What if we we could use multiple devices to …”

What I Read: Attention, transformers

By Andrew Fairless on June 18, 2024 / April 16, 2024

Attention in transformers, visually explained | Chapter 6, Deep Learning
3Blue1Brown
“Demystifying attention, the key mechanism inside transformers and LLMs.”

Tag: attention