What I Read: Improving Language Models, Practical Size

https://amaarora.github.io/posts/2024-07-07%20Gemma.html

Gemma 2: Improving Open Language Models at a Practical Size
Aman Arora
July 9, 2024


“…we take a deep dive into the architectural components of Gemma 2 such as Grouped Query Attention, Sliding Window Attention, RoPE Embeddings, Logit soft-capping & Model-merging!”