What I Read: Attention Is Off By One

https://www.evanmiller.org/attention-is-off-by-one.html

Attention Is Off By One
By Evan Miller
July 24, 2023

“…the current generation of AI models have an off-by-one error in a crucial place, and it’s making everyone’s Transformer models needlessly difficult to compress and deploy.”