https://www.evanmiller.org/attention-is-off-by-one.html
Attention Is Off By One
By Evan Miller
July 24, 2023
“…the current generation of AI models have an off-by-one error in a crucial place, and it’s making everyone’s Transformer models needlessly difficult to compress and deploy.”