What I Read: Transformers for Image Recognition

https://medium.com/swlh/an-image-is-worth-16x16-words-transformers-for-image-recognition-at-scale-brief-review-of-the-8770a636c6a8

An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale (Brief Review of the ICLR 2021 Paper)
Stan Kriventsov
Oct 9

“The reason attention models haven’t been doing better until now in computer vision lies both in the difficulty of scaling them… and… individual pixels in a picture are not very meaningful by themselves… The new paper suggests the approach of using attention not on pixels, but instead on small patches of the image…. it is more efficient than convolutional approaches in terms of achieving the same accuracy of prediction with less computation…”
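
To make the scaling point in the quote concrete, here is a rough sketch that counts how many token pairs full self-attention would compare if each pixel were a token versus each 16×16 patch. The 224×224 image size and the helper names `attention_pairs` and `split_into_patches` are my own assumptions for illustration, not taken from the article or the paper. The sketch only counts and cuts up tokens; it is not the paper's model.

```python
def attention_pairs(height, width, patch=1):
    """Return (token count, token pairs) for full self-attention.

    Hypothetical helper: with patch=1 every pixel is its own token.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = (height // patch) * (width // patch)
    # Full self-attention compares every token with every token.
    return tokens, tokens * tokens


def split_into_patches(image, patch):
    """Cut a 2D grid (list of rows) into flattened patches, row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


# Assumed input size: 224x224 is an illustrative choice, not from the quote.
pixel_tokens, pixel_pairs = attention_pairs(224, 224, patch=1)
patch_tokens, patch_pairs = attention_pairs(224, 224, patch=16)

print(pixel_tokens, pixel_pairs)   # 50176 2517630976
print(patch_tokens, patch_pairs)   # 196 38416
print(pixel_pairs // patch_pairs)  # 65536

# Tiny 4x4 "image" cut into 2x2 patches, each flattened into one "word".
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
print(split_into_patches(toy, 2)[0])  # [0, 1, 4, 5]
```

Under these assumptions, switching from pixels to 16×16 patches shrinks the sequence from 50,176 tokens to 196. Since attention compares all pairs, that cuts the pairwise work by a factor of 65,536. This count covers only the attention comparisons, so it says nothing by itself about the accuracy-per-computation claim at the end of the quote.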