What I Read: Deep Double Descent
https://arxiv.org/abs/1912.02292
Deep Double Descent: Where Bigger Models and More Data Hurt. Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever.
"We show that a variety of modern deep learning tasks exhibit a 'double-descent' phenomenon where, as we increase model size, performance first gets worse and then gets better."
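To make the paper's main axis concrete, here is a minimal sketch of the kind of capacity sweep behind its plots, assuming a toy sklearn MLP on synthetic data rather than the paper's ResNet-on-CIFAR setup (the full double-descent effect also needs label noise and long training, per the paper):

```python
# Sketch only: sweep model width and record train/test error.
# The paper's actual experiments use deep ResNets with label noise;
# this toy setup just illustrates the shape of such a sweep.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for width in [2, 8, 32, 128, 512]:  # model size is the swept "capacity" axis
    clf = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000,
                        random_state=0).fit(X_train, y_train)
    print(f"width={width:4d}  "
          f"train_err={1 - clf.score(X_train, y_train):.3f}  "
          f"test_err={1 - clf.score(X_test, y_test):.3f}")
```

The paper's point is that, in this kind of sweep, test error need not be U-shaped: it can descend, rise near the interpolation threshold, and then descend again as models get even bigger.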
What I Read: Transformers for Image Recognition
https://medium.com/swlh/an-image-is-worth-16x16-words-transformers-for-image-recognition-at-scale-brief-review-of-the-8770a636c6a8
An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale (Brief Review of the ICLR 2021 Paper). Stan Kriventsov, Oct 9.
"The reason attention models haven't been doing better…"
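As a reminder of what the title means, a minimal numpy sketch of the ViT front end: cut the image into 16×16 patches, flatten each one, and linearly project it so the patch sequence can feed a standard Transformer. The 224×224 input and d_model of 64 are my own illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def patchify(image, patch=16):
    """(H, W, C) image -> (num_patches, patch*patch*C) matrix of flat patches."""
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    p = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    return p.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * C)

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))   # stand-in for a real image
tokens = patchify(img)                     # (196, 768): 14x14 flat patches
W_embed = rng.standard_normal((768, 64))   # stands in for ViT's learned projection
embedded = tokens @ W_embed                # (196, 64) patch "words" for the Transformer
print(embedded.shape)
```

Each patch then plays the role a word embedding plays in NLP, which is the whole conceit of the title.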
What I Read: Transformer Architecture
https://blog.exxactcorp.com/a-deep-dive-into-the-transformer-architecture-the-development-of-transformer-models/
A Deep Dive Into the Transformer Architecture – The Development of Transformer Models. Exxact blog, July 14, 2020.
"There's no better time…"
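The centerpiece of any Transformer walkthrough is attention, so here is a minimal numpy sketch of single-head scaled dot-product attention (no masking, dropout, or multi-head projections; shapes chosen only for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_q, seq_k) similarities
    return softmax(scores) @ V                      # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))  # seq_len=5, d_k=8
print(attention(Q, K, V).shape)  # (5, 8)
```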
What I Read: Progress of Natural Language Processing
https://blog.exxactcorp.com/the-unreasonable-progress-of-deep-neural-networks-in-natural-language-processing-nlp/
The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP). Exxact blog, June 2, 2020.
"With the advent of pre-trained generalized language models, we…"
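A hint of why pre-trained models changed the economics: with a library such as Hugging Face transformers (my choice here, not something the article prescribes), reusing a pre-trained model takes a few lines. The checkpoint the pipeline downloads is whatever default the library currently ships:

```python
# Sketch: reuse a pre-trained language model instead of training from scratch.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pre-trained model
print(classifier("Transfer learning has made NLP far more accessible."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```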
What I Read: Reformer efficient Transformer
https://towardsdatascience.com/illustrating-the-reformer-393575ac6ba0
Illustrating the Reformer: The Efficient Transformer. Alireza Dirafzoon, Feb 4.
"Recently, Google introduced the Reformer architecture, a Transformer model designed to efficiently handle processing very long sequences of data (e.g. up to…"
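To pin down the trick, a minimal numpy sketch of the LSH bucketing idea: hash query/key vectors with random projections so similar vectors share a bucket, then attend only within buckets instead of over all O(L²) pairs. This is a simplified sign-bit LSH, not Reformer's exact rotation-based scheme, and it omits Reformer's shared-QK attention details, chunking, and reversible layers:

```python
import numpy as np

def lsh_buckets(x, rng, n_hashes=4):
    """Simplified angular LSH: random projections, bucket id = sign pattern."""
    R = rng.standard_normal((x.shape[-1], n_hashes))
    bits = ((x @ R) > 0).astype(int)           # (seq_len, n_hashes) sign bits
    return bits @ (1 << np.arange(n_hashes))   # pack sign bits into bucket ids

rng = np.random.default_rng(0)
qk = rng.standard_normal((1024, 64))           # query/key vectors, seq_len=1024
buckets = lsh_buckets(qk, rng)
for b in np.unique(buckets):
    members = np.where(buckets == b)[0]
    # Full attention would score all 1024 x 1024 pairs; here each position
    # only attends within its bucket of ~len(members) similar vectors (omitted).
print(len(np.unique(buckets)), "buckets over", len(qk), "positions")
```

Nearby vectors dominate the softmax anyway, so restricting attention to hash buckets is what lets the Reformer scale to very long sequences.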