https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/
Reducing the High Cost of Training NLP Models With SRU++
By Tao Lei, PhD
Research Leader and Scientist at ASAPP
“The Transformer architecture was proposed to accelerate model training in NLP…. A couple of interesting questions arise following the development of the Transformer: Is attention all we need for modeling? If recurrence is not a compute bottleneck, can we find better architectures?”