http://www.offconvex.org/2020/10/21/intrinsicLR/
Mismatches between Traditional Optimization Analyses and Modern Deep Learning
Zhiyuan Li and Sanjeev Arora, Oct 21, 2020
“You may remember our previous blog post showing that it is possible to do ...”
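The excerpt above is cut off; it appears to point back to the authors' earlier post on training normalized networks with a learning rate that grows during training. As a loose illustration only (not the authors' experimental setup), here is a minimal PyTorch sketch of an exponentially increasing schedule; the model, data, and gamma value are placeholder choices:

```python
import torch
from torch import nn, optim

# Placeholder normalized model and synthetic data, for illustration only.
model = nn.Sequential(nn.Linear(32, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 10))
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# gamma > 1 makes the learning rate grow exponentially each epoch, which
# traditional step-size analyses would forbid; the post's point is that
# normalization plus weight decay changes the effective dynamics.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.1)

for epoch in range(10):
    x = torch.randn(128, 32)          # placeholder batch
    y = torch.randint(0, 10, (128,))  # placeholder labels
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                  # lr *= 1.1 after each epoch
    print(epoch, scheduler.get_last_lr())
```

The only moving part is `gamma > 1`: `ExponentialLR` simply multiplies the learning rate by `gamma` after every `scheduler.step()` call.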
What I Read: Frameworks Scaling Deep Learning Training
https://medium.com/dataseries/microsoft-and-google-open-sourced-these-frameworks-based-on-their-work-scaling-deep-learning-c0510e907038
Microsoft and Google Open Sourced These Frameworks Based on Their Work Scaling Deep Learning Training
Jesus Rodriguez, Oct 26, 2020
“Google and Microsoft have recently released new frameworks for distributed deep learning training.”
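The article is a high-level tour of the frameworks rather than a code walkthrough, so the sketch below is only a generic illustration of the data-parallel pattern such frameworks scale up, not any of these frameworks' own APIs: a two-process PyTorch DistributedDataParallel run on CPU with a placeholder model and random data.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch import nn, optim
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    """One data-parallel worker: each rank holds a model replica, and
    gradients are averaged across ranks on every backward pass."""
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(32, 10))            # placeholder model
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    for step in range(5):
        x = torch.randn(64, 32)               # this rank's shard of the batch
        y = torch.randint(0, 10, (64,))
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()                       # gradient all-reduce happens here
        optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)     # two CPU workers on one machine
```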
What I Read: Attention with Performers
https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html
Rethinking Attention with Performers
Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research, October 23, 2020
“To resolve these issues, we introduce the Performer, a Transformer architecture with ...”
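The Performer replaces quadratic softmax attention with a random-feature approximation so that attention can be computed in time and memory linear in sequence length. The sketch below is a minimal NumPy illustration of that general idea (positive random features for the softmax kernel), not the paper's exact FAVOR+ implementation; the function names and feature count are illustrative choices.

```python
import numpy as np

def positive_random_features(x, proj):
    """Map (seq_len, d) inputs to positive random features whose inner
    products approximate the softmax kernel exp(q.k / sqrt(d))."""
    d = x.shape[-1]
    x = x / d ** 0.25                                   # softmax temperature sqrt(d)
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ proj - sq_norm) / np.sqrt(proj.shape[-1])

def linear_attention(Q, K, V, num_features=1024, seed=0):
    """Approximate softmax attention without forming the n x n matrix:
    compute phi(Q) @ (phi(K)^T @ V) and normalize row-wise."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((Q.shape[-1], num_features))
    Qp = positive_random_features(Q, proj)              # (n, m)
    Kp = positive_random_features(K, proj)              # (n, m)
    KV = Kp.T @ V                                       # (m, d_v), linear in n
    normalizer = Qp @ Kp.sum(axis=0)                    # approximate softmax row sums
    return (Qp @ KV) / normalizer[:, None]

# Usage: compare against exact softmax attention on a tiny example.
n, d = 128, 16
rng = np.random.default_rng(1)
Q, K, V = rng.standard_normal((3, n, d))
approx = linear_attention(Q, K, V)
scores = np.exp(Q @ K.T / np.sqrt(d))
exact = (scores / scores.sum(axis=-1, keepdims=True)) @ V
print("mean abs error:", np.abs(approx - exact).mean())
```

Because `phi(K)^T @ V` is computed once and reused for every query row, the cost grows linearly with sequence length instead of quadratically, which is the property the post's title refers to.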