https://transformer-circuits.pub/2022/toy_model/index.html Toy Models of Superposition. Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, …
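The paper's core experiment is small enough to reproduce directly. Below is a minimal PyTorch sketch, assuming the ReLU-output toy model the paper describes: n sparse features reconstructed through an m-dimensional bottleneck as ReLU(WᵀWx + b). Dimensions, sparsity, and training length here are illustrative, and feature importances are left uniform for simplicity.

```python
import torch

n_features, m_hidden = 20, 5    # more features than hidden dimensions
sparsity = 0.95                 # probability that any given feature is zero

# Reconstruction model: x_hat = ReLU(W^T W x + b)
W = torch.nn.Parameter(0.1 * torch.randn(m_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(10_000):
    # Synthetic sparse features in [0, 1]
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)
    x_hat = torch.relu(x @ W.T @ W + b)   # project down, then back up
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Off-diagonal entries of W^T W reveal features that share hidden
# directions, i.e. features stored in superposition.
print(torch.round(W.T @ W, decimals=2))
```

With sparsity this high, the trained network typically packs more than m features into the m hidden dimensions, which is the phenomenon the paper studies.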
What I Read: Sparse Networks
https://www.quantamagazine.org/sparse-neural-networks-point-physicists-to-useful-data-20230608/ Sparse Networks Come to the Aid of Big Physics. Steve Nadis, June 8, 2023. “A novel type of neural network is helping physicists with the daunting challenge of data analysis.”
What I Read: Machines Learn, Teach Basics
https://www.quantamagazine.org/machines-learn-better-if-we-teach-them-the-basics-20230201/ Machines Learn Better if We Teach Them the Basics. Max G. Levy, February 1, 2023. “A wave of research improves reinforcement learning algorithms by pre-training them as if they were human.”
What I Read: Deep Learning Recommendation Models
https://www.kdnuggets.com/2021/04/deep-learning-recommendation-models-dlrm-deep-dive.html Deep Learning Recommendation Models (DLRM): A Deep Dive. By Nishant Kumar, Data Science Professional. “This deep dive article presents the architecture and deployment issues experienced with the deep learning recommendation …”
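As a rough illustration of the architecture the article walks through, here is a compact PyTorch sketch assuming the usual DLRM layout: embedding tables for sparse categorical features, a bottom MLP for dense features, pairwise dot-product feature interactions, and a top MLP producing a click probability. Class name, vocabulary sizes, and layer widths below are placeholders.

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, vocab_sizes, n_dense, d=16):
        super().__init__()
        # One embedding table per sparse categorical feature
        self.embs = nn.ModuleList([nn.Embedding(v, d) for v in vocab_sizes])
        # Bottom MLP maps dense features into the same d-dim space
        self.bottom = nn.Sequential(nn.Linear(n_dense, d), nn.ReLU())
        n_vecs = len(vocab_sizes) + 1              # embeddings + dense vector
        n_pairs = n_vecs * (n_vecs - 1) // 2       # pairwise interactions
        self.top = nn.Sequential(nn.Linear(d + n_pairs, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, dense, sparse):
        z = self.bottom(dense)                      # (B, d)
        vecs = [z] + [e(sparse[:, i]) for i, e in enumerate(self.embs)]
        T = torch.stack(vecs, dim=1)                # (B, n_vecs, d)
        inter = T @ T.transpose(1, 2)               # all pairwise dot products
        i, j = torch.triu_indices(T.shape[1], T.shape[1], offset=1)
        feats = torch.cat([z, inter[:, i, j]], dim=1)
        return torch.sigmoid(self.top(feats)).squeeze(-1)  # click probability
```

For example, TinyDLRM([1000, 500], n_dense=13) would handle two categorical features with vocabularies of 1,000 and 500 plus 13 dense features. The embedding tables, not the MLPs, dominate the memory footprint at production scale, which is the source of most of the deployment issues such articles discuss.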
What I Read: Attention with Performers
https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html Rethinking Attention with Performers. Friday, October 23, 2020. Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research. “To resolve these issues, we introduce the Performer, a Transformer architecture with …”
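The Performer replaces exact softmax attention with a random-feature approximation (FAVOR+) whose cost is linear rather than quadratic in sequence length. Here is a rough numpy sketch of that kernel trick, assuming the positive random features described in the paper; the feature count and normalization details are simplified.

```python
import numpy as np

def favor_attention(Q, K, V, n_feats=256, seed=0):
    """Approximate softmax attention in time/memory linear in length L."""
    d = Q.shape[-1]
    Q, K = Q / d**0.25, K / d**0.25        # folds in the usual 1/sqrt(d)
    W = np.random.default_rng(seed).standard_normal((d, n_feats))

    def phi(X):  # positive random features: E[phi(q) . phi(k)] = exp(q . k)
        return np.exp(X @ W - (X**2).sum(-1, keepdims=True) / 2) / n_feats**0.5

    Qp, Kp = phi(Q), phi(K)                # (L, n_feats) each
    num = Qp @ (Kp.T @ V)                  # (L, d), no L x L matrix formed
    den = Qp @ Kp.sum(axis=0)              # per-query softmax normalizer
    return num / den[:, None]

L, d = 1024, 64
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out = favor_attention(Q, K, V)             # shape (1024, 64)
```

The key step is the parenthesization Qp @ (Kp.T @ V): computing the key-value summary first avoids ever materializing the L x L attention matrix, which is what makes long sequences tractable.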