https://transformer-circuits.pub/2022/toy_model/index.html Toy Models of Superposition (Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, et al.)
What I Read: How Machines ‘Grok’ Data
https://www.quantamagazine.org/how-do-machines-grok-data-20240412 How Do Machines ‘Grok’ Data? (Anil Ananthaswamy, 4/12/24): “By apparently overtraining them, researchers have seen neural networks discover novel solutions to problems.”
What I Read: Tiny Language Models
https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/ Tiny Language Models Come of Age (Ben Brubaker, 10/5/23): “To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories.”