What I Read: Mamba
https://thegradient.pub/mamba-explained
"Mamba Explained" by Kola Ayonrinde, 27 Mar 2024: "Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens)."
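The quoted scaling claim rests on the state space formulation: each token is processed by updating a fixed-size hidden state, so compute grows linearly with sequence length rather than quadratically as in self-attention. A minimal NumPy sketch of the underlying linear SSM recurrence, illustrative only; Mamba's selective mechanism additionally makes the A, B, C parameters input-dependent:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    One fixed-cost update per token, so total work is O(sequence length),
    versus self-attention's O(sequence length^2) pairwise comparisons."""
    h = np.zeros(A.shape[0])       # fixed-size state, regardless of sequence length
    ys = []
    for x_t in x:                  # constant work per token
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

# Toy example: scalar input stream, 4-dimensional state.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                # stable state transition
B = rng.normal(size=4)
C = rng.normal(size=4)
y = ssm_scan(rng.normal(size=1000), A, B, C)  # 1M tokens would scan the same way
```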
What I Read: Chain-of-Thought Reasoning
https://www.quantamagazine.org/how-chain-of-thought-reasoning-helps-neural-networks-compute-20240321
"How Chain-of-Thought Reasoning Helps Neural Networks Compute" by Ben Brubaker, 21 Mar 2024: "Large language models do better at solving problems when they show their work. Researchers are beginning to understand why."
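The finding the article examines is a prompting effect: a model asked to produce intermediate steps gets multi-step problems right more often than one asked only for the answer. A small illustrative contrast between the two prompt styles; the toy question and prompt strings are my own, not the article's:

```python
question = "If a train travels 60 miles in 90 minutes, what is its speed in mph?"

# Direct prompt: asks only for the final answer.
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompt: nudges the model to externalize intermediate steps,
# which is what the article reports improves accuracy on multi-step problems.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# The intermediate computation a chain-of-thought response would surface:
# 90 minutes = 1.5 hours, and 60 miles / 1.5 hours = 40 mph.
print(direct_prompt)
print(cot_prompt)
```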
What I Read: Geometric Deep Learning
https://thegradient.pub/towards-geometric-deep-learning/
"Towards Geometric Deep Learning" by Michael Bronstein, 18 Feb 2023: "Geometric Deep Learning is an umbrella term for approaches considering a broad class of ML problems from the perspectives of symmetry and invariance."
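One concrete instance of that symmetry perspective: a network that sum-pools a shared per-element map over a set is invariant to permutations of its input (the DeepSets construction). A minimal NumPy sketch, with arbitrary illustrative weights and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))        # per-element feature map, weights shared across elements

def set_embed(points):
    """Permutation-invariant embedding: apply a shared map to each point,
    then sum-pool. Reordering the input cannot change the output."""
    return np.tanh(points @ W).sum(axis=0)

points = rng.normal(size=(5, 3))             # a set of 5 points in R^3
shuffled = points[rng.permutation(5)]        # same set, different order
assert np.allclose(set_embed(points), set_embed(shuffled))
```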
What I Read: Realtime User Actions in Recommendation
https://medium.com/pinterest-engineering/how-pinterest-leverages-realtime-user-actions-in-recommendation-to-boost-homefeed-engagement-volume-165ae2e8cde8
"How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume" by Xue Xia, Neng Gu, Dhruvil Deven Badani, et al. (Pinterest Engineering)
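The post's core idea is feeding a sequence of the user's most recent actions into the Homefeed ranking model at request time. A hedged sketch of the buffering side of such a pipeline; the class name, the 100-event window, and the padding scheme are assumptions for illustration, not Pinterest's actual implementation:

```python
from collections import deque

MAX_ACTIONS = 100   # hypothetical window size for the recent-action sequence

class RealtimeActionBuffer:
    """Keeps a user's most recent (action_type, item_id, timestamp) events so
    a ranking model can consume them as a sequence feature at request time."""
    def __init__(self):
        self.events = deque(maxlen=MAX_ACTIONS)   # oldest events drop automatically

    def record(self, action_type: str, item_id: int, ts: float) -> None:
        self.events.append((action_type, item_id, ts))

    def as_feature(self):
        # Pad to a fixed length so the model's input shape stays constant.
        pad = [("pad", 0, 0.0)] * (MAX_ACTIONS - len(self.events))
        return list(self.events) + pad

buf = RealtimeActionBuffer()
buf.record("repin", 12345, 1_700_000_000.0)
buf.record("click", 67890, 1_700_000_050.0)
features = buf.as_feature()   # would feed a sequence encoder inside the ranker
```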