What I Read: 1-bit LLMs, 1.58 Bits
"The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" by Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong…
https://arxiv.org/abs/2402.17764

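The "1.58 bits" in the title comes from the paper's ternary weights: each parameter takes one of three values, {-1, 0, +1}, and a three-way choice carries log2(3) ≈ 1.58 bits of information. A quick arithmetic check (my sketch, not code from the paper):

```python
import math

# BitNet b1.58 restricts every weight to the ternary set {-1, 0, +1}.
# The information content of a uniform 3-way choice is log2(3) bits.
bits_per_weight = math.log2(3)
print(round(bits_per_weight, 2))  # → 1.58
```
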
What I Read: diffusion distillation
"The paradox of diffusion distillation" by Sander Dieleman, February 28, 2024
https://sander.ai/2024/02/28/paradox.html
"…let’s take a closer look at the various ways in which the number of sampling steps required to get good results…"

What I Read: Mamba, Easy Way
"Mamba: The Easy Way" by Jack Cook, February 23, 2024
https://jackcook.com/2024/02/23/mamba.html
"Mamba appears to outperform similarly-sized Transformers while scaling linearly with sequence length…. If… you’re looking for a higher-level overview of Mamba’s big…"

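The "scaling linearly with sequence length" claim contrasts with self-attention, whose cost grows quadratically because every token attends to every other token. A toy cost model (illustrative only; the function names and the fixed state size `d` are my own, not from the post):

```python
# Rough per-layer operation counts as a function of sequence length L,
# with a fixed per-token feature/state size d.
def attention_ops(L, d=64):
    return L * L * d  # each of L tokens attends to all L tokens: O(L^2)

def ssm_scan_ops(L, d=64):
    return L * d  # one fixed-size recurrent state update per token: O(L)

# The cost ratio grows with L, so the gap widens on long sequences.
for L in (1_000, 10_000):
    print(L, attention_ops(L) // ssm_scan_ops(L))
```
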
What I Read: Mamba
"Mamba No. 5 (A Little Bit Of…)" by James Chen, February 12, 2024
https://jameschen.io/jekyll/update/2024/02/12/mamba.html
"…I attempt to provide a walkthrough of the essence of the Mamba state space model architecture, occasionally sacrificing some…"

What I Read: Structured State Space Sequence Models
"Beyond Transformers: Structured State Space Sequence Models" by Chetan Nichkawde, January 22, 2024
https://cnichkawde.github.io/statespacesequencemodels.html
"A new paradigm is rapidly evolving within the realm of sequence modeling that presents a marked advancement over the…"

What I Read: Forgetting Can Help AI Learn
"How Selective Forgetting Can Help AI Learn Better" by Amos Zeeberg, February 28, 2024
https://www.quantamagazine.org/how-selective-forgetting-can-help-ai-learn-better-20240228/
"Erasing key information during training results in machine learning models that can learn new languages faster and more…"

What I Read: Diffusion Model theory
"Building Diffusion Model’s theory from ground up" by Ayan Das, May 7, 2024
https://ayandas.me/2024/blog/diffusion-theory-from-scratch/
"…we will go back and revisit the ‘fundamental ingredients’ behind the SDE formulation, and show how the idea can…"

What I Read: Compound AI Systems
"The Shift from Models to Compound AI Systems" by Matei Zaharia, Omar Khattab, Lingjiao Chen, Jared Quincy Davis, Heather Miller, Chris Potts, James Zou, Michael Carbin, Jonathan Frankle, Naveen Rao, Ali…
https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/

What I Read: Scaling ChatGPT, Engineering Challenges
"Scaling ChatGPT: Five Real-World Engineering Challenges" by Gergely Orosz, February 20, 2024
https://newsletter.pragmaticengineer.com/p/scaling-chatgpt
"Just one year after its launch, ChatGPT had more than 100M weekly users. In order to meet this explosive demand,…"