natural language processing – Page 6

What I Read: 1-bit LLMs, 1.58 Bits

By Andrew Fairless on May 14, 2024 (last modified March 11, 2024)

https://arxiv.org/abs/2402.17764
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong…
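The "1.58 bits" in the title comes from ternary weights: each weight takes one of three values, -1, 0 or +1, and log2(3) ≈ 1.58. Below is a minimal sketch of how a float weight matrix might be mapped to ternary values, assuming a per-tensor "absmean" scale (divide by the mean absolute weight, round, clip to [-1, 1]) as I understand the paper to describe it; the function names are hypothetical and this is not the authors' code. It also hints at why ternary weights are attractive: a matrix-vector product reduces to additions and subtractions plus one rescale per output.

```python
import math
from typing import List, Tuple

# log2(3): information content of one weight drawn from {-1, 0, +1}
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def absmean_ternary(weights: List[List[float]], eps: float = 1e-8) -> Tuple[List[List[int]], float]:
    """Hypothetical helper: quantize a matrix to {-1, 0, +1} with one absmean scale."""
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat)  # mean absolute weight (assumed per-tensor scale)

    def round_clip(x: float) -> int:
        # Round to the nearest integer (Python rounds ties to even), then clip to [-1, 1]
        return max(-1, min(1, round(x)))

    q = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return q, gamma


def ternary_matvec(q: List[List[int]], gamma: float, x: List[float]) -> List[float]:
    """Multiply ternary weights by x using only adds/subtracts, then rescale once."""
    out = []
    for row in q:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing
        out.append(acc * gamma)  # approximate the original float product
    return out
```

For example, `absmean_ternary([[0.4, -0.05, -0.9], [0.1, 0.6, -0.3]])` yields ternary rows `[1, 0, -1]` and `[0, 1, -1]` with a scale of about 0.39.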

What I Read: Mamba, Easy Way

By Andrew Fairless on May 9, 2024 (last modified March 7, 2024)

https://jackcook.com/2024/02/23/mamba.html
Mamba: The Easy Way
Jack Cook
February 23, 2024
“Mamba appears to outperform similarly-sized Transformers while scaling linearly with sequence length…. If… you’re looking for a higher-level overview of Mamba’s big…
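The linear scaling mentioned in the excerpt comes from processing the sequence as a recurrence with a fixed-size hidden state, so each new token costs the same amount of work, instead of comparing every pair of tokens as attention does. Below is a minimal single-channel sketch of a selective state space scan, assuming a diagonal state matrix and a zero-order-hold discretization of it. Only the step size Δ depends on the input here (in Mamba, B and C are computed from the input as well), the names are hypothetical, and Mamba itself uses a hardware-aware parallel scan rather than a Python loop.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size positive
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: List[float], a: List[float], b: List[float],
                   c: List[float], w_delta: float = 1.0) -> List[float]:
    """Hypothetical single-channel diagonal SSM scan with an input-dependent step size."""
    n = len(a)
    h = [0.0] * n  # hidden state: its size does not grow with sequence length
    ys = []
    for x in xs:  # one pass over the sequence: O(L * N) total work
        delta = softplus(w_delta * x)  # "selective": step size depends on the current input
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of diagonal A
            b_bar = delta * b[i]            # simplified discretization of B (delta * B)
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(c[i] * h[i] for i in range(n)))  # readout y_t = C h_t
    return ys
```

With `a=[-1.0]`, `b=[1.0]`, `c=[1.0]` and inputs `[1.0, 0.0, 0.0]`, the output halves at each step after the first, because a zero input gives Δ = softplus(0) = ln 2 and exp(-ln 2) = 0.5.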

What I Read: Structured State Space Sequence Models

By Andrew Fairless on April 30, 2024 (last modified March 7, 2024)

https://cnichkawde.github.io/statespacesequencemodels.html
Beyond Transformers: Structured State Space Sequence Models
Chetan Nichkawde
January 22, 2024
“A new paradigm is rapidly evolving within the realm of sequence modeling that presents a marked advancement over the…

What I Read: Forgetting Can Help AI Learn

By Andrew Fairless on April 29, 2024 (last modified March 1, 2024)

https://www.quantamagazine.org/how-selective-forgetting-can-help-ai-learn-better-20240228/
How Selective Forgetting Can Help AI Learn Better
Amos Zeeberg
2/28/24 10:38 AM
“Erasing key information during training results in machine learning models that can learn new languages faster and more…

What I Read: Predictive Human Preference, Model Ranking to Model Routing

By Andrew Fairless on April 25, 2024 (last modified March 1, 2024)

https://huyenchip.com//2024/02/28/predictive-human-preference.html
Predictive Human Preference: From Model Ranking to Model Routing
Chip Huyen
2/27/24 7:00 PM
“A challenge of building AI applications is choosing which model to use…. What if we can predict…

What I Read: Compound AI Systems

By Andrew Fairless on April 22, 2024 (last modified March 1, 2024)

https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
The Shift from Models to Compound AI Systems
Matei Zaharia, Omar Khattab, Lingjiao Chen, Jared Quincy Davis, Heather Miller, Chris Potts, James Zou, Michael Carbin, Jonathan Frankle, Naveen Rao, Ali…

What I Read: Scaling ChatGPT, Engineering Challenges

By Andrew Fairless on April 18, 2024 (last modified March 1, 2024)

https://newsletter.pragmaticengineer.com/p/scaling-chatgpt
Scaling ChatGPT: Five Real-World Engineering Challenges
Gergely Orosz
Feb 20, 2024
“Just one year after its launch, ChatGPT had more than 100M weekly users. In order to meet this explosive demand, …

What I Read: How Quickly LLMs Learn Skills?

By Andrew Fairless on April 10, 2024 (last modified March 1, 2024)

How Quickly Do Large Language Models Learn Unexpected Skills?
Stephen Ornes
2/13/24 10:32 AM
“A new study suggests that so-called emergent abilities actually develop gradually and predictably, depending on how you measure…
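The "depending on how you measure" point can be illustrated with a toy calculation (my own illustration, not taken from the article): if a model gets each token of an answer right with probability p, a per-token score improves smoothly as p rises, but an all-or-nothing exact-match score on an L-token answer is p^L, which stays near zero for a long time and then climbs steeply. This assumes tokens are independently correct, and the accuracies used are hypothetical.

```python
from typing import List, Tuple


def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    # Probability that every token is correct, assuming tokens are independent
    return per_token_acc ** answer_len


def metric_table(per_token_accs: List[float], answer_len: int) -> List[Tuple[float, float]]:
    # Pair each (smoothly improving) per-token accuracy with its exact-match score
    return [(p, exact_match_rate(p, answer_len)) for p in per_token_accs]


for p, em in metric_table([0.5, 0.7, 0.9, 0.99], answer_len=10):
    print(f"per-token {p:.2f} -> exact match {em:.3f}")
```

With a 10-token answer, per-token accuracies of 0.5, 0.7, 0.9 and 0.99 give exact-match scores of about 0.001, 0.028, 0.349 and 0.904: the same steady improvement looks like a sudden jump under the stricter metric.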

What I Read: LoRA from Scratch

By Andrew Fairless on April 4, 2024 (last modified March 1, 2024)

https://lightning.ai/lightning-ai/studios/code-lora-from-scratch
Code LoRA from Scratch
Sebastian Raschka
“LoRA, which stands for Low-Rank Adaptation, is a popular technique to finetune LLMs more efficiently…. This Studio explains how LoRA works by coding it…
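The Studio builds LoRA up in code; as a self-contained point of reference, here is a minimal sketch of the standard formulation from the original LoRA paper, not the Studio's own code. The pretrained weight W stays frozen and the layer adds a trainable low-rank update (alpha / r) · B A, with A randomly initialized and B initialized to zero so the adapted layer starts out identical to the original. Plain Python lists stand in for tensors, and `LoRALinear` and `matvec` are hypothetical names.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    # Plain matrix-vector product: one dot product per row
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus trainable update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A: (rank, d_in), small random init; B: (d_out, rank), zeros, so B @ A starts at 0
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A and B would be updated during fine-tuning
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 64 × 64 weight and rank 4, the adapter has 4 · 64 + 64 · 4 = 512 trainable parameters against 4,096 in the frozen weight, which is where the efficiency comes from. Computing B (A x) rather than forming B A explicitly also keeps the extra work proportional to the rank.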

Tag: natural language processing