transformer – Page 3 – Andrew Fairless, Ph.D.

What I Read: Chain-of-Thought Reasoning

By Andrew Fairless on June 3, 2024 / April 15, 2024

https://www.quantamagazine.org/how-chain-of-thought-reasoning-helps-neural-networks-compute-20240321

How Chain-of-Thought Reasoning Helps Neural Networks Compute. Ben Brubaker. 3/21/24 11:15 AM.

“Large language models do better at solving problems when they show their work. Researchers are beginning to understand why.”

What I Read: 1-bit LLMs, 1.58 Bits

By Andrew Fairless on May 14, 2024 / March 11, 2024

https://arxiv.org/abs/2402.17764

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong…

What I Read: Mamba, Easy Way

By Andrew Fairless on May 9, 2024 / March 7, 2024

https://jackcook.com/2024/02/23/mamba.html

Mamba: The Easy Way. Jack Cook. February 23, 2024.

“Mamba appears to outperform similarly-sized Transformers while scaling linearly with sequence length…. If… you’re looking for a higher-level overview of Mamba’s big…”

What I Read: Mamba

By Andrew Fairless on May 1, 2024 / March 7, 2024

https://jameschen.io/jekyll/update/2024/02/12/mamba.html

Mamba No. 5 (A Little Bit Of…). James Chen. Feb 12, 2024.

“…I attempt to provide a walkthrough of the essence of the Mamba state space model architecture, occasionally sacrificing some…”

What I Read: Structured State Space Sequence Models

By Andrew Fairless on April 30, 2024 / March 7, 2024

https://cnichkawde.github.io/statespacesequencemodels.html

Beyond Transformers: Structured State Space Sequence Models. Chetan Nichkawde. January 22, 2024.

“A new paradigm is rapidly evolving within the realm of sequence modeling that presents a marked advancement over the…”

What I Read: Self-Attention in GPT

By Andrew Fairless on March 4, 2024 / January 25, 2024

https://twiecki.io/blog/2024/01/04/

An Intuitive Guide to Self-Attention in GPT: The Venetian Masquerade. Thomas Wiecki. January 4, 2024.

“In AI, especially with something as intricate as self-attention, it’s easy to get lost in the…”

What I Read: Research Directions

By Andrew Fairless on February 21, 2024 / January 7, 2024

https://nlpnewsletter.substack.com/p/nlp-research-in-the-era-of-llms

NLP Research in the Era of LLMs: 5 Key Research Directions Without Much Compute. Sebastian Ruder. Dec 19, 2023.

“In an era where running state-of-the-art models requires a garrison of expensive GPUs,…”

What I Read: Limits of Transformers on Compositionality

By Andrew Fairless on February 13, 2024 / December 19, 2023

https://arxiv.org/abs/2305.18654

Faith and Fate: Limits of Transformers on Compositionality. Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena…

What I Read: Distributed Training, Finetuning

By Andrew Fairless on December 20, 2023 / November 7, 2023

https://sumanthrh.com/post/distributed-and-efficient-finetuning/

Everything about Distributed Training and Efficient Finetuning. Sumanth R Hegde. Last updated on Oct 13, 2023.

“practical guidelines and gotchas with multi-GPU and multi-node training”

Tag: transformer