What I Read: Mamba Explained

https://thegradient.pub/mamba-explained

Mamba Explained
Kola Ayonrinde
27.Mar.2024

“Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens).”