What I Read: smaller LLMs, more tokens


Go smol or go home
Why we should train smaller LLMs on more tokens
Harm de Vries
Apr 13, 2023

“However, for most use cases you should not train a compute-optimal LLM but instead spend some extra compute to obtain a smaller model. Smaller models not only make inference faster and cheaper, they are also much easier to use for developers and researchers with limited GPU resources.”
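
To put a rough number on "some extra compute", here is a minimal sketch. It is not taken from the quoted passage. It assumes the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^alpha + B/D^beta, and the common approximation that training costs about 6 * N * D FLOPs. The function names and the 1e23 FLOP budget are hypothetical and chosen only for illustration. Given a compute budget, the code finds the compute-optimal model under that fit, shrinks it by a factor, and works out how many more tokens (and how much more compute) the smaller model would need to reach the same loss.

```python
# Constants of the Chinchilla parametric loss fit (Hoffmann et al., 2022).
# Assumption: these are not part of the quoted text above.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, n_tokens):
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops):
    """Model size and token count that minimize the fitted loss at a fixed
    budget, assuming flops = 6 * n_params * n_tokens."""
    g = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
    n_opt = g * (flops / 6) ** (BETA / (ALPHA + BETA))
    d_opt = (flops / 6) / n_opt
    return n_opt, d_opt


def tokens_to_match(target_loss, n_params):
    """Tokens a model of size n_params needs to reach target_loss.
    Returns None when the model is too small to ever get there."""
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return None
    return (B / gap) ** (1 / BETA)


def compute_overhead(scale, flops=1e23):
    """Extra compute, as a fraction, for a model `scale` times the
    compute-optimal size to match the compute-optimal loss."""
    n_opt, d_opt = compute_optimal(flops)
    n_small = scale * n_opt
    d_small = tokens_to_match(loss(n_opt, d_opt), n_small)
    if d_small is None:
        return None
    return (n_small * d_small) / (n_opt * d_opt) - 1.0


if __name__ == "__main__":
    for scale in (1.0, 0.75, 0.5, 0.25):
        overhead = compute_overhead(scale)
        print(f"{scale:.2f}x optimal size -> {overhead:+.1%} compute")
```

Under this fit the overhead depends only on the scale factor, not on the budget, and it grows slowly at first: a moderately smaller model costs a modest amount of extra training compute, while a model far below the optimal size can never reach the same loss (the function returns None). The exact percentages depend entirely on the fitted constants, so treat them as indicative only.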