What I Read: smaller LLMs, more tokens

https://www.harmdevries.com/post/model-size-vs-compute-overhead/

Go smol or go home: Why we should train smaller LLMs on more tokens
Harm de Vries, Apr 13, 2023

“However, for most use cases you should not train a compute-optimal LLM but instead spend some extra compute to obtain a smaller model. Smaller models not only make inference faster and cheaper, they are also much easier to use for developers and researchers with limited GPU resources.”
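
The trade-off behind this claim can be made concrete with the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta from Hoffmann et al. (2022), which the post builds on. Below is a minimal sketch, not the post's own code: it assumes the Hoffmann et al. fitted constants and the usual C ≈ 6·N·D compute approximation, and the function names and example budget are mine. It finds the compute-optimal (N, D) for a budget, then asks how many extra tokens (and hence how much extra compute) a smaller model needs to reach the same loss.

```python
# Sketch of the "smaller model, more tokens" trade-off, assuming the
# Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the
# Hoffmann et al. (2022) fitted constants, and training compute C ~= 6*N*D.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for N parameters trained on D tokens."""
    return E + A / N**ALPHA + B / D**BETA

def compute_optimal(C):
    """Compute-optimal (N, D) under C = 6*N*D, from the first-order
    condition alpha*A/N^alpha = beta*B/D^beta."""
    N = ((ALPHA * A) / (BETA * B)) ** (1 / (ALPHA + BETA)) \
        * (C / 6) ** (BETA / (ALPHA + BETA))
    return N, C / (6 * N)

def overhead(C, shrink):
    """Fractional extra compute for a model of size shrink * N_opt
    to match the compute-optimal loss at budget C."""
    N_opt, D_opt = compute_optimal(C)
    target = loss(N_opt, D_opt)
    N = shrink * N_opt
    gap = target - E - A / N**ALPHA
    if gap <= 0:
        # Below a critical model size, no amount of data reaches the
        # target loss under this parametric form.
        return float("inf")
    D = (gap / B) ** (-1 / BETA)  # tokens needed to close the gap
    return (6 * N * D) / C - 1

C = 5.88e23  # roughly Chinchilla's budget in FLOPs (~70B params x 1.4T tokens)
for k in (0.75, 0.50, 0.30):
    print(f"{k:.2f} x N_opt -> {overhead(C, k):+.0%} extra training compute")
```

With these constants the sketch shows the shape of the curve the post works out in detail: modest size reductions cost only modest extra training compute, while the overhead grows steeply toward a critical model size below which the compute-optimal loss is unreachable.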