https://huggingface.co/blog/optimize-llm
Optimizing your LLM in production
September 15, 2023
Patrick von Platen
“…efficient LLM deployment…. pros and cons of adopting lower precision, provide a comprehensive exploration of the latest attention algorithms, and discuss improved LLM architectures.”