https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
Large Transformer Model Inference Optimization
Lilian Weng
January 10, 2023
“The extremely high inference cost, in both time and memory, is a big bottleneck for adopting a powerful transformer for solving real-world tasks at scale.”