What I Read: How to Train Really Large Models

https://lilianweng.github.io/lil-log/2021/09/24/train-large-neural-networks.html

How to Train Really Large Models on Many GPUs?
Sep 24, 2021
Lilian Weng


“How to train large and deep neural networks is challenging, as it demands a large amount of GPU memory and a long horizon of training time…. There are several parallelism paradigms to enable model training across multiple GPUs, as well as a variety of model architecture and memory saving designs…”