https://probablymarcus.com/blocks/2023/10/19/vectorizing-wide-pytorch-expressions.html What happens when you vectorize wide PyTorch expressions? Marcus Lewis. 19 October 2023. “So we see that vectorization has three effects: It lets us keep the GPU busy even when inputs are
What I Read: Distributed Training, Finetuning
https://sumanthrh.com/post/distributed-and-efficient-finetuning/ Everything about Distributed Training and Efficient Finetuning. Sumanth R Hegde. Last updated on Oct 13, 2023. “practical guidelines and gotchas with multi-GPU and multi-node training”