https://arxiv.org/abs/2305.18654 Faith and Fate: Limits of Transformers on CompositionalityNouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena
What I Read: Distributed Training, Finetuning
https://sumanthrh.com/post/distributed-and-efficient-finetuning/ Everything about Distributed Training and Efficient FinetuningSumanth R HegdeLast updated on Oct 13, 2023 “practical guidelines and gotchas with multi-GPU and multi-node training”
What I Read: To Understand Transformers, Focus on Attention
https://drscotthawley.github.io/blog/posts/Transformers1-Attention.html To Understand Transformers, Focus on AttentionScott H. HawleyAugust 21, 2023 “To Understand Transformers, Focus on Attention”