https://www.cs.princeton.edu/~smalladi/blog/2024/04/04/dataselection Using LESS Data to Tune Models: Data Selection in the Era of LLMs Mengzhou Xia and Sadhika Malladi April 04 2024 “We describe how data selection for modern-day LLMs differs from
What I Read: Transformers Training
https://www.borealisai.com/research-blogs/tutorial-17-transformers-iii-training/ Tutorial #17: Transformers III Training 08/06/2021 P. Xu, S. Prince “…we discuss challenges with transformer training dynamics and introduce some of the tricks that practitioners use to get transformers to converge.”