Three mysteries in deep learning: Ensemble, knowledge distillation, and self-distillation
Published January 19, 2021
By Zeyuan Allen-Zhu, Senior Researcher, and Yuanzhi Li, Assistant Professor, Carnegie Mellon University
“…besides this small deviation in test accuracies, do the neural networks trained from different random initializations actually learn very different functions? If so, where does the discrepancy come from? How do we reduce such discrepancy and make the neural network more stable or even better? These questions… relate to the mysteries of three techniques widely used in deep learning.”