Can a neural network train other networks?
An introduction to knowledge distillation
Tivadar Danka
Oct 5
“Now you have a huge model, which, although performs excellently, there is no way to deploy it into production and get predictions in a reasonable time…. What if we use the predictions from the large and cumbersome model to train a smaller, so-called student model to approximate the big one?”