On the importance of initialization and momentum in deep learning, Sutskever, Martens, Dahl, Hinton; 2013 - Summary
score: 8 / 10

Core idea

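SGD with momentum, combined with careful initialization and a slowly increasing momentum schedule, can train deep neural networks (DNNs) and recurrent neural networks (RNNs) in practice, reaching results previously thought to require second-order methods such as Hessian-free optimization.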


Main contributions

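Momentum methods: Classical Momentum (CM; Polyak, 1964) and Nesterov’s Accelerated Gradient (NAG; Nesterov, 1983). The paper rewrites NAG as a momentum method that differs from CM only in the point at which the gradient is evaluated.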
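The two update rules can be sketched in plain Python. In the paper's formulation, CM computes v_{t+1} = μ v_t − ε ∇f(θ_t), while NAG evaluates the gradient at the look-ahead point θ_t + μ v_t; both then set θ_{t+1} = θ_t + v_{t+1}. The function names, the diagonal quadratic test problem, and the learning rate and momentum values below are illustrative assumptions, not settings from the paper.

```python
def cm_step(theta, v, grad_fn, lr, mu):
    """Classical momentum (CM): v <- mu*v - lr*grad(theta); theta <- theta + v."""
    g = grad_fn(theta)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    return [ti + vi for ti, vi in zip(theta, v)], v


def nag_step(theta, v, grad_fn, lr, mu):
    """Nesterov's accelerated gradient (NAG) in momentum form:
    identical to CM except the gradient is taken at the look-ahead point theta + mu*v."""
    lookahead = [ti + mu * vi for ti, vi in zip(theta, v)]
    g = grad_fn(lookahead)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    return [ti + vi for ti, vi in zip(theta, v)], v


# Hypothetical test problem: f(x) = 0.5 * sum(c_i * x_i**2) with one low- and one
# high-curvature direction, so the gradient component i is c_i * x_i.
CURVATURES = [1.0, 100.0]


def quad_grad(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


def run(step, iters=500, lr=0.005, mu=0.9):
    """Apply `step` repeatedly from a fixed starting point and return the final parameters."""
    theta, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(iters):
        theta, v = step(theta, v, quad_grad, lr, mu)
    return theta


for step in (cm_step, nag_step):
    print(step.__name__, max(abs(x) for x in run(step)))
```

On this toy problem both methods drive the parameters toward the minimum at zero; the only difference between the two functions is where `grad_fn` is called.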

Experiments on deep autoencoders (the Curves, MNIST and Faces benchmarks of Hinton & Salakhutdinov, 2006)
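For the autoencoder experiments the paper uses the sparse initialization (SI) of Martens (2010): each unit is connected to 15 randomly chosen units in the previous layer, with weights drawn from a unit Gaussian, and all biases are set to zero. The pure-Python sketch below assumes a row-per-output-unit weight layout; `sparse_init` and its parameter names are hypothetical.

```python
import random


def sparse_init(fan_in, fan_out, num_connections=15, seed=None):
    """Sparse initialization sketch.

    Returns (W, b), where W has fan_out rows of fan_in weights. Each row (one
    output unit) gets num_connections nonzero weights drawn from N(0, 1) at
    randomly chosen input positions; every other weight and every bias is zero.
    """
    rng = random.Random(seed)
    k = min(num_connections, fan_in)  # cannot connect to more inputs than exist
    W = []
    for _ in range(fan_out):
        row = [0.0] * fan_in
        for j in rng.sample(range(fan_in), k):  # k distinct input indices
            row[j] = rng.gauss(0.0, 1.0)
        W.append(row)
    b = [0.0] * fan_out
    return W, b


# Usage: initialize a hypothetical 784 -> 256 layer and count the first unit's inputs.
W, b = sparse_init(fan_in=784, fan_out=256, seed=0)
print(sum(w != 0.0 for w in W[0]))
```

Fixing the number of incoming connections, rather than scaling a dense matrix, keeps the size of each unit's total input roughly independent of the layer width.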

Discussion of momentum scheduling
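The paper increases μ during training with a schedule derived from Nesterov's μ_t = 1 − 3/(t + 5), capped at a tuned maximum μ_max: μ_t = min(1 − 2^(−1 − log2(⌊t/250⌋ + 1)), μ_max). Because 2^(−1 − log2 k) = 1/(2k), the momentum steps through 0.5, 0.75, 0.833 and so on every 250 updates until it reaches the cap. The function below is a sketch of that formula as we read it; its name and the `period` parameter are our own.

```python
import math


def momentum_schedule(t, mu_max, period=250):
    """Momentum coefficient for update t (0-indexed).

    With k = floor(t / period) + 1, the value 1 - 2**(-1 - log2(k)) equals
    1 - 1/(2k), so mu rises in steps every `period` updates until capped at mu_max.
    """
    k = t // period + 1
    return min(1.0 - 2.0 ** (-1.0 - math.log2(k)), mu_max)


# Usage: mu_max is a tuned hyperparameter; 0.99 here is only an example value.
for t in (0, 250, 500, 5000, 100000):
    print(t, round(momentum_schedule(t, mu_max=0.99), 4))
```

Starting low and raising μ slowly lets early updates stay stable while later updates benefit from the longer effective averaging horizon of high momentum.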

Experiments on RNNs