ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, Hinton; 2012 - Summary
author: philkr
score: 10 / 10

This is the paper that started the deep learning revolution in 2012. At a core technical level, the AlexNet architecture looks quite similar to LeNet-5, just bigger.

There are a few core differences when scaling up the architecture:

- Depth and width: five convolutional layers followed by three fully connected layers, roughly 60 million parameters in total.
- ReLU nonlinearities instead of saturating tanh or sigmoid units.
- Overlapping max pooling and local response normalization between some layers.
- Training split across two GPUs, with the layer channels partitioned between them.
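To make the "just bigger" point concrete, here is a quick back-of-the-envelope parameter count for the layer shapes reported in the paper (5 conv + 3 fully connected layers; the exact input-size bookkeeping, 224 vs 227, is glossed over):

```python
# Rough parameter count for AlexNet's trainable layers.
# Each conv layer: in_ch * out_ch * k * k weights + out_ch biases.
conv = [  # (in_ch, out_ch, kernel)
    (3, 96, 11),
    (96, 256, 5),
    (256, 384, 3),
    (384, 384, 3),
    (384, 256, 3),
]
fc = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(ci * co * k * k + co for ci, co, k in conv)
fc_params = sum(i * o + o for i, o in fc)
total = conv_params + fc_params
print(f"conv: {conv_params:,}  fc: {fc_params:,}  total: {total:,}")
```

This lands at roughly 62 million parameters, matching the ~60 million the paper reports; note that the fully connected layers dominate the count, which is why dropout is applied there.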

By itself the AlexNet architecture likely wouldn't have trained, and it required a fairly large bag of tricks (accumulated over the previous 12 years: 2000-2012) to work:

- ReLU activations for faster convergence.
- Dropout (p = 0.5) in the fully connected layers to fight overfitting.
- Data augmentation: random 224x224 crops, horizontal flips, and PCA-based color jitter.
- SGD with momentum, weight decay, and a manually decayed learning rate.
- GPU training to make the whole thing computationally feasible.
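A minimal numpy sketch of three of these tricks as the paper formulates them (function names and shapes are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Non-saturating nonlinearity; the paper reports it trains
    # several times faster than tanh on CIFAR-10.
    return np.maximum(x, 0.0)

def dropout(x, p=0.5, train=True, rng=rng):
    # Paper's formulation: zero each unit with probability p during
    # training, and scale activations by (1 - p) at test time.
    if train:
        return x * (rng.random(x.shape) >= p)
    return x * (1.0 - p)

def augment(img, crop=224, rng=rng):
    # Random crop plus random horizontal flip, two of the paper's
    # label-preserving data-augmentation transforms.
    h, w, _ = img.shape
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    patch = img[y:y + crop, x:x + crop]
    return patch[:, ::-1] if rng.random() < 0.5 else patch
```

Modern implementations usually use "inverted" dropout (scale by 1/(1-p) at training time instead), but the test-time scaling above is what the original paper describes.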

Most of the architectural changes and training tricks were well known in the deep learning literature by 2012. Where the paper really shines is in its performance and experiments. It showed for the first time that deep networks can easily outperform classical methods and hand-designed features on computer vision tasks. The ImageNet result is a huge improvement over the prior SOTA (top-5 error dropped from 26.2% to 15.3%). Within a year, almost all of computer vision had switched to AlexNet-like architectures.