Gradient-based learning applied to document recognition, LeCun, Bottou, Bengio, Haffner; 1998 - Summary
author: philkr
score: 10 / 10

This is one of those papers that’s well ahead of its time. The paper explores the use of ConvNets for optical character recognition, and of graph neural networks (which it calls graph transformer networks, not to be confused with the new fancy Transformers) for text recognition. At 40+ pages it’s a true beast of a paper, but most of the space is devoted to a gentle introduction to deep learning.

There are two core ideas in the paper: the new LeNet architecture for digit classification, and a graph transformer network for recognizing streams of text.


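The LeNet-5 architecture (shown above) is quite simple: a 32×32 input goes through a 5×5 convolution with 6 feature maps (C1), 2×2 subsampling (S2), a 5×5 convolution with 16 feature maps (C3), another 2×2 subsampling (S4), a 5×5 convolution with 120 feature maps (C5), a fully connected layer with 84 units (F6), and 10 output units, one per digit.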
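To make the layer sizes concrete, here is a minimal Python sketch that traces feature-map sizes and trainable-parameter counts through LeNet-5. It assumes the layer sizes from the paper (32×32 input, 5×5 valid convolutions, 2×2 subsampling with one trainable coefficient and one bias per map, and a sparse S2→C3 connection table with 60 map-to-map links); the layer names C1 to F6 follow the paper, while the helper names `conv`, `subsample`, and `trace_lenet5` are hypothetical. The output layer is left out because its prototype vectors are fixed rather than trained.

```python
# Minimal sketch: trace map sizes and trainable parameters through LeNet-5.
# Assumes the paper's layer sizes; helper names are hypothetical.
K = 5  # every convolution in LeNet-5 uses 5x5 kernels


def conv(size, maps_in, maps_out, links=None):
    """Valid KxK convolution. `links` counts (input map, output map) pairs
    and defaults to a dense connection. Returns (new size, parameter count)."""
    if links is None:
        links = maps_in * maps_out
    return size - K + 1, links * K * K + maps_out  # weights + one bias per map


def subsample(size, maps):
    """2x2 subsampling with one trainable coefficient and one bias per map."""
    return size // 2, 2 * maps


def trace_lenet5():
    """Return (name, maps, size, params) for each trainable layer."""
    rows = []
    size = 32  # input image is 32x32
    size, p = conv(size, 1, 6)
    rows.append(("C1", 6, size, p))
    size, p = subsample(size, 6)
    rows.append(("S2", 6, size, p))
    size, p = conv(size, 6, 16, links=60)  # sparse S2->C3 connection table
    rows.append(("C3", 16, size, p))
    size, p = subsample(size, 16)
    rows.append(("S4", 16, size, p))
    size, p = conv(size, 16, 120)  # 5x5 input, so C5 maps are 1x1
    rows.append(("C5", 120, size, p))
    rows.append(("F6", 84, 1, 120 * 84 + 84))  # fully connected + biases
    return rows


for name, maps, size, params in trace_lenet5():
    print(f"{name}: {maps} maps of {size}x{size}, {params} parameters")
```

Under these assumptions the per-layer counts come out to 156, 12, 1516, 32, 48120, and 10164, which add up to 60,000 trainable parameters.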

LeNet-5 contains some interesting details. The authors realized that the parameters of CNNs have many symmetries: feature maps that receive identical inputs can end up learning the same features. To break these symmetries, they connect each feature map of the second convolutional layer to only a subset of the previous layer’s maps, which gives its weights a sparse, banded structure (see figure below). This didn’t survive into the current era; modern CNNs rely on random initialization to break these symmetries instead.
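The symmetry-breaking scheme can be sketched as a 6×16 table that says which of the 6 maps from the first subsampling layer (S2) feed each of the 16 maps of the second convolutional layer (C3). This is a minimal Python sketch based on my reading of Table I of the paper: six maps see three contiguous inputs, six see four contiguous inputs, three see four non-contiguous inputs, and one sees all six. The function names are hypothetical.

```python
# Minimal sketch of the sparse S2 -> C3 connection table in LeNet-5.
# Each C3 map sees only a subset of the 6 S2 maps, so different C3 maps
# receive different inputs and are pushed toward different features.
NUM_IN, NUM_OUT, KERNEL = 6, 16, 5


def c3_connections():
    """Return a list of 16 sets: the S2 maps feeding each C3 map."""
    conns = []
    # Maps 0-5: three contiguous input maps (wrapping around).
    for i in range(6):
        conns.append({(i + k) % NUM_IN for k in range(3)})
    # Maps 6-11: four contiguous input maps (wrapping around).
    for i in range(6):
        conns.append({(i + k) % NUM_IN for k in range(4)})
    # Maps 12-14: four non-contiguous input maps.
    conns += [{0, 1, 3, 4}, {1, 2, 4, 5}, {0, 2, 3, 5}]
    # Map 15: all six input maps.
    conns.append(set(range(NUM_IN)))
    return conns


def connection_mask(conns):
    """6x16 boolean mask: mask[i][j] is True if S2 map i feeds C3 map j."""
    return [[i in conns[j] for j in range(NUM_OUT)] for i in range(NUM_IN)]


conns = c3_connections()
mask = connection_mask(conns)
num_links = sum(len(c) for c in conns)
# Each link carries a 5x5 kernel; each C3 map also has one bias.
num_params = num_links * KERNEL * KERNEL + NUM_OUT

for row in mask:
    print("".join("X" if linked else "." for linked in row))
print(num_links, num_params)  # 60 1516
```

No two C3 maps share the same input set. With 5×5 kernels the table yields 60 links and 1,516 parameters, compared with 2,416 for a dense 6-to-16 connection.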


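The original LeNet-5 used a classifier based on Euclidean distance: each output unit is a Euclidean radial basis function (RBF) that measures the squared distance between the 84-dimensional F6 features and a fixed prototype vector for its class, and the closest prototype wins. Modern CNNs prefer a softmax.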
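To contrast the two output heads, here is a minimal Python sketch of a Euclidean RBF output layer next to a softmax. The 3-class, 4-dimensional prototypes and the input vector are made-up toy values (LeNet-5 itself uses 10 classes and 84-dimensional features), and the softmax logits reuse the same vectors as weights purely for illustration.

```python
import math


def rbf_outputs(x, prototypes):
    """Euclidean RBF layer: y_i = sum_j (x_j - w_ij)^2 for each class i.
    A small y_i means x is close to the prototype of class i."""
    return [sum((xj - wj) ** 2 for xj, wj in zip(x, w)) for w in prototypes]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values: 3 classes, 4-dim features, +1/-1 prototypes.
prototypes = [[1, 1, -1, -1], [-1, 1, 1, -1], [-1, -1, 1, 1]]
x = [0.9, 0.8, -0.7, -1.0]

y = rbf_outputs(x, prototypes)
rbf_pred = min(range(len(y)), key=lambda i: y[i])  # closest prototype wins

# Softmax head: logits from dot products with the same vectors (illustrative).
logits = [sum(xj * wj for xj, wj in zip(x, w)) for w in prototypes]
probs = softmax(logits)
softmax_pred = max(range(len(probs)), key=lambda i: probs[i])

print(rbf_pred, softmax_pred)  # 0 0
```

When all prototypes have the same norm, as ±1 vectors do, the closest prototype is also the one with the largest dot product, since ‖x − w‖² = ‖x‖² − 2x·w + ‖w‖². That is why both heads pick class 0 here.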

The second part of the paper focuses on graph transformer networks for recognizing streams of digits. These networks jointly segment and classify the digits.
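At its core, joint segmentation and recognition is a best-path search over a graph of candidate segmentations. Here is a minimal Python sketch of that idea as a Viterbi-style dynamic program. It assumes the input has already been over-segmented into pieces, that each candidate segment is a run of up to three consecutive pieces, and that a hypothetical recognizer has assigned a penalty to each label for each segment. The `seg_scores` values and the `best_interpretation` name are illustrative assumptions, not the paper’s graph transformer machinery, which also trains the recognizer through the graph.

```python
import math


def best_interpretation(n, seg_scores, max_len=3):
    """Viterbi-style search over a segmentation graph.

    Pieces 0..n-1 come from an over-segmentation. seg_scores maps a segment
    (start, end), covering pieces start..end-1, to {label: penalty} from a
    hypothetical recognizer. Returns (total penalty, list of labels)."""
    best = [math.inf] * (n + 1)  # best[k]: lowest penalty for pieces 0..k-1
    back = [None] * (n + 1)      # back[k]: (start, label) of last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == math.inf:
                continue
            for label, penalty in seg_scores.get((start, end), {}).items():
                total = best[start] + penalty
                if total < best[end]:
                    best[end] = total
                    back[end] = (start, label)
    if best[n] == math.inf:
        return math.inf, []  # no complete path through the graph
    labels, k = [], n
    while k > 0:  # follow back pointers from the end of the graph
        start, label = back[k]
        labels.append(label)
        k = start
    return best[n], labels[::-1]


# Toy graph: piece 0 is a "1"; pieces 1 and 2 are the two halves of a "4".
seg_scores = {
    (0, 1): {"1": 0.1, "7": 0.9},
    (1, 2): {"1": 0.6},
    (2, 3): {"1": 0.7},
    (1, 3): {"4": 0.2, "9": 0.8},
    (0, 2): {"4": 1.5},
    (0, 3): {"4": 2.0},
}
cost, labels = best_interpretation(3, seg_scores)
print(labels, round(cost, 2))  # ['1', '4'] 0.3
```

The best path reads the first piece as a 1 and merges the last two pieces into a 4 (total penalty 0.3), beating the reading that splits the 4 into two 1s (penalty 1.4).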


This is the less-developed part of the paper, but it does contain some nice nuggets. For example, LeNet-5 is applied to the input in a streaming, fully convolutional manner (the paper calls this a Space Displacement Neural Network; hello, FCNs).
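A one-dimensional toy shows why running a network fully convolutionally is equivalent to sliding it over every window, while sharing the computation that overlapping windows have in common. This is a minimal Python sketch; the two-layer network, its weights, and the input signal are hypothetical stand-ins for LeNet-5 and a line of text.

```python
def conv1d(x, w, b=0.0):
    """Valid 1D cross-correlation: output has len(x) - len(w) + 1 entries."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


# Hypothetical two-layer network; its receptive field is 3 + 2 - 1 = 4 inputs.
W1, B1 = [0.5, -1.0, 0.25], 0.1
W2, B2 = [1.0, -0.5], 0.0
FIELD = len(W1) + len(W2) - 1


def net(x):
    return conv1d(relu(conv1d(x, W1, B1)), W2, B2)


signal = [0.2, 1.0, -0.3, 0.8, 0.5, -1.2, 0.7, 0.0]

# Fully convolutional: one pass over the whole signal, one output per position.
dense = net(signal)

# Naive sliding window: rerun the network on every window of FIELD inputs.
windows = [net(signal[i:i + FIELD])[0]
           for i in range(len(signal) - FIELD + 1)]

print(len(dense), dense == windows)  # 5 True
```

The fully convolutional pass computes each first-layer activation once, whereas the sliding window recomputes activations shared by overlapping windows; the savings grow with longer inputs and deeper networks.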

In terms of evaluation, there is not all that much there, except for some MNIST results.