Densely Connected Convolutional Networks, Huang, Liu, Maaten, Weinberger; 2016 - Summary
author: philkr
score: 8 / 10

The paper introduces DenseNet. The core observation underlying the paper is that residual networks update their features only slowly. As a consequence, many residual blocks learn similar filters and extract the same patterns over and over again. DenseNet proposes a very simple remedy: reuse the outputs of previous layers directly.

Instead of adding a layer's output to the incoming feature map, as a residual connection does, DenseNet concatenates the two along the channel dimension.
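To make the difference concrete, here is a minimal PyTorch sketch (the layer sizes, and PyTorch itself, are illustrative assumptions rather than details from the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 32, 32)                      # (N, C, H, W) input feature map
res_branch = nn.Conv2d(64, 64, kernel_size=3, padding=1)
dense_branch = nn.Conv2d(64, 12, kernel_size=3, padding=1)

residual_out = x + res_branch(x)                    # ResNet: add, channels stay at 64
dense_out = torch.cat([x, dense_branch(x)], dim=1)  # DenseNet: concatenate, channels grow to 64 + 12
```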

Since each layer receives the feature maps of all preceding layers within a block, it can produce far fewer output channels per layer (the 'growth rate'). This leads to more compact networks: compared to a ResNet, a DenseNet uses up to \(3\times\) fewer parameters for the same performance.
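A sketch of a dense block along these lines (simplified, without the paper's 1x1 bottleneck layers; the class structure and names are assumptions):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN -> ReLU -> 3x3 conv producing `growth_rate` new channels."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(torch.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer sees the block input concatenated with all previous layer outputs."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # in_channels + num_layers * growth_rate channels
```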

DenseNet then combines multiple 'blocks' of densely connected layers into a full network. In between blocks, DenseNet uses a 'transition' layer (a 1x1 convolution followed by average pooling) that reduces both the channel count and the spatial resolution.
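A corresponding sketch of such a transition layer (again an illustrative assumption; the paper's DenseNet-BC additionally specifies a channel compression factor of 0.5):

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    """1x1 conv shrinks the channel count; 2x2 average pooling halves the resolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(torch.relu(self.bn(x))))

# e.g. DenseBlock(64, 12, 6) outputs 64 + 6 * 12 = 136 channels,
# which Transition(136, 68) halves before the next dense block.
```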

One issue with DenseNets is that a naive implementation uses a lot of memory during training: each layer's input is a fresh concatenation of all preceding feature maps, so the stored activations grow quadratically with depth unless buffers are shared or recomputed. The architecture has therefore seen limited adoption beyond ImageNet-scale classification.
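One general mitigation (not part of the original paper) is gradient checkpointing, which recomputes activations during the backward pass instead of storing them. A sketch of how the `DenseBlock.forward` above could be adapted:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Drop-in replacement for DenseBlock.forward from the sketch above:
# each layer's output is recomputed in the backward pass rather than cached.
def forward(self, x):
    features = [x]
    for layer in self.layers:
        inp = torch.cat(features, dim=1)
        features.append(checkpoint(layer, inp, use_reentrant=False))
    return torch.cat(features, dim=1)
```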

TL;DR: Concatenate the outputs of all preceding layers instead of adding them. This gives denser feature reuse and fewer parameters for the same performance, at the cost of higher training memory.