author: DartingMelody
score: 9 / 10

What is the core idea?

The central idea of the paper is to train multiple detection heads, each with a different intersection over union (IoU) threshold. The Cascade R-CNN architecture addresses two problems: overfitting during training, and the inference-time mismatch between the IoU at which a detector is optimal and the IoU of the input hypotheses.
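To make the role of the IoU threshold concrete, here is a minimal sketch of how IoU is computed and how the number of positive training samples shrinks as the threshold rises. The box format `(x1, y1, x2, y2)`, the helper name `count_positives`, and the toy proposals are assumptions chosen for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, gt_box, threshold):
    """Hypothetical helper: proposals whose IoU with the ground truth
    reaches the threshold are treated as positive samples."""
    return sum(1 for p in proposals if iou(p, gt_box) >= threshold)


# Toy data (illustrative only): one ground-truth box and proposals
# shifted further and further to the right.
gt = (0.0, 0.0, 10.0, 10.0)
proposals = [
    (0.0, 0.0, 10.0, 10.0),
    (1.0, 0.0, 11.0, 10.0),
    (2.0, 0.0, 12.0, 10.0),
    (3.0, 0.0, 13.0, 10.0),
    (5.0, 0.0, 15.0, 10.0),
]

for u in (0.5, 0.6, 0.7):
    print(u, count_positives(proposals, gt, u))
```

On this toy set, fewer proposals qualify as positives at each higher threshold, which is the kind of shrinkage that makes a single high-threshold detector prone to overfitting.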

Problem example

How is it realized (technically)?

The model consists of a sequence of detectors trained with increasing IoU thresholds. The output of each detector is fed into the next one, which acts as a resampling mechanism. The same cascade is used at training and at inference, so there is no discrepancy between the two.
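The cascade described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `refine` is a hypothetical stand-in for a learned box regressor (it simply moves a box halfway toward the ground truth), the thresholds `(0.5, 0.6, 0.7)` are assumed for the example, and the proposals are made up.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine(box, gt_box, step=0.5):
    """Hypothetical stand-in for a stage's learned regressor: moves each
    coordinate a fraction `step` toward the ground truth. A real stage
    predicts this correction from image features instead."""
    return tuple(b + step * (g - b) for b, g in zip(box, gt_box))


def run_cascade(proposals, gt_box, thresholds=(0.5, 0.6, 0.7)):
    """Run boxes through stages with increasing IoU thresholds.

    Each stage counts its positives at its own threshold, then passes
    its refined boxes on as the input hypotheses of the next stage.
    """
    boxes = list(proposals)
    history = []
    for u in thresholds:
        positives = sum(1 for b in boxes if iou(b, gt_box) >= u)
        history.append((u, positives))
        boxes = [refine(b, gt_box) for b in boxes]  # resampling step
    return boxes, history


gt = (0.0, 0.0, 10.0, 10.0)
proposals = [
    (3.0, 0.0, 13.0, 10.0),
    (4.0, 0.0, 14.0, 10.0),
    (2.0, 0.0, 12.0, 10.0),
]
final_boxes, history = run_cascade(proposals, gt)
print(history)
```

In this sketch the boxes reaching later stages already overlap the ground truth more, so the higher thresholds still leave positives to train on. That is the resampling effect the summary refers to, under the strong simplification that refinement always improves the box.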

Architecture

How well does the paper perform?

Results

What interesting variants are explored?

Various architectures, such as Faster R-CNN, R-FCN with ResNet-50 and ResNet-101 backbones, and FPN+ with ResNet-50 and ResNet-101 backbones, are trained with and without cascading. The cascade variants of all these models outperform the corresponding non-cascade models. Other ablation experiments were performed on:

TL;DR