author:  DartingMelody 
score:  9 / 10 
What is the core idea?
The central idea of the paper is to train multiple detection heads with multiple intersection over union (IoU) thresholds. The Cascade R-CNN architecture addresses two problems: overfitting during training, because positive samples become scarce as the IoU threshold is raised, and an inference-time mismatch between the IoU for which the detector is optimal and the IoU of the input hypotheses.
How is it realized (technically)?
The model consists of a sequence of detectors trained with increasing IoU thresholds. The output of each detector is fed into the next as a resampling mechanism, and the same cascade is applied at inference, so there is no discrepancy between training and inference.

Cascade R-CNN extends the two-stage architecture of Faster R-CNN and relies on a cascade of specialized regressors, \(f(x, b) = f_{T} \circ f_{T-1} \circ \cdots \circ f_{1}(x, b)\), where T is the total number of cascade stages. Each regressor \(f_{t}\) in the cascade is optimized with respect to the sample distribution {\(b^{t}\)} arriving at stage t, instead of the initial distribution {\(b^{1}\)}.
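The composition of regressors can be illustrated with a minimal sketch. The helper `make_toy_regressor` is hypothetical: it moves a box a fixed fraction of the way toward the ground truth, whereas a learned \(f_{t}\) predicts the shift from image features and never sees the ground truth. The point is only to show how each stage receives the boxes produced by the previous one.

```python
def make_toy_regressor(fraction):
    # Hypothetical stand-in for a learned f_t (an assumption for illustration).
    # It moves every coordinate `fraction` of the way toward the ground truth.
    def f_t(box, gt):
        return tuple(b + fraction * (g - b) for b, g in zip(box, gt))
    return f_t


def cascade(regressors, box, gt):
    # Apply f_1, then f_2, ..., then f_T, keeping the box seen at every stage.
    boxes = [box]
    for f_t in regressors:
        boxes.append(f_t(boxes[-1], gt))
    return boxes


# Boxes are (x1, y1, x2, y2); T = 3 stages.
stages = [make_toy_regressor(0.5) for _ in range(3)]
trajectory = cascade(stages, (0.0, 0.0, 8.0, 8.0), (4.0, 4.0, 12.0, 12.0))
for t, b in enumerate(trajectory):
    print(t, b)
```

Each entry of `trajectory` is the hypothesis distribution a later stage would be trained on, which is why stage t sees better boxes than stage 1.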

At each stage t, the R-CNN has a classifier \(h_{t}\) and a regressor \(f_{t}\), which are optimized for IoU threshold \(u^{t}\), where \(u^{t}>u^{t-1}\).

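This is achieved by minimizing the loss

\[ L(x^{t}, g) = L_{cls}(h_{t}(x^{t}), y^{t}) + \lambda [y^{t} \geq 1] L_{loc}(f_{t}(x^{t}, b^{t}), g) \]

where \(b^{t} = f_{t-1}(x^{t-1}, b^{t-1})\), g is the ground truth object for \(x^{t}\), λ = 1 is the trade-off coefficient, \([\cdot]\) is the indicator function, and \(y^{t}\) is the label of \(x^{t}\) given by

\[ y^{t} = \begin{cases} g_{y}, & IoU(x^{t}, g) \geq u^{t} \\ 0, & \text{otherwise} \end{cases} \]

where \(g_{y}\) is the class label of g and 0 denotes background.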
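The label rule and the per-stage loss can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the classification term is assumed to be cross-entropy, and the localization term is assumed to be smooth L1 applied to raw box coordinates (real detectors regress normalized box offsets).

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_label(box, gt_box, gt_class, u):
    # y = g_y if IoU(x, g) >= u, otherwise 0 (background).
    return gt_class if iou(box, gt_box) >= u else 0


def smooth_l1(d):
    return 0.5 * d * d if abs(d) < 1.0 else abs(d) - 0.5


def stage_loss(class_probs, label, pred_box, gt_box, lam=1.0):
    # L = L_cls + lam * [label >= 1] * L_loc
    l_cls = -math.log(class_probs[label])  # assumed cross-entropy
    l_loc = 0.0
    if label >= 1:  # localization is only penalized for positives
        l_loc = sum(smooth_l1(p - g) for p, g in zip(pred_box, gt_box))
    return l_cls + lam * l_loc


gt_box, gt_class = (0.0, 0.0, 10.0, 10.0), 3
box = (0.0, 0.0, 10.0, 6.5)  # overlaps 65% of the ground truth
labels = [assign_label(box, gt_box, gt_class, u) for u in (0.5, 0.6, 0.7)]
print(labels)
```

The same hypothesis is a positive for the first two thresholds and background for the third, which is why a single fixed threshold cannot serve every hypothesis quality equally well.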

Cascade R-CNN has four stages: one region proposal network (RPN) and three detection stages with U = {0.5, 0.6, 0.7}. (The authors use these IoU thresholds unless stated otherwise.)
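A toy experiment can show why resampling matters for U = {0.5, 0.6, 0.7}. The sketch below counts positives at each threshold, first on the original proposals and then on proposals refined stage by stage. The proposals and the `refine` function are made up for illustration, and `refine` is an idealized regressor that moves boxes halfway toward the ground truth, so the numbers only illustrate the trend and are not results from the paper.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def refine(box, gt, fraction=0.5):
    # Idealized, hypothetical regressor: a learned f_t would not see `gt`.
    return tuple(b + fraction * (g - b) for b, g in zip(box, gt))


def count_positives(boxes, gt, u):
    return sum(iou(b, gt) >= u for b in boxes)


U = [0.5, 0.6, 0.7]
gt = (0.0, 0.0, 10.0, 10.0)
proposals = [  # made-up proposals of mixed quality
    (1.0, 1.0, 11.0, 11.0),
    (2.0, 0.0, 12.0, 10.0),
    (3.0, 3.0, 13.0, 13.0),
    (0.0, 0.0, 10.0, 5.5),
]

# Without a cascade: every threshold is applied to the same proposals.
fixed = [count_positives(proposals, gt, u) for u in U]

# With a cascade: stage t labels the boxes produced by stage t-1.
boxes, cascaded = proposals, []
for u in U:
    cascaded.append(count_positives(boxes, gt, u))
    boxes = [refine(b, gt) for b in boxes]

print("fixed proposals:", fixed)
print("cascaded:", cascaded)
```

With fixed proposals the number of positives shrinks as the threshold rises, while in the cascade the later stages still receive enough positives to train on.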
How well does the paper perform?
 Cascade R-CNN, based on FPN+ and ResNet-101, outperforms all earlier state-of-the-art single-model detectors such as Faster R-CNN, YOLO, Mask R-CNN, etc. on the COCO dataset. It also outperforms the Iterative BBox and Integral Loss models. The gain is more visible at higher IoU thresholds.
 Because the computational cost of a detection head is usually small compared to the RPN, the computational overhead of Cascade R-CNN is small for both training and testing.
What interesting variants are explored?
Various architectures, such as Faster R-CNN, R-FCN with ResNet-50 and ResNet-101 backbones, and FPN+ with ResNet-50 and ResNet-101 backbones, are trained with and without cascading. The cascade variants of all these models outperform the corresponding non-cascade models. Other ablation experiments were performed on:
 IoU thresholds, with the result that a detector trained with increasing thresholds is more selective against close false positives and specializes in more precise hypotheses.
 stage-wise comparison, with the result that the ensemble of all classifiers is generally the best.
 regression statistics, with the result that updating them at each stage helps the effective multi-task learning of classification and regression.
 number of stages, with the result that the three-stage cascade (the Cascade R-CNN model) achieves the best trade-off.
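The classifier ensemble from the stage-wise comparison can be sketched as follows. The review does not say how the per-stage classifiers are combined, so averaging the class probabilities is an assumption here, and the probability values are made up for illustration.

```python
def ensemble_scores(stage_probs):
    # Average the class probabilities predicted by each stage's classifier
    # for the same final-stage box (averaging is an assumed combination rule).
    n_stages = len(stage_probs)
    n_classes = len(stage_probs[0])
    return [sum(p[c] for p in stage_probs) / n_stages for c in range(n_classes)]


# Made-up probabilities over (background, class 1) from three stages.
stage_probs = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
avg = ensemble_scores(stage_probs)
predicted = max(range(len(avg)), key=lambda c: avg[c])
print(avg, predicted)
```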
TL;DR
 The Cascade R-CNN model, which extends Faster R-CNN, consists of a sequence of detectors trained with increasing IoU thresholds.
 The model aims to reduce overfitting, match the inference and training architectures, and detect true positives while suppressing close false positives.
 The Cascade R-CNN model outperforms all previous state-of-the-art models such as Faster R-CNN, YOLO, Mask R-CNN, etc. on the COCO dataset.