End-to-End Object Detection with Transformers, Carion, Massa, Synnaeve, Usunier, Kirillov, Zagoruyko; 2020 - Summary
author: chengchunhsu
score: 7 / 10

What is the core idea?

Core idea: formulate object detection as a direct set prediction problem

Motivation: avoid the hand-designed components of modern object detectors, such as non-maximum suppression and anchor generation

Solution: DEtection TRansformer (DETR)

How is it realized (technically)?

Key component:

Set prediction loss

DETR infers a fixed-size set of N predictions in a single pass, where N is set well above the typical number of objects in an image

Goal: score predicted objects (class, position, size) with respect to the ground truth

How:

Match loss:
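In the paper, the matching step looks for the permutation of the N predictions with the lowest total pairwise cost, $\hat{\sigma} = \arg\min_{\sigma \in \mathfrak{S}_N} \sum_{i=1}^{N} \mathcal{L}_{\text{match}}(y_i, \hat{y}_{\sigma(i)})$, where the ground truth is padded with "no object" entries up to size N. A minimal sketch of that objective follows. It assumes a precomputed cost matrix and uses brute force over permutations rather than the Hungarian algorithm used in the paper; the function name `best_assignment` and the toy costs are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (sigma, total) minimizing sum(cost[i][sigma[i]]) over all permutations.

    cost[i][j] is the matching cost between ground-truth slot i and prediction j.
    Brute force is O(N!), so this only illustrates the objective for tiny N.
    """
    n = len(cost)
    best_sigma, best_total = None, float("inf")
    for sigma in permutations(range(n)):
        total = sum(cost[i][sigma[i]] for i in range(n))
        if total < best_total:
            best_sigma, best_total = sigma, total
    return best_sigma, best_total


# Toy 3x3 cost matrix (illustrative values, not from the paper).
cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.3],
]
sigma, total = best_assignment(cost)
print(sigma, round(total, 3))
```

For realistic N, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` is the usual choice for this assignment problem.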

Hungarian loss:
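For reference, the Hungarian loss can be written as below, following the paper's notation as best I can reproduce it: $\hat{\sigma}$ is the optimal assignment from the matching step, $c_i$ is the target class of ground-truth element $i$ (possibly the "no object" class $\varnothing$), $\hat{p}$ is the predicted class probability, and $b_i$, $\hat{b}$ are the target and predicted boxes.

```latex
\mathcal{L}_{\text{Hungarian}}(y, \hat{y})
  = \sum_{i=1}^{N} \left[
      -\log \hat{p}_{\hat{\sigma}(i)}(c_i)
      + \mathbb{1}_{\{c_i \neq \varnothing\}} \,
        \mathcal{L}_{\text{box}}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big)
    \right]
```

The box term only applies to slots matched to a real object; slots matched to $\varnothing$ contribute the classification term alone, which the paper down-weights to counter class imbalance.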

Bounding box prediction loss:
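The paper combines a generalized IoU term with an L1 term, $\mathcal{L}_{\text{box}}(b_i, \hat{b}_{\sigma(i)}) = \lambda_{\text{iou}} \mathcal{L}_{\text{iou}}(b_i, \hat{b}_{\sigma(i)}) + \lambda_{\text{L1}} \lVert b_i - \hat{b}_{\sigma(i)} \rVert_1$. The sketch below is a plain-Python illustration under simplifying assumptions: boxes are given as corner coordinates `(x0, y0, x1, y1)` and the L1 term is computed on those corners, whereas DETR regresses normalized center/size coordinates. The function names are hypothetical, and the default weights follow the values I recall from the paper's setup (2 for GIoU, 5 for L1).

```python
def giou(a, b):
    """Generalized IoU of two boxes (x0, y0, x1, y1) with x0 < x1 and y0 < y1."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Intersection width/height, clamped at zero for disjoint boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def box_loss(target, pred, lambda_iou=2.0, lambda_l1=5.0):
    """Weighted sum of (1 - GIoU) and the L1 distance between box coordinates."""
    l1 = sum(abs(t - p) for t, p in zip(target, pred))
    return lambda_iou * (1.0 - giou(target, pred)) + lambda_l1 * l1


same = box_loss((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0))
apart = box_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
print(same, apart)
```

The GIoU term keeps a useful signal even when boxes do not overlap, and it is scale-invariant, which the plain L1 term is not.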

Transformer architecture

How well does the paper perform?

Performance on COCO

Discussion: how does DETR perform compared to other object detection methods, e.g., RetinaNet and Faster R-CNN?

Ablation studies

Discussion #1: what is the importance of each decoder layer?

Discussion #2: what is the FFN?

Generalization to unseen numbers of instances

Discussion: does DETR generalize to unseen numbers of instances?

TL;DR