Training data-efficient image transformers & distillation through attention, Touvron, Cord, Douze, Massa, Sablayrolles, Jégou; 2020 - Summary
author: kelseyball
score: 6 / 10

Main contributions

Background

Vision Transformers
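The backbone here is the standard ViT encoder. As a refresher (my sketch, not the paper's code), ViT cuts the image into fixed-size patches and linearly projects each one into the token sequence the transformer consumes; a single strided convolution implements both steps at once:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """ViT-style patch embedding: split the image into non-overlapping
    16x16 patches and project each to the embedding dimension.
    A strided Conv2d does the split and the projection in one shot."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.proj(x)                     # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, D)
```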

Knowledge Distillation
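For background: classic (soft) distillation trains the student against a temperature-softened copy of the teacher's output distribution, added to the usual cross-entropy on the true label. A minimal PyTorch sketch, with the trade-off weight λ and temperature τ set to the typical values the paper uses (λ = 0.1, τ = 3.0):

```python
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels,
                           lam=0.1, tau=3.0):
    """(1 - lam) * CE(student, y) + lam * tau^2 * KL(teacher_T || student_T),
    where both distributions are softened by temperature tau."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.log_softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * tau ** 2
    return (1 - lam) * ce + lam * kl
```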

Their model: Data-efficient image Transformers (DeiT)
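Architecturally, DeiT is a ViT plus one new learned "distillation token" that rides alongside the class token, interacts with the patch tokens through self-attention, and feeds its own linear head. A minimal sketch (positional embeddings and encoder internals elided; `encoder` stands in for any transformer encoder mapping (B, N, D) to (B, N, D)):

```python
import torch
import torch.nn as nn

class DeiTStyleClassifier(nn.Module):
    """Two-token DeiT head: the class token is supervised by the true
    label, the distillation token by the teacher's prediction."""
    def __init__(self, encoder, embed_dim, num_classes):
        super().__init__()
        self.encoder = encoder  # hypothetical transformer encoder
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)       # class head
        self.head_dist = nn.Linear(embed_dim, num_classes)  # distillation head

    def forward(self, patch_tokens):         # (B, num_patches, D)
        b = patch_tokens.size(0)
        x = torch.cat([self.cls_token.expand(b, -1, -1),
                       self.dist_token.expand(b, -1, -1),
                       patch_tokens], dim=1)
        x = self.encoder(x)                  # tokens mix via self-attention
        return self.head(x[:, 0]), self.head_dist(x[:, 1])
```

At test time the paper fuses the two classifiers by adding their softmax outputs.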

New distillation method: “Hard-Label Distillation”
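Their hard-label variant replaces the KL term with a second cross-entropy against the teacher's argmax prediction, weighting the two terms equally, so it is parameter-free (no τ or λ to tune). A sketch using the two heads from above:

```python
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """0.5 * CE(class head, true label)
       + 0.5 * CE(distillation head, teacher's hard prediction)."""
    teacher_labels = teacher_logits.argmax(dim=-1)  # hard pseudo-labels
    return 0.5 * F.cross_entropy(cls_logits, labels) \
         + 0.5 * F.cross_entropy(dist_logits, teacher_labels)
```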

Teacher models

Results

Efficiency vs. accuracy for their model

Transfer learning

Variants explored

Table 8 in the paper reports the results of their ablation study on DeiT.

TL;DR