author: jordi1215
score: 9/10
1. Class Imbalance Problem
The paper tackles the problem of long-tailed data distribution, where a few classes account for most of the data while most classes are under-represented.

Head: classes with small indices, which have large numbers of samples.

Tail: classes with large indices, which have small numbers of samples.

Black Solid Line: models trained directly on these samples are biased toward dominant classes.

Red Dashed Line: reweighting the loss by inverse class frequency may yield poor performance on real-world data with high class imbalance.

Blue Dashed Line: a class-balanced term is designed to reweight the loss by the inverse effective number of samples.
2. Effective Number of Samples
2.1 Definition
Intuitively, the more data, the better. However, since there is information overlap among data, as the number of samples increases, the marginal benefit a model can extract from the data diminishes.
Left: given a class, denote the set of all possible data in the feature space of this class as S. Assume the volume of S is N, with N ≥ 1.
For a class, N can be viewed as the number of unique prototypes.

Middle: Each sample in a subset of S has the unit volume of 1 and may overlap with other samples.

Right: each subset is randomly sampled from S to cover the entire set S. The more data is sampled, the better the coverage of S. The expected total volume of sampled data increases with the number of samples and is bounded by N.
Therefore, the effective number of samples is defined as the expected volume of samples.
The idea is to capture the diminishing marginal benefit of using more data points of a class. Due to intrinsic similarities among real-world data, as the number of samples grows, it is highly possible that a newly added sample is a near-duplicate of an existing sample.
In addition, since CNNs are trained with heavy data augmentation, all augmented examples are also considered the same as the original example.
2.2 Mathematical Formulation

Denote the effective number (expected volume) of samples as \(E_n\), where \(n\) is the number of samples.

To simplify the problem, the case of partial overlap is not considered.

That is, a newly sampled data point can only interact with previously sampled data in two ways: either it lies entirely inside the set of previously sampled data, with probability p, or entirely outside, with probability 1 − p.
Proposition (Effective Number):
\[E_n = \frac{1-\beta^n}{1-\beta}, \quad \text{where} \quad \beta = \frac{N-1}{N}.\]
This proposition is proved by mathematical induction.
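As a quick sanity check, the formula above can be evaluated numerically. A minimal sketch (the prototype count N = 100 and the function name are my own choices for illustration):

```python
def effective_number(n, N):
    """E_n = (1 - beta^n) / (1 - beta), with beta = (N - 1) / N."""
    beta = (N - 1) / N
    return (1.0 - beta**n) / (1.0 - beta)

# With N = 100 prototypes, the marginal benefit of each extra sample
# shrinks, and E_n saturates at the upper bound N.
for n in (1, 10, 100, 1000, 10000):
    print(n, effective_number(n, N=100))
```

The printed values grow far more slowly than n and level off near N, which is exactly the diminishing-marginal-benefit behavior the definition is meant to capture.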
3. Class-Balanced Loss (CB Loss)

The class-balanced loss is designed to address the problem of training from imbalanced data by introducing a weighting factor that is inversely proportional to the effective number of samples.

The paper denotes the loss as \(\mathcal{L}(p,y)\), where \(y\in\{1,2,\dots,C\}\) is the class label, \(C\) is the total number of classes, and \(p\) is the estimated class probability.
The class-balanced (CB) loss can be written as:
\[\mathrm{CB}(p, y) = \frac{1}{E_{n_y}}\,\mathcal{L}(p, y) = \frac{1-\beta}{1-\beta^{n_y}}\,\mathcal{L}(p, y)\]
where \(n_y\) is the number of samples in the ground-truth class \(y\). The visualization is as follows, depicting the class-balanced term as a function of \(n_y\) for different β.
Note that β = 0 corresponds to no reweighting and β → 1 corresponds to reweighting by inverse class frequency. The proposed concept of the effective number of samples enables the hyperparameter β to smoothly adjust the class-balanced term between no reweighting and reweighting by inverse class frequency.
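To make that interpolation concrete, here is a small sketch (the class counts are invented for illustration) that computes the class-balanced term for a toy long-tailed distribution; following the paper, the weights are normalized so they sum to the number of classes C:

```python
import numpy as np

def cb_weights(samples_per_class, beta):
    """Per-class weight (1 - beta) / (1 - beta^n_y), normalized to sum to C."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w / w.sum() * len(n)

counts = [5000, 500, 50]                 # toy long-tailed class counts
print(cb_weights(counts, beta=0.0))      # all ones: no reweighting
print(cb_weights(counts, beta=0.9999))   # close to inverse class frequency
```

At β = 0 every class gets weight 1; as β approaches 1 the weight ratios approach the inverse-frequency ratios 1 : 10 : 100.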
3.1 Class-Balanced Softmax Cross-Entropy Loss
Given a sample with class label \(y\) and model output \(z = [z_1, \dots, z_C]^\top\), the softmax cross-entropy (CE) loss for this sample is written as:
\[\mathrm{CE}_{\mathrm{softmax}}(z, y) = -\log\left(\frac{\exp(z_y)}{\sum_{j=1}^{C}\exp(z_j)}\right)\]
Suppose class \(y\) has \(n_y\) training samples; the class-balanced (CB) softmax cross-entropy loss is:
\[\mathrm{CB}_{\mathrm{softmax}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}}\log\left(\frac{\exp(z_y)}{\sum_{j=1}^{C}\exp(z_j)}\right)\]
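A minimal NumPy sketch of the class-balanced softmax cross-entropy for a single sample (the function name and toy inputs are my own, not from the paper):

```python
import numpy as np

def cb_softmax_ce(logits, label, samples_per_class, beta=0.9999):
    """Class-balanced softmax cross-entropy for one sample."""
    z = np.asarray(logits, dtype=np.float64)
    # numerically stable log-softmax
    log_probs = z - z.max() - np.log(np.exp(z - z.max()).sum())
    n_y = samples_per_class[label]
    weight = (1.0 - beta) / (1.0 - beta**n_y)
    return -weight * log_probs[label]
```

With uniform logits and a singleton class (n_y = 1) the weight is exactly 1 and the loss reduces to the plain cross-entropy log C; for the same logits, a rarer ground-truth class yields a larger loss.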
3.2 Class-Balanced Sigmoid Cross-Entropy Loss

When using the sigmoid function for a multi-class problem, each output node of the network performs a one-vs-all classification to predict the probability of the target class over the rest of the classes.

In this case, the sigmoid does not assume mutual exclusiveness among classes.

Since each class is considered independent and has its own predictor, the sigmoid unifies single-label classification with multi-label prediction. This is a nice property to have, since real-world data often has more than one semantic label.
The sigmoid cross-entropy (CE) loss can be written as:
\[\mathrm{CE}_{\mathrm{sigmoid}}(z, y) = -\sum_{i=1}^{C}\log\left(\frac{1}{1+\exp(-z_i^t)}\right), \quad \text{where } z_i^t = \begin{cases} z_i & \text{if } i = y,\\ -z_i & \text{otherwise.} \end{cases}\]
The class-balanced (CB) sigmoid cross-entropy loss is:
\[\mathrm{CB}_{\mathrm{sigmoid}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}}\sum_{i=1}^{C}\log\left(\frac{1}{1+\exp(-z_i^t)}\right)\]
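The same weighting applied to the one-vs-all sigmoid form, again as a sketch with invented inputs:

```python
import numpy as np

def cb_sigmoid_ce(logits, label, samples_per_class, beta=0.9999):
    """Class-balanced sigmoid cross-entropy: one-vs-all over all C outputs."""
    z = np.asarray(logits, dtype=np.float64)
    # z_i^t = z_i for the ground-truth class, -z_i otherwise
    zt = np.where(np.arange(len(z)) == label, z, -z)
    log_p = -np.logaddexp(0.0, -zt)          # stable log(sigmoid(z_i^t))
    weight = (1.0 - beta) / (1.0 - beta**samples_per_class[label])
    return -weight * log_p.sum()
```

Unlike the softmax version, the sum runs over every output node, since each node is its own one-vs-all classifier.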
3.3 Class-Balanced Focal Loss
The focal loss (FL), proposed in RetinaNet, reduces the relative loss for well-classified samples and focuses on difficult samples:
\[\mathrm{FL}(z, y) = -\sum_{i=1}^{C}(1-p_i^t)^{\gamma}\log(p_i^t), \quad \text{where } p_i^t = \mathrm{sigmoid}(z_i^t)\]
The class-balanced (CB) focal loss is:
\[\mathrm{CB}_{\mathrm{focal}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}}\sum_{i=1}^{C}(1-p_i^t)^{\gamma}\log(p_i^t)\]
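A sketch of the class-balanced focal loss in the same one-vs-all sigmoid form (the function name and inputs are illustrative; with γ = 0 it reduces to the class-balanced sigmoid cross-entropy):

```python
import numpy as np

def cb_focal_loss(logits, label, samples_per_class, beta=0.999, gamma=0.5):
    """Class-balanced focal loss for one sample (one-vs-all sigmoid form)."""
    z = np.asarray(logits, dtype=np.float64)
    zt = np.where(np.arange(len(z)) == label, z, -z)
    p_t = 1.0 / (1.0 + np.exp(-zt))                  # sigmoid(z_i^t)
    focal = -((1.0 - p_t) ** gamma) * np.log(p_t)    # down-weights easy outputs
    weight = (1.0 - beta) / (1.0 - beta**samples_per_class[label])
    return weight * focal.sum()
```

Increasing γ shrinks the contribution of already well-classified outputs (large p_t), while the class-balanced weight handles the imbalance across classes.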
4. Experimental Results
4.1 Datasets

Five long-tailed versions of both CIFAR-10 and CIFAR-100, with imbalance factors of 10, 20, 50, 100, and 200 respectively, are tried.

iNaturalist and ILSVRC are class-imbalanced in nature.
The figure above shows the number of images per class for different imbalance factors.
4.2 CIFAR Datasets

The search space of hyperparameters is {softmax, sigmoid, focal} for loss type, β ∈ {0.9, 0.99, 0.999, 0.9999}, and γ ∈ {0.5, 1.0, 2.0} for Focal Loss.

The best β is 0.9999 on CIFAR-10 across all imbalance factors.

But on CIFAR-100, datasets with different imbalance factors tend to have different, and smaller, optimal β.

On CIFAR-10, when reweighting with β = 0.9999, the effective number of samples is close to the number of samples. This means the best reweighting strategy on CIFAR-10 is similar to reweighting by inverse class frequency.

On CIFAR-100, the poor performance of larger β suggests that reweighting by inverse class frequency is not a wise choice. A smaller β is needed, which gives smoother weights across classes.

For example, the number of unique prototypes of a specific bird species should be smaller than the number of unique prototypes of a generic bird class. Since classes in CIFAR-100 are more fine-grained than in CIFAR-10, CIFAR-100 has a smaller N compared with CIFAR-10.
4.3 LargeScale Datasets

The class-balanced focal loss is used since it has more flexibility, and β = 0.999 with γ = 0.5 is found to yield reasonably good performance on all datasets.

Notably, ResNet-50 is able to achieve performance comparable to ResNet-152 on iNaturalist and to ResNet-101 on ILSVRC 2012 when the class-balanced focal loss replaces the softmax cross-entropy loss.
The figures above show that the class-balanced focal loss starts to show its advantage after 60 epochs of training.
TL;DR
The paper tackles the problem of long-tailed data distribution, where a few classes account for most of the data while most classes are under-represented.
The paper provides a theoretical framework to study the effective number of samples and shows how to design a class-balanced term to deal with long-tailed training data.
The paper shows that significant performance improvements can be achieved by adding the proposed class-balanced term to commonly used loss functions, including softmax cross-entropy, sigmoid cross-entropy, and focal loss.