Abstract
Deep neural network models are robust to a limited amount of label noise, but
their tendency to memorise noisy labels in high noise-rate problems remains an
open issue. The most competitive noisy-label learning algorithms rely on a
2-stage process: an unsupervised learning stage that classifies training
samples as clean or noisy, followed by a semi-supervised learning stage that
minimises the empirical vicinal risk (EVR) using a labelled set formed by the
samples classified as clean and an unlabelled set formed by the samples
classified as noisy. In this paper, we hypothesise that the generalisation of
such 2-stage noisy-label learning methods depends on the precision of the
unsupervised classifier and on the size of the training set used to minimise
the EVR. We empirically
validate these two hypotheses and propose the new 2-stage noisy-label training
algorithm LongReMix. We test LongReMix on the noisy-label benchmarks CIFAR-10,
CIFAR-100, WebVision, Clothing1M, and Food101-N. The results show that
LongReMix generalises better than competing approaches, particularly in
problems with high label noise. Furthermore, our approach achieves
state-of-the-art performance on most of these datasets. The code is available at
https://github.com/filipe-research/LongReMix.