Abstract
The most competitive noisy label learning methods rely on an unsupervised
classification of clean and noisy samples, where samples classified as noisy
are re-labelled and "MixMatched" with the clean samples. These methods have two
issues in high noise rate problems: 1) the noisy set is more likely to contain
hard samples that are incorrectly re-labelled, and 2) the number of samples
produced by MixMatch tends to be reduced because it is constrained by the small
clean set size. In this paper, we introduce the learning algorithm PropMix to
handle the issues above. PropMix filters out hard noisy samples, with the goal
of increasing the likelihood of correctly re-labelling the easy noisy samples.
Also, PropMix places clean and re-labelled easy noisy samples in a training set
that is augmented with MixUp, removing the clean set size constraint and
including a large proportion of correctly re-labelled easy noisy samples. We
also include self-supervised pre-training to improve robustness under severe
label noise. Our experiments show that PropMix has state-of-the-art (SOTA)
results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise),
Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and
WebVision. In severe label noise benchmarks, our results are substantially
better than other methods. The code is available at
https://github.com/filipe-research/PropMix.
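
The MixUp augmentation mentioned above can be illustrated with a minimal sketch. This is not the PropMix implementation: it only shows the generic MixUp operation, a convex combination of two inputs and their soft labels with a weight drawn from a Beta distribution. The function name `mixup_pair`, the list-based inputs, and the default `alpha` value are assumptions made for illustration.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Mix two (input, soft-label) pairs with a Beta-distributed weight.

    x_a, x_b: flat lists of floats (hypothetical stand-ins for images).
    y_a, y_b: soft-label vectors of equal length.
    alpha:    Beta(alpha, alpha) shape parameter (illustrative default).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    # One clean sample and one re-labelled sample, both treated alike.
    clean_x, clean_y = [1.0, 0.0], [1.0, 0.0]
    relabelled_x, relabelled_y = [0.0, 1.0], [0.0, 1.0]
    x, y, lam = mixup_pair(clean_x, clean_y, relabelled_x, relabelled_y)
    print(x, y, lam)
```

Because both samples are drawn from one combined training set in this sketch, the number of mixed samples is not tied to the size of the clean subset alone.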