Abstract
We propose a new training algorithm, ScanMix, that combines semantic clustering
and semi-supervised learning (SSL) to achieve superior robustness to severe
label noise and competitive robustness to non-severe label noise, compared with
state-of-the-art (SOTA) methods. ScanMix is based on the
expectation-maximisation (EM) framework, where the E-step estimates the
latent variable to cluster the training images based on their appearance and
classification results, and the M-step optimises the SSL classification and
learns effective feature representations via semantic clustering. We present a
theoretical result that shows the correctness and convergence of ScanMix, and
empirical results showing that ScanMix achieves SOTA results on CIFAR-10/-100
(with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from
the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks
with severe label noise, our results are competitive with the current SOTA.