Abstract
Label noise is common in large real-world datasets, and its presence harms
the training process of deep neural networks. Although several works have
focused on training strategies to address this problem, few studies
evaluate the impact of data augmentation as a design choice for
training deep neural networks. In this work, we analyse model robustness
under different data augmentations and how much each improves training in
the presence of noisy labels. We evaluate state-of-the-art and classical
data augmentation strategies with different levels of synthetic noise for the
datasets MNIST, CIFAR-10, CIFAR-100, and the real-world dataset Clothing1M. We
evaluate the methods using accuracy as the metric. Results show that an
appropriate selection of data augmentation can drastically improve model
robustness to label noise, with a relative improvement of up to 177.84% in best
test accuracy over the baseline with no augmentation, and an absolute increase
of up to 6% with the state-of-the-art DivideMix training strategy.
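
As an illustration of two notions used above, the following minimal sketch shows one common way to inject synthetic label noise and how relative and absolute accuracy improvements differ. It is not the authors' code: the function names, the choice of symmetric (uniform) noise that always flips to a different class, and the example accuracies are all assumptions made for illustration.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` where each entry is flipped, with
    probability `noise_rate`, to a different class chosen uniformly.

    Assumption: symmetric noise that never keeps the original class.
    Other definitions allow the flipped label to equal the original.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy


def relative_improvement(baseline_acc, new_acc):
    """Improvement as a percentage of the baseline accuracy."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


def absolute_improvement(baseline_acc, new_acc):
    """Improvement as a plain difference in accuracy (percentage points)."""
    return new_acc - baseline_acc


if __name__ == "__main__":
    # Hypothetical 10-class labels, not taken from any dataset in the paper.
    clean = [i % 10 for i in range(1000)]
    noisy = inject_symmetric_noise(clean, num_classes=10, noise_rate=0.4)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print(f"fraction of flipped labels: {flipped / len(clean):.2f}")

    # Hypothetical accuracies chosen only to contrast the two measures.
    print(relative_improvement(20.0, 30.0))  # 50.0
    print(absolute_improvement(20.0, 30.0))  # 10.0
```

A low baseline makes the relative figure large even when the absolute gain is modest, which is why the two numbers are best reported side by side.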