Abstract
Data augmentation is an essential part of training deep learning models. The
motivation is that robust training of deep learning models depends on large
annotated datasets, which are expensive to acquire, store and process.
Therefore, a reasonable alternative is to automatically generate new annotated
training samples, a process known as data augmentation. The dominant data
augmentation approach in the
field assumes that new training samples can be obtained via random geometric or
appearance transformations applied to annotated training samples, but this is a
strong assumption because it is unclear whether such transformations constitute
a reliable generative model for producing new training samples. In this paper,
we provide a novel Bayesian formulation of data augmentation, in which new
annotated training points are
treated as missing variables and generated based on the distribution learned
from the training set. For learning, we introduce a theoretically sound
algorithm, generalised Monte Carlo expectation maximisation, and demonstrate
one possible implementation via an extension of the Generative Adversarial
Network (GAN). Classification results on MNIST, CIFAR-10 and CIFAR-100 show
that our proposed method outperforms the dominant data augmentation approach
described above; it also produces better classification results than similar
GAN models.
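As a rough illustration of the learning procedure named above (a sketch in our
own notation, not notation fixed by this abstract), generalised Monte Carlo
expectation maximisation treats the synthetic annotated points as missing
variables: an E-step samples them from the model conditioned on the observed
training set, and an M-step re-estimates the parameters on the observed data
together with the sampled points.

% Sketch of one Monte Carlo EM iteration for Bayesian data augmentation.
% Notation (our assumption): D is the observed annotated training set,
% z_a are synthetic annotated points treated as missing data, theta_t are
% the model parameters at iteration t, and M is the number of samples.
\begin{align}
  % E-step: draw synthetic annotated samples from the current model
  z_a^{(m)} &\sim p\!\left(z_a \mid \mathcal{D}, \theta_t\right),
    \qquad m = 1, \dots, M, \\
  % M-step: maximise the Monte Carlo estimate of the expected
  % complete-data log-likelihood
  \theta_{t+1} &= \arg\max_{\theta} \;
    \frac{1}{M} \sum_{m=1}^{M}
    \log p\!\left(\mathcal{D}, z_a^{(m)} \mid \theta\right).
\end{align}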