Abstract
This thesis advances the field of adversarial robustness in Deep Neural Networks (DNNs) for image classification. While DNNs achieve remarkable classification performance, their vulnerability to imperceptible adversarial perturbations that cause misclassifications remains a critical limitation. Adversarial training is currently the most effective and comprehensive defence against adversarial perturbations. It improves the inherent robustness of a DNN during the training stage by exposing the model to adversarial examples and teaching it to classify them correctly. Other defence methods do not address adversarial examples directly. Adversarial detection methods identify an adversarial example and then reject or further process it; however, rejecting inputs is not acceptable in all settings (e.g., safety-critical systems), and processing detected inputs is a harder problem. Data augmentation improves generalisation and robustness only marginally. Adversarial training also has drawbacks: lower clean accuracy and higher training cost. Our work introduces frameworks that improve model robustness and generalisation capabilities.
Firstly, we address the inherent limitations of the Cross-Entropy (CE) loss in adversarial training by developing a distance-based feature-adjusting framework. Recognising that CE loss gradients align with perturbation-maximising directions, we propose the Extended Centre Loss (ECL) to enforce intra-class compactness and inter-class separability in the learned feature space. This approach simultaneously reduces the embedding discrepancy between clean and adversarial examples and disentangles overlapping class distributions, achieving superior robustness without compromising clean accuracy.
Secondly, existing adversarial training methods typically overlook the multi-modal nature of data distributions, which can degrade the performance of the trained model. To address this issue, a novel adversarial training framework, namely Adaptive Clustering Centre Alignment (ACCA), is proposed in this thesis. ACCA dynamically partitions each class into multiple clusters and constructs finer decision boundaries through a new loss function. This framework demonstrates that explicitly accounting for multi-modality in the feature space significantly enhances adversarial robustness compared with conventional approaches.
Lastly, we leverage adversarial training methods to improve standard accuracy. The first approach balances long-tailed data distributions, which lead to underfitting on minority classes. Although this imbalance problem has been extensively studied by the research community, existing approaches mainly focus on class-wise (inter-class) imbalance. To address this gap, we present a method that tackles both inter-class and intra-class imbalance by strategically injecting minority features into majority samples via adversarial perturbations. Furthermore, we introduce a framework, namely Mixup Propagation (MixProp), that adopts multiple Batch Normalisation (BN) layers to process clean and adversarial examples separately. MixProp effectively enhances the performance of DNNs in classifying both clean and distorted images at a reduced training cost compared with standard adversarial training approaches.