Abstract
This thesis presents novel contributions to the field of attribute-aware face recognition, addressing fundamental challenges in efficiency, bias, and the integration of attribute and identity features. Our work spans three main areas, each contributing to the overarching goal of enhancing the robustness and unbiased performance of face recognition systems across various attribute cohorts while managing computational complexity.
First, we address the efficiency compromise of modern anchor-based face detectors, which, despite their effectiveness, incur high computational cost from redundant feature extraction and rely on unreliable decision-making processes. We introduce a Heatmap-assisted Spatial Attention (HSA) module and a Scale-aware Layer Attention (SLA) module, which together significantly reduce computational cost by adaptively focusing on informative features and by applying spatial feature selection that highlights facial regions. By combining heatmap scores with classification scores for decision-making, our approach substantially reduces the efficiency compromise and improves detection reliability on well-known benchmarks.
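As a rough illustration of the two mechanisms named above, the sketch below re-weights a feature map with a face heatmap and then fuses per-anchor heatmap and classification scores. The residual re-weighting F * (1 + M), the weighted geometric mean, and the names heatmap_attention, fuse_scores and alpha are hypothetical choices made for illustration; they are not the HSA module or the fusion rule used in this thesis.

```python
import numpy as np


def heatmap_attention(features: np.ndarray, heatmap: np.ndarray) -> np.ndarray:
    """Re-weight a (C, H, W) feature map with an (H, W) face heatmap in [0, 1].

    Hypothetical residual form F' = F * (1 + M): background responses are kept,
    while locations the heatmap marks as facial are amplified.
    """
    return features * (1.0 + heatmap[None, :, :])


def fuse_scores(cls_scores, heat_scores, alpha: float = 0.5) -> np.ndarray:
    """Fuse per-anchor classification and heatmap scores (assumed rule).

    Uses a weighted geometric mean, so an anchor needs support from both
    the classifier and the heatmap to keep a high final score.
    """
    cls_scores = np.asarray(cls_scores, dtype=float)
    heat_scores = np.asarray(heat_scores, dtype=float)
    return cls_scores ** alpha * heat_scores ** (1.0 - alpha)


if __name__ == "__main__":
    feats = np.ones((2, 2, 2))
    hm = np.array([[0.0, 1.0], [0.5, 0.0]])
    print(heatmap_attention(feats, hm)[0])
    print(fuse_scores([0.9, 0.8], [0.1, 0.8]))
```

With equal weights, an anchor that the classifier scores at 0.9 but the heatmap scores at 0.1 drops to 0.3, so a confident but off-face anchor can be suppressed before thresholding.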
Second, recognising the importance of attribute awareness in face recognition, we address the challenge of feature disentanglement to mitigate potential performance biases among identities with different attributes. By building on the Nearest neighbours Proxy Triplet (NPT) loss and introducing an Adaptive-rank NPT loss, we achieve a natural separation of identity and attribute features, improving both accuracy and unbiased performance across gender and ethnicity groups. The resulting method, the Ada2NPT loss, outperforms state-of-the-art losses by promoting inter-class separability and intra-class compactness, as our experiments on several benchmark datasets show.
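To make the proxy-triplet idea concrete, the sketch below computes a simplified loss in the spirit of NPT, assuming L2-normalised embeddings, one proxy per identity, squared Euclidean distances, and a hinge on the gap between the positive proxy and the nearest negative proxy. The function name proxy_triplet_loss and the margin value are hypothetical, this is not necessarily the exact NPT formulation, and the adaptive-rank extension proposed in this thesis is not shown.

```python
import numpy as np


def proxy_triplet_loss(embeddings, labels, proxies, margin: float = 0.2) -> float:
    """Simplified nearest-proxy triplet loss (illustrative assumption).

    embeddings: (N, D) sample features; labels: (N,) class indices;
    proxies: (K, D) one learnable proxy per identity.
    """
    labels = np.asarray(labels)
    # Project samples and proxies onto the unit hypersphere.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    # Squared Euclidean distance from every sample to every proxy: (N, K).
    d = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    rows = np.arange(len(labels))
    d_pos = d[rows, labels]
    # Mask the positive proxy, then take the nearest negative proxy.
    d_neg = d.copy()
    d_neg[rows, labels] = np.inf
    d_nearest_neg = d_neg.min(axis=1)
    # Hinge: the positive proxy must be closer than the nearest negative by `margin`.
    return float(np.maximum(0.0, d_pos - d_nearest_neg + margin).mean())


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.0, 1.0]])
    prox = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    print(proxy_triplet_loss(emb, [0, 1], prox))
    print(proxy_triplet_loss(emb, [1, 0], prox))
```

Penalising only the nearest negative proxy focuses the gradient on the most confusable identity, which is one way such a loss can encourage inter-class separability while pulling samples towards their own proxy.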
Finally, we propose a novel approach that overcomes a limitation of traditional representation learning: its difficulty in integrating multiple attribute features with identity features, owing to discretised attribute labels and fallible attribute predictions. Using a pre-trained vision-language model, we convert facial attributes into prompts from which embeddings are extracted, thereby dynamically integrating attribute information into identity embeddings. This prompt-driven method not only reduces false positives across diverse attributes but also sets a new state of the art for attribute-aware face recognition on several benchmarks.
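The step of turning facial attributes into prompts can be sketched as below. This is a hypothetical illustration: the prompt template, the attribute names, the confidence threshold and the function attributes_to_prompt are assumptions, and the resulting string would then be passed to the text encoder of a vision-language model, which is not shown. Omitting low-confidence predictions is one simple way to limit the effect of attribute prediction errors.

```python
def attributes_to_prompt(attr_probs: dict, threshold: float = 0.6) -> str:
    """Build a text prompt from predicted attribute probabilities (hypothetical).

    attr_probs maps an attribute name to {value: probability}. Only the most
    likely value of each attribute is used, and only if its probability
    reaches `threshold`; uncertain attributes are left out of the prompt.
    """
    words = []
    for name in ("age", "ethnicity", "gender"):
        probs = attr_probs.get(name)
        if not probs:
            continue
        value, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            words.append(value)
    subject = " ".join(words + ["person"])
    article = "an" if subject[0] in "aeiou" else "a"
    return f"a photo of the face of {article} {subject}"


if __name__ == "__main__":
    preds = {
        "gender": {"male": 0.9, "female": 0.1},
        "ethnicity": {"asian": 0.55, "white": 0.45},
        "age": {"young": 0.8, "old": 0.2},
    }
    print(attributes_to_prompt(preds))
```

Here the ethnicity prediction (0.55) falls below the threshold, so the prompt mentions only the confident attributes.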
In conclusion, our research advances the state of the art in attribute-aware face recognition by introducing efficient, unbiased, and dynamic methods for feature extraction, decision-making, and attribute integration. Our contributions promise significant improvements in the robustness and unbiased performance of face recognition systems, setting a new standard for future developments in the field.