Abstract
Most existing unsupervised person re-identification (ReID) methods rely on a transfer learning paradigm that requires annotations on an independent source domain. Although recent Unsupervised Domain Adaptation techniques achieve promising results, they still suffer from source domain variance and privacy concerns because they require access to raw source data. In this paper, we propose a novel Noise Perception Self-Supervised Learning (NPSSL) paradigm based on the idea that visual tracking provides useful spatio-temporal localized constraints for improving ReID model learning. Beyond visual similarity, we fully exploit spatio-temporal motion consistency to assist person tracklet formation, complemented with visual cues to enhance re-association in crowded scenes. To further alleviate the noise introduced by multi-person tracking, we introduce a Noise Perception Self-Paced Learning method that learns progressively from the most confident examples. Specifically, a cluster-level filter and a sample-level filter are devised to jointly refine the pseudo labels during training. Extensive experiments on the Duke dataset demonstrate the superiority of the NPSSL model over a wide range of unsupervised learning methods and the competitiveness of this paradigm with unsupervised transfer learning methods.