Abstract
Most machine learning algorithms assume that training and test data are drawn from the same distribution, an assumption that is often violated in practice. This is known as the domain shift problem: machine learning models can suffer significant performance drops when deployed in unseen domains. In this thesis, we aim to tackle the domain shift problem in visual recognition under two practical settings: domain generalization (DG) and unsupervised domain adaptation (UDA). We focus on multi-source scenarios, i.e., scenarios where the training data are collected from multiple sources, such as images of different styles or images captured from different camera views. Three contributions are made in this thesis. First, we investigate domain shift in instance recognition, which has been largely overlooked by existing work. Specifically, we focus on person re-identification (re-ID), a cross-domain instance matching problem, and propose a novel convolutional neural network (CNN) architecture, termed the omni-scale network (OSNet), to learn generalizable omni-scale features. Second, we move on to generic image classification. To mitigate overfitting in DG caused by the absence of target data, we propose to learn a neural network that generates images from novel domains for data augmentation. The learning objective is formulated such that the generated images can be recognized by a label classifier yet fool a domain classifier. Finally, we introduce a collaborative ensemble learning approach, which exploits the complementarity between different source domains to learn a generalizable ensemble model. Unlabeled target data are handled by pseudo-labeling. This approach is applicable to both the DG and UDA settings.