Abstract
Visual recognition is the task of analyzing and understanding images or videos, identifying objects, text, and other subjects of interest. Deep convolutional neural networks (CNNs) have achieved remarkable success in visual recognition. However, when CNNs are deployed in a new target environment, where the test data follow a different distribution from the training data, their performance often drops significantly. A popular solution to such a distribution mismatch (also known as the domain gap) is unsupervised domain adaptation (UDA).
UDA aims to transfer the knowledge learned from one or multiple labeled source domains to a target domain where only unlabeled data are available for model adaptation. Most UDA studies address the domain gap by aligning the feature distributions of different domains. However, as the number of source domains increases, aligning the target distribution to all the source domains can harm discriminative feature learning and thus become counter-productive.
To tackle UDA, we propose four novel methods from different perspectives: loss function, network architecture, data/feature augmentation, and optimization/training strategy. (1) We propose a domain attention consistency loss that aligns the distributions of channel-wise attention weights in each pair of source-target domains in order to learn transferable latent attributes. (2) We propose a dynamic neural network, whose convolutional kernels are conditioned on each input instance, to adapt domain-invariant deep features to each individual instance. (3) We propose a distribution-based class-wise feature augmentation that generates intermediate features between different domains to bridge the domain gap and, at the same time, leverages such feature augmentation to downplay noisy target pseudo-labels for noise-robust training. (4) We propose a two-step training strategy to overcome the source-domain bias in a UDA model, where the second step adopts a label-noise-robust strategy to fine-tune the UDA model on pseudo-labeled target-domain data only. For concreteness, minimal illustrative sketches of the first three components are given below. Extensive experiments demonstrate the effectiveness of these four methods, leading to state-of-the-art performance.
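The following is a minimal sketch of the attention consistency idea in (1), assuming a squeeze-and-excitation-style channel attention module and an L1 discrepancy between per-domain mean attention weights; the module design, names, and discrepancy measure are illustrative choices, not the exact formulation developed in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative design)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat):                # feat: (N, C, H, W)
        pooled = feat.mean(dim=(2, 3))      # global average pooling -> (N, C)
        return self.fc(pooled)              # channel attention weights in (0, 1)

def attention_consistency_loss(attn_source, attn_target):
    """Align channel-attention statistics of a source-target domain pair.

    attn_source: (N_s, C) attention weights from source-domain images.
    attn_target: (N_t, C) attention weights from target-domain images.
    An L1 distance between per-domain mean attention vectors is used here
    as a simple stand-in for the discrepancy measure in the thesis.
    """
    return F.l1_loss(attn_source.mean(dim=0), attn_target.mean(dim=0))
```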
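Below is a sketch of the instance-conditioned convolution in (2), assuming a mixture-of-experts design in which each input instance produces weights that combine a small set of expert kernels (as in conditional/dynamic convolutions); the number of experts, the routing function, and all names are assumptions made for illustration rather than the exact architecture in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceConditionedConv(nn.Module):
    """Conv layer whose kernel is a per-instance mixture of K expert kernels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4, padding=1):
        super().__init__()
        self.padding = padding
        # K expert kernels, combined with instance-specific mixture weights.
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.router = nn.Linear(in_ch, num_experts)  # produces mixture weights

    def forward(self, x):                        # x: (N, C_in, H, W)
        n = x.size(0)
        ctx = x.mean(dim=(2, 3))                 # per-instance context vector (N, C_in)
        alpha = self.router(ctx).softmax(dim=1)  # mixture weights (N, K)
        # Per-instance kernels: (N, C_out, C_in, k, k)
        kernels = torch.einsum('nk,koihw->noihw', alpha, self.experts)
        # Grouped-convolution trick: fold the batch into groups so that a
        # different kernel is applied to each instance in one conv call.
        out = F.conv2d(x.reshape(1, -1, *x.shape[2:]),
                       kernels.reshape(-1, *kernels.shape[2:]),
                       padding=self.padding, groups=n)
        return out.reshape(n, -1, *out.shape[2:])
```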
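The following sketch illustrates the cross-domain, class-wise feature augmentation in (3) under a simplifying assumption: intermediate features are obtained by shifting source features of each class toward the corresponding target class mean by an interpolation factor. The function name, the use of class means, and the factor lam are illustrative stand-ins for the distribution-based formulation in the thesis.

```python
import torch

def cross_domain_feature_augmentation(f_src, src_labels, f_tgt, tgt_pseudo,
                                      num_classes, lam=0.5):
    """Generate intermediate features between source and target, class by class.

    f_src: (N_s, D) source features with ground-truth labels src_labels.
    f_tgt: (N_t, D) target features with pseudo-labels tgt_pseudo.
    Each source feature of class c is shifted toward the target class mean
    by a factor lam, producing features that lie between the two domains.
    """
    f_aug = f_src.clone()
    for c in range(num_classes):
        src_mask = src_labels == c
        tgt_mask = tgt_pseudo == c
        if src_mask.any() and tgt_mask.any():
            delta = f_tgt[tgt_mask].mean(dim=0) - f_src[src_mask].mean(dim=0)
            f_aug[src_mask] = f_src[src_mask] + lam * delta
    return f_aug
```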