Abstract
Supervised deep models have achieved state-of-the-art performance on many vision tasks by relying on large-scale labeled datasets and advanced algorithms. However, annotating images, especially with pixel-wise segmentation masks, is highly labor-intensive, and the resulting cost makes it impractical to label sufficient data for every new task. Training deep models that generalize well from limited annotations has therefore drawn much attention in recent years, and many annotation-efficient learning paradigms have been proposed, such as self-supervised, semi-supervised, unsupervised and few-shot learning. This thesis explores four concrete problems in order: Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptive Semantic Segmentation (SFDASS), Few-shot Semantic Segmentation (FSS) and Generalized Few-shot Semantic Segmentation (GFSS).
The first contribution of this thesis is STochastic clAssifieRs (STAR) for UDA, where, given labeled source domain data and unlabeled target domain data separated by a large domain shift, the aim is to learn a model that works on the target domain. STAR is based on the observation that using more diverse classifiers can better identify misaligned regions, thereby improving state-of-the-art UDA methods based on local alignment. Specifically, instead of representing a classifier as a weight vector, STAR models it as a Gaussian distribution whose variance represents the inter-classifier discrepancy. This allows an infinite number of classifiers to be sampled with the same number of parameters as two ordinary classifiers. Evaluations on both image classification and semantic segmentation demonstrate its effectiveness.
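The idea of drawing classifiers from a Gaussian can be illustrated with a minimal sketch. It is not the thesis implementation: the names `sample_classifier`, `mu` and `sigma`, the per-weight standard deviation, and the mean absolute difference used as the discrepancy are all assumptions made for illustration. The sketch only shows that a mean and a standard deviation, which together hold as many parameters as two ordinary classifiers, suffice to draw arbitrarily many distinct classifiers.

```python
import math
import random


def sample_classifier(mu, sigma, rng):
    """Draw one weight matrix W = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
        for mu_row, sigma_row in zip(mu, sigma)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, feature):
    """Class probabilities of a linear classifier (one weight row per class)."""
    return softmax([sum(w * x for w, x in zip(row, feature)) for row in weights])


def discrepancy(p, q):
    """Assumption: mean absolute difference between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


# Hypothetical 3-class classifier over 2-dimensional features.
mu = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
sigma = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

rng = random.Random(0)
feature = [0.3, 0.2]

# Two independently sampled classifiers generally disagree on the same feature.
first = predict(sample_classifier(mu, sigma, rng), feature)
second = predict(sample_classifier(mu, sigma, rng), feature)
gap = discrepancy(first, second)
```

Setting every entry of `sigma` to zero makes all sampled classifiers collapse onto `mu`, which corresponds to the ordinary single-classifier case.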
Secondly, for SFDASS, a more challenging task in which only a source-domain-trained model and unlabeled target domain data are available during target adaptation, a novel uncertainty-aware framework based on a Bayesian Neural Network (BNN) is proposed. The intuition is to use uncertainty as guidance to downweight the noisy pseudo labels generated by applying the source model to the domain-shifted target data. Specifically, building on the BNN's uncertainty estimates, two novel self-training components are introduced: Uncertainty-aware Online Teacher-Student Learning (UOTSL) and Uncertainty-aware FeatureMix (UFM). Experiments show that each component improves performance on its own, while together they yield a new state of the art.
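The intuition of letting uncertainty downweight noisy pseudo labels can be sketched generically. The code below is not UOTSL or UFM. It assumes that uncertainty is approximated by averaging several stochastic forward passes and taking the entropy of the mean prediction, and that the weight is one minus the normalized entropy. The function names and the toy probabilities are hypothetical.

```python
import math


def mean_prediction(samples):
    """Average class probabilities over several stochastic forward passes."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute zero."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weighted_pseudo_label(samples):
    """Return (pseudo_label, weight) for one pixel.

    Assumption: weight = 1 - H(mean) / log(C), so it lies in [0, 1] and
    shrinks as the sampled predictions become less certain.
    """
    mean = mean_prediction(samples)
    label = max(range(len(mean)), key=lambda c: mean[c])
    weight = 1.0 - entropy(mean) / math.log(len(mean))
    return label, weight


def pixel_loss(student_probs, samples):
    """Cross-entropy on the pseudo label, scaled by the certainty weight."""
    label, weight = weighted_pseudo_label(samples)
    return -weight * math.log(student_probs[label])


# Hypothetical pixel on which the sampled predictions agree ...
confident = [[0.90, 0.05, 0.05], [0.92, 0.04, 0.04]]
# ... and one on which they disagree, so its pseudo label is less reliable.
uncertain = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]]

label_c, weight_c = weighted_pseudo_label(confident)
label_u, weight_u = weighted_pseudo_label(uncertain)
```

In this sketch the pixel with disagreeing predictions receives the smaller weight, so a wrong pseudo label there contributes less to the training loss.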
The third contribution addresses FSS, which aims to learn a model for a novel class from only a few annotated examples. This overcomes the limitation of previous methods, which need abundant samples from a new class to fine-tune the model. In general, an FSS system has three modules: a CNN encoder, a CNN decoder and a simple classifier. Existing methods meta-learn all three modules even when given as few as a single support-set image, which makes training intractable because supervision is limited while the number of trainable parameters is large. To address this, a novel meta-learning pipeline is proposed that focuses solely on the simplest component, the classifier. In particular, a Classifier Weight Transformer (CWT) is designed and meta-learned to dynamically adapt the weights of the support-set-trained classifier to each query image in an inductive way. Experimental results demonstrate the efficacy of the method.
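How classifier weights might be adapted to a query image can be sketched with a single attention step. The formulation below is an assumption made for illustration and is not the CWT architecture: the classifier weights act as attention queries over the query-image features, the attended features are added back through a residual connection, and the learned projection layers a Transformer would contain are omitted. The function name `adapt_classifier` and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adapt_classifier(weights, query_features):
    """Residual attention update: w' = w + sum_i a_i * f_i.

    weights: one weight vector per class, trained on the support set.
    query_features: one feature vector per location of the query image.
    Assumption: scaled dot-product attention without learned projections.
    """
    dim = len(weights[0])
    adapted = []
    for w in weights:
        # Attention of this class weight over all query-image locations.
        attn = softmax([dot(w, f) / math.sqrt(dim) for f in query_features])
        context = [
            sum(a * f[k] for a, f in zip(attn, query_features))
            for k in range(dim)
        ]
        adapted.append([wk + ck for wk, ck in zip(w, context)])
    return adapted


# Hypothetical 2-class classifier and three query-image feature vectors.
weights = [[1.0, 0.0], [0.0, 1.0]]
query_features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
adapted = adapt_classifier(weights, query_features)
```

Each adapted weight vector is pulled towards the query features it already responds to most strongly, while the encoder and decoder stay untouched.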
Finally, a novel Prediction Calibration Network (PCN) is proposed to address GFSS, which extends FSS to segment base and novel classes simultaneously and is therefore more challenging. Existing methods adopt classifier parameter fusion, in which two classifiers are first trained on base and novel classes, respectively, and then fused at the parameter level. However, this leads to a bias towards base classes, as the abundant base data dominates training. To address this bias, normalized score fusion is introduced. It mitigates the base class bias but induces a novel class bias instead. To ensure that the fused scores are biased towards neither the base nor the novel classes, a new Transformer-based calibration module is proposed to enforce feature and score consistency. Extensive experiments on standard benchmarks show the superiority of the proposed methods.