Abstract
Consistency learning using input image, feature, or network perturbations has
shown remarkable results in semi-supervised semantic segmentation, but this
approach can be seriously affected by inaccurate predictions of unlabelled
training images. These inaccurate predictions have two consequences: 1)
training with the "strict" cross-entropy (CE) loss can easily overfit
prediction mistakes, leading to confirmation bias; and 2) perturbations
applied to these inaccurate predictions turn potentially erroneous
predictions into training signals, degrading consistency learning. In this paper,
we address the prediction accuracy problem of consistency learning methods with
novel extensions of the mean-teacher (MT) model, which include a new auxiliary
teacher, and the replacement of MT's mean squared error (MSE) loss by a stricter
confidence-weighted cross-entropy (Conf-CE) loss. The more accurate predictions
of this model allow us to use a challenging combination of network, input data,
and feature perturbations to improve the generalisation of consistency learning,
where the feature perturbations consist of a new adversarial perturbation.
Results on public benchmarks show that our approach improves substantially
over previous state-of-the-art (SOTA) methods in the field. Our code is available
at https://github.com/yyliu01/PS-MT.
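
To make the idea of a confidence-weighted cross-entropy loss concrete, the following is a minimal sketch and not the implementation from the repository above. It assumes that the teacher's most probable class serves as the pseudo-label for each pixel, that the teacher's maximum softmax probability serves as the weight, and that pixels below an optional threshold are ignored. The function names and the `threshold` parameter are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conf_weighted_ce(student_logits, teacher_logits, threshold=0.0):
    """Illustrative confidence-weighted CE, averaged over all pixels.

    Both arguments are lists with one list of class logits per pixel.
    Assumption: the weight is the teacher's maximum class probability,
    and pixels whose weight is below `threshold` contribute zero.
    """
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        t_prob = softmax(t)
        conf = max(t_prob)           # teacher confidence for this pixel
        label = t_prob.index(conf)   # teacher pseudo-label (argmax)
        if conf < threshold:
            continue                 # skip low-confidence pixels
        s_prob = softmax(s)
        total += conf * -math.log(s_prob[label])
    return total / max(len(student_logits), 1)


# One pixel, two classes: the student agrees or disagrees with the teacher.
agree = conf_weighted_ce([[5.0, 0.0]], [[5.0, 0.0]])
disagree = conf_weighted_ce([[0.0, 5.0]], [[5.0, 0.0]])
print(agree < disagree)
```

Under these assumptions, a confident teacher prediction that the student contradicts is penalised heavily, while an uncertain teacher prediction contributes little or nothing to the loss, which is one way to limit the confirmation bias described above.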