Abstract
Despite great strides made on fine-grained visual classification (FGVC),
current methods are still heavily reliant on fully-supervised paradigms where
ample expert labels are called for. Semi-supervised learning (SSL) techniques,
which acquire knowledge from unlabeled data, offer a viable way forward
and have shown great promise for coarse-grained problems. However, existing SSL
paradigms mostly assume in-distribution (i.e., category-aligned) unlabeled
data, which hinders their effectiveness when repurposed for FGVC. In this
paper, we put forward a novel design specifically aimed at making
out-of-distribution data work for semi-supervised FGVC, i.e., to "clue them
in". We build on the important assumption that all fine-grained categories
naturally follow a hierarchical structure (e.g., the phylogenetic tree of
"Aves" that covers all bird species). It follows that, rather than operating on
individual samples, we can predict sample relations within this tree
structure as the optimization goal of SSL. Beyond this, we further introduce
two strategies uniquely brought by these tree structures to achieve
inter-sample consistency regularization and reliable pseudo-relation. Our
experimental results reveal that (i) the proposed method is robust
against out-of-distribution data, and (ii) it can be combined with prior methods,
boosting their performance and yielding state-of-the-art results. Code is
available at https://github.com/PRIS-CV/RelMatch.
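
To make the idea of a tree-structured sample relation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy three-level hierarchy (order, family, species) and defines the relation between two labels as the number of leading hierarchy levels they share. The names `TAXONOMY`, `relation`, and `relation_matrix` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# Hypothetical toy hierarchy: each species maps to its path below the root
# ("Aves"), listed as [order, family, species].
TAXONOMY: Dict[str, List[str]] = {
    "house_sparrow": ["Passeriformes", "Passeridae", "house_sparrow"],
    "tree_sparrow": ["Passeriformes", "Passeridae", "tree_sparrow"],
    "american_crow": ["Passeriformes", "Corvidae", "american_crow"],
    "mallard": ["Anseriformes", "Anatidae", "mallard"],
}


def relation(a: str, b: str) -> int:
    """Return how many leading hierarchy levels two labels share.

    0 means the labels only meet at the root; 3 means the same species.
    """
    shared = 0
    for level_a, level_b in zip(TAXONOMY[a], TAXONOMY[b]):
        if level_a != level_b:
            break
        shared += 1
    return shared


def relation_matrix(labels: List[str]) -> List[List[int]]:
    """Pairwise relation targets for a batch of labels."""
    return [[relation(a, b) for b in labels] for a in labels]


if __name__ == "__main__":
    batch = ["house_sparrow", "tree_sparrow", "american_crow", "mallard"]
    for row in relation_matrix(batch):
        print(row)
```

Under this assumed definition, a relation is defined between any two nodes of the tree, so a sample from a species absent from the labeled set still has a well-defined relation to labeled samples. This is one way to read the motivation for predicting relations rather than per-sample categories.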