Abstract
Generalised zero-shot learning (GZSL) is a classification problem where the
learning stage relies on a set of seen visual classes and the inference stage
aims to identify both the seen visual classes and a new set of unseen visual
classes. Critically, both the learning and inference stages can leverage a
semantic representation that is available for the seen and unseen classes. Most
state-of-the-art GZSL approaches rely on a mapping between latent visual and
semantic spaces without considering if a particular sample belongs to the set
of seen or unseen classes. In this paper, we propose a novel GZSL method that
learns a joint latent representation that combines both visual and semantic
information. This reduces the need for learning a mapping between the two
spaces. Our method also introduces a domain classifier that estimates
whether a sample belongs to a seen or an unseen class. Our classifier then
combines a class discriminator with this domain classifier to reduce
the natural bias that GZSL approaches have toward the seen classes.
Experiments show that our method achieves state-of-the-art results in terms of
harmonic mean, area under the seen-unseen curve, and unseen-class
classification accuracy on public GZSL benchmark data sets. Our code will be
available upon acceptance of this paper.
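
As a reading aid for the harmonic mean metric mentioned above, the following is a minimal sketch of how it is conventionally computed in GZSL evaluation: H = 2 * s * u / (s + u), where s and u are the classification accuracies on seen and unseen classes. The function name and the example accuracies are hypothetical illustrations and are not taken from the paper or its code.

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracies."""
    total = acc_seen + acc_unseen
    if total == 0.0:
        # Both accuracies are zero; define the harmonic mean as zero.
        return 0.0
    return 2.0 * acc_seen * acc_unseen / total


# Hypothetical accuracies, chosen only to illustrate the metric.
biased = harmonic_mean(0.9, 0.1)    # classifier strongly biased toward seen classes
balanced = harmonic_mean(0.5, 0.5)  # classifier balanced across both domains
print(biased, balanced)
```

Under these made-up numbers, the biased classifier has the same arithmetic-mean accuracy as the balanced one but a much lower harmonic mean, which is why the metric is sensitive to the seen-class bias the abstract describes.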