Abstract
Invariance learning algorithms that conditionally filter out domain-specific
random variables as distractors do so based only on the data semantics, not on
the target domain under evaluation. We show that a provably optimal and
sample-efficient way of learning conditional invariances is to relax the
invariance criterion so that it is non-commutatively directed towards the target
domain. Under domain asymmetry, i.e., when the target domain contains
semantically relevant information absent in the source, the risk of the encoder
$\varphi^*$ that is optimal on average across domains is strictly lower-bounded
by the risk of the target-specific optimal encoder $\Phi^*_\tau$. We prove that
non-commutativity steers the optimization towards $\Phi^*_\tau$ instead of
$\varphi^*$, driving the $\mathcal{H}$-divergence between domains to
zero and yielding a tighter bound on the target risk. Both our theory and
experiments demonstrate that non-commutative invariance (NCI) can leverage
source domain samples to meet the sample complexity needs of learning
$\Phi^*_\tau$, surpassing SOTA invariance learning algorithms for domain
adaptation, at times by over $2\%$, and approaching the performance of an oracle.
Implementation is available at https://github.com/abhrac/nci.
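As a schematic restatement of the claims above (a minimal sketch only: the shorthand $R_\tau(\cdot)$ for the target-domain risk and $\mathcal{D}_s$, $\mathcal{D}_\tau$ for the source and target distributions is assumed here, not defined in the abstract):

% Sketch of the claimed relations; notation for the risk functional and the
% source/target distributions is assumed shorthand, not taken from the abstract.
\begin{align*}
  R_\tau(\Phi^*_\tau) &< R_\tau(\varphi^*)
    && \text{(under domain asymmetry),} \\
  d_{\mathcal{H}}(\mathcal{D}_s, \mathcal{D}_\tau) &\to 0
    && \text{(under the non-commutative invariance criterion),}
\end{align*}

so that the optimization is steered towards $\Phi^*_\tau$ rather than $\varphi^*$, and the $\mathcal{H}$-divergence-based bound on the target risk tightens.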