Abstract
The advent of learning with noisy labels (LNL), multi-rater learning, and
human-AI collaboration has revolutionised the development of robust
classifiers, enabling them to address the challenges posed by different types
of data imperfections and complex decision processes commonly encountered in
real-world applications. While each of these methodologies has made
significant strides in addressing its own challenges, techniques that tackle
all three problems simultaneously remain underexplored. This paper addresses
this research gap by integrating
noisy-label learning, multi-rater learning, and human-AI collaboration with new
benchmarks and the innovative Learning to Complement with Multiple Humans
(LECOMH) approach. LECOMH optimises the level of human collaboration during
testing, aiming to maximise classification accuracy while minimising
collaboration costs, which range from 0 to M, where M is the maximum number of
human collaborators. We quantitatively compare LECOMH with leading human-AI
collaboration methods using our proposed benchmarks. LECOMH consistently
outperforms competing methods, with accuracy improving as collaboration costs
increase. Notably, LECOMH is the only method that enhances human labeller
performance across all benchmarks.