Abstract
Deep learning methods have shown outstanding classification accuracy in
medical imaging problems, which is largely attributed to the availability of
large-scale datasets manually annotated with clean labels. However, given the
high cost of such manual annotation, new medical imaging classification
problems may need to rely on machine-generated noisy labels extracted from
radiology reports. Indeed, many chest X-ray (CXR) classifiers have already been
trained on datasets with noisy labels, but their training procedures are
generally not robust to noisy-label samples, leading to sub-optimal models.
Furthermore, CXR datasets are mostly multi-label, so current noisy-label
learning methods designed for multi-class problems cannot be easily adapted. In
this paper, we propose a new method for noisy multi-label CXR learning that
detects noisy samples in the dataset and smoothly re-labels them; the
re-labelled dataset is then used to train common multi-label classifiers. The
proposed method
optimises a bag of multi-label descriptors (BoMD) to promote their similarity
with the semantic descriptors produced by BERT models from the multi-label
image annotation. Our experiments on diverse noisy multi-label training sets
and clean testing sets show that our model has state-of-the-art accuracy and
robustness on many CXR multi-label classification benchmarks.
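The abstract does not state how the similarity between the bag of descriptors and the BERT label embeddings is scored or optimised. The sketch below is a hypothetical illustration only, not the paper's formulation. It assumes the BERT embeddings of the label names are precomputed vectors, scores each label by its best-matching descriptor in the bag (cosine similarity), and uses a margin ranking loss so that annotated labels outscore unannotated ones. The names `label_scores`, `hinge_ranking_loss` and the `margin` value are all illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_scores(bag, label_embeddings):
    """Hypothetical scoring: each label's score is the similarity of its
    (assumed precomputed) BERT embedding to the closest descriptor in the bag."""
    return [max(cosine(d, e) for d in bag) for e in label_embeddings]


def hinge_ranking_loss(scores, labels, margin=0.2):
    """Assumed margin ranking loss: every annotated (positive) label should
    outscore every unannotated (negative) label by at least `margin`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    total = sum(max(0.0, margin - p + n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: two 2-D image descriptors and three 2-D label embeddings.
    bag = [[1.0, 0.0], [0.0, 1.0]]
    label_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    scores = label_scores(bag, label_emb)
    print(scores)
    print(hinge_ranking_loss(scores, [1, 1, 0]))  # annotation agrees with scores
    print(hinge_ranking_loss(scores, [0, 0, 1]))  # annotation disagrees
```

Taking the maximum over the bag lets different descriptors specialise in different findings, which is one plausible reason to use several descriptors per image in a multi-label setting. In a real system the descriptors would come from a trainable image encoder and the loss would be minimised by gradient descent; neither step is shown here.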