Abstract
The early detection of glaucoma is essential in preventing visual impairment.
Artificial intelligence (AI) can be used to analyze color fundus photographs
(CFPs) in a cost-effective manner, making glaucoma screening more accessible.
While AI models for glaucoma screening from CFPs have shown promising results
in laboratory settings, their performance decreases significantly in real-world
scenarios due to the presence of out-of-distribution and low-quality images. To
address this issue, we propose the Artificial Intelligence for Robust Glaucoma
Screening (AIROGS) challenge. This challenge includes a large dataset of around
113,000 images from about 60,000 patients and 500 different screening centers,
and encourages the development of algorithms that are robust to ungradable and
unexpected input data. We evaluated the solutions of 14 teams and found that
the best teams performed similarly to a set of 20 expert
ophthalmologists and optometrists. The highest-scoring team achieved an area
under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99)
for detecting ungradable images on the fly. Additionally, many of the
algorithms showed robust performance when tested on three other publicly
available datasets. These results demonstrate the feasibility of robust
AI-enabled glaucoma screening.
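
As an illustration of how a figure such as "0.99 (95% CI: 0.98-0.99)" can be
obtained, the following minimal sketch computes an area under the receiver
operating characteristic curve for ungradability detection together with a
percentile bootstrap confidence interval. The abstract does not state how the
interval was computed, so the bootstrap procedure, the function names `auroc`
and `bootstrap_ci`, the label convention (1 = ungradable) and the toy data are
all assumptions made for illustration, not the challenge's evaluation code.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores count as 0.5.

    Assumed convention: label 1 = ungradable image, label 0 = gradable image.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    # Toy data, invented for this example only.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]
    print(auroc(labels, scores))
    print(bootstrap_ci(labels, scores))
```

The pairwise comparison is quadratic in the number of images and is used here
only for clarity; on a dataset of the size described above, a rank-based
implementation would be the practical choice.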