Abstract
This paper compares well-established Convolutional Neural Networks (CNNs) to
recently introduced Vision Transformers for the task of Diabetic Foot Ulcer
Classification, in the context of the DFUC 2021 Grand-Challenge, in which this
work attained first place. Comprehensive experiments demonstrate that
modern CNNs can still outperform Transformers in a low-data
regime, likely owing to their ability to better exploit spatial
correlations. In addition, we empirically show that the recent
Sharpness-Aware Minimization (SAM) optimization algorithm considerably improves
the generalization capability of both kinds of models. Our results indicate
that, for this task, combining CNNs with SAM optimization yields better
performance than any of the other approaches considered.