Abstract
In this paper, we introduce a novel methodology for characterising the
performance of deep learning networks (ResNet and DenseNet) with respect to
training convergence and generalisation as a function of mini-batch size and
learning rate for image classification. This methodology is based on novel
measurements derived from the eigenvalues of the approximate Fisher information
matrix, which can be efficiently computed even for high-capacity deep models.
Our proposed measurements can help practitioners monitor and control the
training process (by actively tuning the mini-batch size and learning rate) to
achieve good training convergence and generalisation. Furthermore, the
proposed measurements also allow us to show that it is possible to optimise the
training process with a new dynamic sampling training approach that
continuously and automatically changes the mini-batch size and learning rate
during the training process. Finally, we show that the proposed dynamic
sampling training approach has a shorter training time and competitive
classification accuracy compared to the current state of the art.
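
To illustrate why eigenvalue-based quantities of an approximate Fisher
information matrix can remain cheap for large models, the following is a
minimal sketch and not the measurements proposed in this paper. It assumes
the approximation is the empirical Fisher F = (1/B) G^T G, built from the B
per-sample gradient vectors of one mini-batch (the rows of G); the abstract
does not specify this choice. The non-zero eigenvalues of F (size P x P, with
P the number of parameters) coincide with those of the much smaller B x B
matrix (1/B) G G^T, so the cost scales with the mini-batch size rather than
with P. The function names gram_matrix, fisher_trace and top_eigenvalue are
hypothetical.

```python
import math
import random


def gram_matrix(grads):
    """Return the B x B matrix (1/B) * G G^T for B per-sample gradient vectors."""
    b = len(grads)
    return [[sum(x * y for x, y in zip(gi, gj)) / b for gj in grads]
            for gi in grads]


def fisher_trace(grads):
    """Sum of the eigenvalues of the empirical Fisher (trace of the Gram matrix)."""
    k = gram_matrix(grads)
    return sum(k[i][i] for i in range(len(k)))


def top_eigenvalue(grads, iters=200, seed=0):
    """Largest eigenvalue of the empirical Fisher via power iteration.

    Power iteration runs on the small B x B Gram matrix. It assumes the
    random start vector is not orthogonal to the leading eigenvector.
    """
    k = gram_matrix(grads)
    b = len(k)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(b)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(b)) for i in range(b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # all gradients are zero, so every eigenvalue is zero
        v = [x / norm for x in w]
        # Rayleigh quotient v^T K v for the normalised vector v
        lam = sum(v[i] * sum(k[i][j] * v[j] for j in range(b))
                  for i in range(b))
    return lam


if __name__ == "__main__":
    # Two toy per-sample gradients for a model with two parameters.
    grads = [[2.0, 0.0], [0.0, 1.0]]
    print(fisher_trace(grads), top_eigenvalue(grads))
```

In practice the per-sample gradients would come from the network being
trained, and the full spectrum of the Gram matrix could be obtained with a
symmetric eigensolver instead of power iteration.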