Abstract
Capsule networks are a relatively unexplored type of neural network architecture that preserves spatial information about the input by replacing pooling layers with convolutional strides and dynamic routing, allowing part-whole relationships in the data to be retained. One disadvantage is the computational complexity of dynamic routing, in which each capsule must route to every capsule in the layer above. It is common practice to use many capsules with a small feature space, and little attention has been paid to using fewer, wider capsules. Fewer capsules reduce the number of routes the network must compute, making the network faster to train while still covering the same learnable space. This paper presents an ablation study on a 3-layer capsule network architecture in which the primary capsule dimensions are varied to assess their impact on performance and training time. Experiments were performed on capsule networks with capsule sizes 32×8, 8×32, 16×8 and 8×16 (number × width) on ten benchmark datasets: MNIST, CIFAR-10, PCAM, fashionMNIST, BreastMNIST, BloodMNIST, OrganMNIST, PathMNIST, OCTMNIST and SVHN. For every dataset we observe capsule network structures that exceed the accuracy of the 32×8 structure and are at least 40% faster to train. Training time reductions ranging from 17% to 75% are reported, alongside accuracy improvements of up to 16%. These results lead us to propose treating the primary capsule dimensions as a hyperparameter to optimise accuracy and resource (e.g., memory, processing) utilisation.
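To make the number × width trade-off concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of a primary capsule layer whose dimensions are exposed as a hyperparameter. The surrounding sizes (256 input channels, a 9×9 stride-2 kernel, 10 class capsules) are assumptions borrowed from the original CapsNet of Sabour et al. (2017), and the names PrimaryCapsules and squash are illustrative.

import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # Squashing non-linearity: shrinks each capsule vector so its length lies in [0, 1).
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class PrimaryCapsules(nn.Module):
    # Primary capsule layer with configurable dimensions:
    # num_capsule_maps x capsule_dim is the paper's "number x width",
    # e.g. 32x8 (the baseline) or the wider 8x32 variant.
    def __init__(self, in_channels=256, num_capsule_maps=32, capsule_dim=8,
                 kernel_size=9, stride=2):
        super().__init__()
        self.capsule_dim = capsule_dim
        self.conv = nn.Conv2d(in_channels, num_capsule_maps * capsule_dim,
                              kernel_size=kernel_size, stride=stride)

    def forward(self, x):
        u = self.conv(x)                              # (B, maps * dim, H', W')
        u = u.view(u.size(0), -1, self.capsule_dim)   # (B, num_capsules, dim)
        return squash(u)

# The routing step scales with
# (number of primary capsules) x (number of class capsules),
# so fewer, wider capsules shrink it directly.
x = torch.randn(1, 256, 20, 20)  # feature map after a 9x9 conv on a 28x28 input
for maps, dim in [(32, 8), (8, 32), (16, 8), (8, 16)]:
    caps = PrimaryCapsules(num_capsule_maps=maps, capsule_dim=dim)(x)
    n_primary = caps.size(1)
    print(f"{maps}x{dim}: {n_primary} primary capsules, "
          f"{n_primary * 10} routes to 10 class capsules")

Run on a 20×20×256 feature map (the shape the original CapsNet's first convolution produces for 28×28 MNIST), the 32×8 baseline yields 1152 primary capsules and 11,520 routing pairs to 10 class capsules, while the wider 8×32 configuration yields 288 primary capsules and 2880 routing pairs: a 4× reduction in routes at the same total activation width (1152 × 8 = 288 × 32 = 9216 values).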