Compressing Quaternion Convolutional Neural Networks for Audio Classification

Arshdeep Singh; Vinayak Abrol; Mark D. Plumbley

doi:10.1109/TASLPRO.2026.3677690

Journal article

Peer reviewed

Arshdeep Singh, Vinayak Abrol and Mark D. Plumbley

IEEE Transactions on Audio, Speech and Language Processing, Vol.34, pp.1866-1875

2026

DOI: https://doi.org/10.1109/TASLPRO.2026.3677690

Keywords: AudioSet; CNNs; Computational complexity; Computational modeling; Convolution; Convolutional neural networks; Emotion recognition; ESC-50; Filters; Mathematical models; Model compression; Music genre classification; Quaternions; RAVDESS; Speech recognition; Sustainable AI

Abstract

Conventional Convolutional Neural Networks (CNNs) in the real domain have been widely used for audio classification. However, CNNs have limited ability to capture correlations across channels when processing multi-channel inputs. This can lead to suboptimal feature learning, particularly for complex audio patterns such as multi-channel spectrogram representations. Quaternion Convolutional Neural Networks (QCNNs) address this limitation by employing quaternion algebra to jointly capture inter-channel dependencies, enabling more compact models with fewer learnable parameters while better exploiting the multi-dimensional nature of audio signals. However, QCNNs exhibit higher computational complexity due to the overhead of quaternion operations. This increases inference latency and reduces efficiency compared to conventional CNNs, posing challenges for deployment on resource-constrained platforms. To address this challenge, this study explores knowledge distillation (KD) and pruning to reduce the computational complexity of QCNNs while maintaining performance. Our experiments on audio classification reveal that pruning QCNNs achieves similar or superior performance compared to KD while requiring less computational effort. Compared to conventional CNNs and Transformer-based architectures, pruned QCNNs achieve competitive performance with a reduced learnable parameter count and computational complexity. On the AudioSet dataset, the pruned QCNN14 achieves performance comparable to that of the conventional CNN14 while using only 10% of its parameters and 70% of its computational complexity. Furthermore, pruned QCNNs generalize well across multiple audio classification benchmarks, including GTZAN for music genre recognition, ESC-50 for environmental sound classification, and RAVDESS for speech emotion recognition.

Details

Title: Compressing Quaternion Convolutional Neural Networks for Audio Classification
Creators: Arshdeep Singh (King's College London); Vinayak Abrol (Indraprastha Institute of Information Technology Delhi); Mark D. Plumbley (King's College London)
Publication Details: IEEE Transactions on Audio, Speech and Language Processing, Vol.34, pp.1866-1875
Publisher: IEEE, Piscataway
Number of pages: 10
Publication Date: 2026
Grant note: Engineering and Physical Sciences Research Council (10.13039/501100000266), grants EP/T019751/1 and EP/Y028805/1
Identifiers: 991120675502346; WOS:001735991600003
Academic Unit: School of Computer Science & Electronic Engineering
Language: English
Resource Type: Journal article
