Compressing Quaternion Convolutional Neural Networks for Audio Classification
Journal article   Peer reviewed

Arshdeep Singh, Vinayak Abrol and Mark D. Plumbley
IEEE Transactions on Audio, Speech and Language Processing, Vol.34, pp.1866-1875
2026

Abstract

Keywords: AudioSet; CNNs; Computational complexity; Computational modeling; Convolution; Convolutional neural networks; Emotion recognition; ESC-50; Filters; Mathematical models; Model compression; Music genre classification; Quaternions; RAVDESS; Speech recognition; Sustainable AI
Conventional Convolutional Neural Networks (CNNs) in the real domain have been widely used for audio classification. However, CNNs have limited ability to capture correlations across channels when processing multi-channel inputs. This can lead to suboptimal feature learning, particularly for complex audio patterns such as multi-channel spectrogram representations. Quaternion Convolutional Neural Networks (QCNNs) address this limitation by employing quaternion algebra to jointly capture inter-channel dependencies, enabling more compact models with fewer learnable parameters while better exploiting the multi-dimensional nature of audio signals. However, QCNNs exhibit higher computational complexity due to the overhead of quaternion operations, resulting in increased inference latency and reduced efficiency compared to conventional CNNs, which poses challenges for deployment on resource-constrained platforms. To address this challenge, this study explores knowledge distillation (KD) and pruning to reduce the computational complexity of QCNNs while maintaining performance. Our experiments on audio classification reveal that pruning QCNNs achieves similar or superior performance compared to KD while requiring less computational effort. Compared to conventional CNNs and Transformer-based architectures, pruned QCNNs achieve competitive performance with a reduced learnable parameter count and lower computational complexity. On the AudioSet dataset, the pruned QCNN14 achieves performance comparable to that of the conventional CNN14 while using only 10% of its parameters and 70% of its computational complexity. Furthermore, pruned QCNNs generalize well across multiple audio classification benchmarks, including GTZAN for music genre recognition, ESC-50 for environmental sound classification, and RAVDESS for speech emotion recognition.
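
As a rough illustration of why quaternion layers can be more compact yet costlier per weight, the sketch below implements the Hamilton product and compares weight counts for a real-valued and a quaternion-valued convolution layer. It is not the authors' implementation: the function names are hypothetical, biases are ignored, and the quaternion layer is assumed to group real channels into sets of four that share one quaternion weight, which is the common formulation of quaternion convolution.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (r, i, j, k) tuples.

    Each product mixes all four components of both operands, which is
    how a single quaternion weight couples four input channels. It costs
    16 real multiplications.
    """
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def real_conv_params(c_in, c_out, k):
    """Weight count of a real-valued k x k convolution (no bias)."""
    return c_in * c_out * k * k


def quaternion_conv_params(c_in, c_out, k):
    """Weight count of a quaternion k x k convolution (no bias).

    Assumption: c_in and c_out are counted in real channels and are
    grouped into quaternions of four channels each.
    """
    if c_in % 4 or c_out % 4:
        raise ValueError("channel counts must be divisible by 4")
    return 4 * (c_in // 4) * (c_out // 4) * k * k


if __name__ == "__main__":
    i, j = (0, 1, 0, 0), (0, 0, 1, 0)
    print(hamilton_product(i, j))              # i * j = k
    print(real_conv_params(64, 128, 3))        # real-valued layer
    print(quaternion_conv_params(64, 128, 3))  # quaternion layer
```

Under these assumptions the quaternion layer stores one quarter of the weights of the real-valued layer with the same channel counts. Any further reduction, such as the 10% figure reported for the pruned QCNN14, comes from the compression methods studied in the paper and is not reproduced by this sketch.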
