Abstract
Although automated audio captioning (AAC) has achieved remarkable performance improvements in recent years, the complexity of AAC models has drawn little attention from the research community. To reduce the number of model parameters, passive filter pruning has been applied successfully to convolutional neural networks (CNNs) in audio classification tasks. However, owing to the differences between audio classification and AAC, these pruning methods are not necessarily suitable for captioning. In this work, we investigate the effectiveness of several passive filter pruning approaches on an efficient CNN-Transformer-based AAC architecture. Through extensive experiments, we find that, under the same pruning ratio, pruning filters from the later convolutional blocks yields significantly better performance than pruning from the earlier ones. Using norm-based pruning, our pruned model has 15% fewer parameters than the original model while maintaining comparable performance.
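For context on the norm-based criterion named above, the following is a minimal PyTorch sketch, not the paper's implementation: the layer sizes, the 15% per-layer fraction, and the `rank_filters_by_norm` helper are illustrative assumptions. It shows the core idea of norm-based passive filter pruning: rank each convolutional filter by the norm of its weights and mark the lowest-norm filters for removal.

```python
import torch
import torch.nn as nn

def rank_filters_by_norm(conv: nn.Conv2d, p: int = 1) -> torch.Tensor:
    """Return the filter indices of a Conv2d layer sorted by ascending Lp norm.

    Filters with the smallest norms are assumed to contribute least to the
    layer's output and are therefore candidates for pruning.
    """
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # flatten each filter and compute one norm per output filter.
    norms = conv.weight.detach().flatten(start_dim=1).norm(p=p, dim=1)
    return torch.argsort(norms)

# Illustrative example: prune the lowest-norm 15% of filters in a
# hypothetical later convolutional block (fine-tuning would follow).
conv = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3)
order = rank_filters_by_norm(conv)
n_prune = int(0.15 * conv.out_channels)
prune_idx = order[:n_prune]  # indices of filters selected for removal
print(f"pruning {n_prune} of {conv.out_channels} filters")
```

Note that the abstract's 15% figure refers to the overall parameter reduction of the whole model, not necessarily a per-layer fraction as used in this sketch.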