AudioSetCaps: An Enriched Audio-Caption Dataset Using Automated Generation Pipeline With Large Audio and Language Models
Journal article

Jisheng Bai, Haohe Liu, Mou Wang, Dongyuan Shi, Wenwu Wang, Mark D. Plumbley, Woon-Seng Gan and Jianfeng Chen
IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2817-2829
26/06/2025

Abstract

Keywords: Annotations; Audio-language learning; audio-language models; audio-text retrieval; automated audio captioning; Electronic mail; Instruments; Large language models; Pipelines; Scalability; Speech processing; Transforms; Acoustics; Data Mining
