Sound-VECaps: Improving Audio Generation with Visually Enhanced Captions
Conference paper   Open access

Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, …
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Institute of Electrical and Electronics Engineers (IEEE)
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) (Hyderabad, India, 06/04/2025–11/04/2025)
2025

Abstract

Keywords: audio retrieval, diffusion model, audio-language dataset, audio generation
pdf
Yuan et al_b_ICASSP_2025 (965.84 kB)
Author's Accepted Manuscript Open Access
url
https://2025.ieeeicassp.org/
Event Website: Conference Website
