RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Conference proceeding

Liting Gao, Yi Yuan, Yaru Chen, Yuelan Cheng, Zhenbo Li, Juan Wen, Shubin Zhang and Wenwu Wang
2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (Barcelona, Spain, 04/05/2026–08/05/2026)
17/01/2026

Abstract

Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, thus demanding precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods that rely on full captions or costly optimization often struggle with complex editing or lack practicality. In this work, we propose a novel, efficient, end-to-end diffusion framework based on rectified flow matching for audio editing, and construct a dataset featuring overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.
Author's Accepted Manuscript CC BY V4.0 Embargoed Access, Embargo ends: 04/05/2026
