RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing

Liting Gao; Yi Yuan; Yaru Chen; Yuelan Cheng; Zhenbo Li; Juan Wen; Shubin Zhang; Wenwu Wang

Conference proceeding

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (Barcelona, Spain, 04/05/2026–08/05/2026)

17/01/2026

Abstract

Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, and thus demands precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods that rely on full captions or costly optimization often struggle with complex edits or lack practicality. In this work, we propose a novel, efficient, end-to-end diffusion framework based on rectified flow matching for audio editing, and construct a dataset featuring overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.

Files and links (1)

pdf

gao, 6.95 MB

Author's Accepted Manuscript CC BY V4.0, Embargoed Access, Embargo ends: 04/05/2026

Details

Title: RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Creators: Liting Gao (Author) - University of Surrey, School of Computer Science & Electronic Engineering
Yi Yuan - University of Surrey, School of Computer Science & Electronic Engineering
Yaru Chen - University of Surrey, School of Computer Science & Electronic Engineering
Yuelan Cheng - University of Surrey, School of Computer Science & Electronic Engineering
Zhenbo Li - China Agricultural University
Juan Wen
Shubin Zhang - Ocean University of China
Wenwu Wang - University of Surrey, School of Computer Science & Electronic Engineering
Publication Details: 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing
Conference: 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (Barcelona, Spain, 04/05/2026–08/05/2026)
Publisher: IEEE
Date accepted for publication: 17/01/2026
Identifiers: 991105095302346
Academic Unit: School of Computer Science & Electronic Engineering
Resource Type: Conference proceeding
