Crossfire Conditional Generative Adversarial Networks for Singing Voice Extraction

Weitao Yuan; Shengbei Wang; Xiangrui Li; Masashi Unoki; Wenwu Wang

doi:10.21437/Interspeech.2021-433

Conference proceeding

Open access

Weitao Yuan, Shengbei Wang, Xiangrui Li, Masashi Unoki and Wenwu Wang

22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021) (Brno, Czech Republic, 30/08/2021–03/09/2021)

10/2021

DOI: https://doi.org/10.21437/Interspeech.2021-433

Abstract

crossfire criterion

generalized projection method

singing voice extraction

Generative Adversarial Networks

Generative adversarial networks (GANs) and conditional GANs (cGANs) have recently been applied to singing voice extraction (SVE), since they can accurately model vocal distributions and make effective use of large amounts of unlabelled data. However, current GAN/cGAN-based SVE frameworks have no explicit mechanism for eliminating mutual interference between different sources. In this work, we introduce a novel 'crossfire' criterion into GANs to complement their standard adversarial training, forming a dual-objective GAN, namely Crossfire GANs (Cr-GANs). In addition, we design a Generalized Projection Method (GPM) for cGAN-based frameworks to extract more effective conditional information for SVE. Using the proposed GPM, we extend Cr-GANs to a conditional version, i.e., Crossfire Conditional GANs (Cr-cGANs). The proposed methods were evaluated on the DSD100 and CCMixter datasets. The numerical results show that the 'crossfire' criterion and GPM are beneficial to each other and considerably improve the separation performance of existing GAN/cGAN-based SVE methods.

Files and links (3)

pdf

Interspeech2021_Yuan_CameraReady (1.18 MB)

Author's Accepted Manuscript Open Access

url

https://www.interspeech2021.org/

Event Website: Conference website

url

https://www.isca-archive.org/interspeech_2021/index.html

Conference proceedings webpage

Metrics

176 File views/downloads

49 Record Views

Details

Title: Crossfire Conditional Generative Adversarial Networks for Singing Voice Extraction
Creators: Weitao Yuan
Shengbei Wang
Xiangrui Li
Masashi Unoki
Wenwu Wang - University of Surrey, School of Computer Science and Electronic Engineering
Publication Details: 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021)
Conference: 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021) (Brno, Czech Republic, 30/08/2021–03/09/2021)
Publisher: International Speech Communication Association (ISCA)
Publication Date: 10/2021
Date accepted for publication: 02/06/2021
Grant note: This work was supported by the National Natural Science Foundation of China (No. 61902280), the Natural Science Foundation of Tianjin (No. 19JCYBJC15600), the Tianjin Science and Technology Project (No. 20YDTPJC00870), and the Tianjin Major Project for Civil-Military Integration of Science and Technology (No. 18ZXJMTG00260). It was also supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761), IO DATA foundation, and the Fund for the Promotion of Joint International Research (Fostering Joint International Research (B)) (20KK0233).
Identifiers: 99608566502346; WOS:000841879503027
Academic Unit: School of Computer Science and Electronic Engineering
Language: English
Resource Type: Conference proceeding
