Abstract
In this paper, we compare different deep neural networks (DNNs) for extracting speech signals from competing speakers in room environments, including the conventional fully-connected multilayer perceptron (MLP) network, the convolutional neural network (CNN), the recurrent neural network (RNN), and the recently proposed capsule network (CapsNet). Each DNN takes as input both spectral features and converted spatial features that are robust to position mismatch, and outputs a separation mask for target source estimation. In addition, a psychoacoustically motivated objective function is integrated into each DNN, which exploits the perceptual importance of each time-frequency (TF) unit during training. Objective evaluations are performed on the sounds separated by the converged models, in terms of PESQ, SDR, and STOI. Overall, all the implemented DNNs greatly improve the quality and intelligibility of the embedded target speech compared to the original recordings. In particular, the bidirectional RNN, applied either along the temporal direction or across the frequency bins, consistently outperforms the other DNN structures.