Abstract
In this paper, we propose an iterative deep neural network (DNN)-based binaural source separation scheme for recovering two concurrent speech signals in a room environment. Besides the commonly used spectral features, the DNN also takes non-linearly wrapped binaural spatial features as input, which are refined iteratively using parameters estimated from the DNN output via a feedback loop. Different DNN structures have been tested, including a classic multilayer perceptron regression architecture as well as a new hybrid network with both convolutional and densely connected layers. Objective evaluations in terms of PESQ and STOI showed consistent improvement over baseline methods using traditional binaural features, especially when the hybrid DNN architecture was employed. In addition, our proposed scheme is robust to mismatches between the training and testing data.