Abstract
The human ability to separate acoustic sources has been termed the cocktail party effect.
Digital technology does not share this inherent ability. The field of source separation
covers techniques that enable electronic separation of acoustic sources.
Source separation techniques can be blind in the sense that no additional information is
passed to the separation system regarding room geometry, location of the sources, and
number of sources. Techniques that assume knowledge of one or more of these
factors are termed semi-blind.
A multitude of solutions for source separation exist, but the case in which the sources
are moving has received far less attention. A moving source changes the mixing
conditions, and the demixing system must be updated constantly and quickly if output
quality is to be maintained. This makes separation of moving sources an even more
challenging topic.
The applications of source separation research are not restricted to the acoustical
domain: techniques in this area have previously contributed to advances in sonar,
radar, medical imaging and financial trend detection.
The purpose of this project was to further the field of acoustic source separation,
investigating both static and moving sources. Firstly, the robustness of a number of
existing techniques was analysed. These included statistical techniques,
array/beamforming based techniques, and time-frequency sparsity based techniques. This
analysis yielded a modification to the more recently proposed Active Intensity Vector
(AIV) separation algorithm, which the author named cardioid correction.
Secondly, an investigation into how Independent Component Analysis (ICA) and AIV
beamforming could be combined was carried out. This resulted in a method for solving the
ICA permutation problem (using AIVs) and also two low-complexity post-filters. These
post-filters exploit information from both ICA and AIV beamforming. The result is a
hybrid ICA-AIV separation system.
Thirdly, the relationship between existing source separation performance measures and
subjective listening test results was studied. A source separation system was modelled
to allow controlled variation of the system’s performance. Extensive listening tests were
performed and the relationships with the existing performance measures were presented.
This enables other researchers to convert their source separation system’s performance into
a subjective Mean Opinion Score (MOS).
Fourthly, a moving source data set was created and the aforementioned ICA-AIV system
and cardioid correction algorithms were combined into a single system. Previously, both
these techniques required knowledge of the source locations (azimuth angle with respect
to the array). An AIV-based target tracking system was proposed, which also incorporates
a speaker classification system. The classification system can train while the sources are
moving (provided there are no large jumps in position for the first 2.5 seconds). Once
trained, the system can track changes in source position even if the sources switch
positions. The system trains on the separation system’s output, and hence the entire
system now operates as a Blind Source Separation (BSS) system.