Enzo De Sena

Professor, Music and Media, School of Arts, Humanities & Creative Industries, Faculty of Arts, Business and Social Sciences, University of Surrey

Conference presentation Open access

Perceptual Soundfield Reconstruction in Three Dimensions via Sound Field Extrapolation

by Ege Erdem, Enzo De Sena, Huseyin Hacihabiboglu and Zoran Cvetkovic

Availability date 08/08/2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8023 - 8027

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), 12/05/2019–17/05/2019, Brighton, UK

Perceptual sound field reconstruction (PSR) is a spatial audio recording and reproduction method based on the application of stereophonic panning laws in microphone array design. PSR can render a perceptually veridical and stable auditory perspective in the horizontal plane of the listener, and it uses near-coincident microphone arrays for recording. This paper extends the PSR concept to three dimensions using sound field extrapolation carried out in the spherical-harmonic domain. Sound field rendering is performed using a two-level loudspeaker rig. An analysis of the rendered sound field based on active intensity shows that the proposed approach can accurately render the direction of monochromatic plane waves.
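The active-intensity analysis above rests on a standard relation: for a monochromatic field, the time-averaged active intensity I = ½ Re{p v*} points along the direction of energy flow, so its negation points towards the source. The following is a minimal sketch of that principle, not the paper's analysis code. It evaluates a single synthetic plane wave rather than a rendered loudspeaker field, obtains particle velocity from Euler's equation by central finite differences, and assumes the e^{jωt} time convention, ρ = 1.2 kg/m³ and c = 343 m/s. All function names are hypothetical.

```python
import cmath
import math

RHO = 1.2   # assumed air density in kg/m^3
C = 343.0   # assumed speed of sound in m/s


def plane_wave_pressure(x, doa_az, doa_el, freq, amp=1.0):
    """Complex pressure at point x of a plane wave arriving from (doa_az, doa_el), in radians.

    Uses the e^{j omega t} convention: the wave travels along -u, where u points
    towards the source, so p(x) = amp * exp(j * k * u.x).
    """
    k = 2.0 * math.pi * freq / C
    u = (math.cos(doa_el) * math.cos(doa_az),
         math.cos(doa_el) * math.sin(doa_az),
         math.sin(doa_el))
    return amp * cmath.exp(1j * k * sum(ui * xi for ui, xi in zip(u, x)))


def active_intensity(pressure_fn, x, freq, h=1e-4):
    """Time-averaged active intensity 0.5 * Re(p * conj(v)) at point x."""
    omega = 2.0 * math.pi * freq
    p = pressure_fn(x)
    velocity = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        dp = (pressure_fn(xp) - pressure_fn(xm)) / (2.0 * h)  # central difference
        velocity.append(-dp / (1j * omega * RHO))  # Euler: j*omega*rho*v = -grad p
    return [0.5 * (p * v.conjugate()).real for v in velocity]


def estimate_doa(intensity):
    """Energy flows away from the source, so the DOA is the negated intensity direction."""
    ix, iy, iz = (-comp for comp in intensity)
    return math.atan2(iy, ix), math.atan2(iz, math.hypot(ix, iy))


def example_wave(x):
    # Hypothetical test field: 1 kHz plane wave arriving from azimuth 30 deg, elevation 20 deg.
    return plane_wave_pressure(x, math.radians(30.0), math.radians(20.0), 1000.0)


az, el = estimate_doa(active_intensity(example_wave, (0.0, 0.0, 0.0), 1000.0))
print(f"azimuth {math.degrees(az):.2f} deg, elevation {math.degrees(el):.2f} deg")
```

For a rendered field, the same intensity estimate would be evaluated on the superposition of the loudspeaker contributions; with a single ideal plane wave it simply recovers the arrival direction.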

Conference presentation Peer reviewed

Evaluation of Car Cabin Acoustics Using Auralisation over Headphones

by Jessica Camilleri, Neofytos Kaplanis and Enzo De Sena

Published 03/2019

Proceedings 2019 AES International Conference on Immersive and Interactive Audio

2019 AES International Conference on Immersive and Interactive Audio, 27/03/2019–29/03/2019, York, UK

In the past, auralisation schemes in automotive audio have primarily used dummy-head recordings. More recently, spatial reproduction has enabled the auralisation of cabin acoustics over large loudspeaker arrays, yet no direct comparisons between these methods exist. In this study, the efficacy of headphone presentation is explored in this context. Six acoustical conditions were presented over headphones to experienced assessors (n=23), who were asked to compare them on six elicited perceptual attributes. In 24 out of 36 cases, the results indicate agreement between headphone- and loudspeaker-based auralisation of identical stimulus sets. It is concluded that headphone-based rendering yields judgments of timbral attributes similar to those obtained with loudspeaker-based rendering, whereas certain spatial attributes should be assessed with caution.

Conference presentation Open access Peer reviewed

Joint source localization and dereverberation by sound field interpolation using sparse regularization

by Niccolo Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor and Toon van Waterschoot

Published 13/08/2018

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018, 6892 - 6896

2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 15/04/2018–20/04/2018, Calgary, Alberta, Canada

In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists of interpolating the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The resulting inverse problem is a large-scale optimization problem, which is solved using a first-order, matrix-free optimization algorithm. It is shown that once equivalent source signals that effectively interpolate the sound field are obtained, they can readily be used to localize a speech source in terms of direction of arrival (DOA) and to perform dereverberation in a highly reverberant environment.
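The abstract refers to a large-scale sparse inverse problem solved with a first-order, matrix-free method. As a generic illustration of that class of solver, and not the authors' algorithm, acoustic model or regularizers, the sketch below applies ISTA (iterative shrinkage-thresholding) to an L1-regularized least-squares problem, touching the model only through hypothetical `forward` and `adjoint` functions. The random unit-norm "dictionary" standing in for the equivalent-source model, the problem sizes and λ = 0.1 are all assumptions.

```python
import math
import random


def make_operator(m, n, seed=0):
    """Hypothetical toy model: m measurements, n candidate sources, unit-norm columns."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        col = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])

    # Only the actions of A and A^T are exposed; the solver never sees A itself.
    def forward(x):
        return [sum(cols[j][i] * x[j] for j in range(n)) for i in range(m)]

    def adjoint(r):
        return [sum(c * ri for c, ri in zip(cols[j], r)) for j in range(n)]

    return forward, adjoint


def lipschitz(forward, adjoint, n, iters=100, seed=1):
    """Power iteration estimate of ||A||^2, the Lipschitz constant of the data-term gradient."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        norm_x = math.sqrt(sum(v * v for v in x))
        x = [v / norm_x for v in x]
        y = adjoint(forward(x))  # apply A^T A to the unit vector x
        est = math.sqrt(sum(v * v for v in y))
        x = y
    return est


def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (element-wise shrinkage)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in z]


def ista(forward, adjoint, y, n, lam, iters=1000):
    """Minimise 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / (1.1 * lipschitz(forward, adjoint, n))  # 10% margin on the estimate
    x = [0.0] * n
    for _ in range(iters):
        residual = [a - b for a, b in zip(forward(x), y)]
        grad = adjoint(residual)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)
    return x


m, n, true_index = 30, 10, 3
forward, adjoint = make_operator(m, n)
e = [0.0] * n
e[true_index] = 1.0
y = forward(e)  # measurements produced by a single active source
x_hat = ista(forward, adjoint, y, n, lam=0.1)
print(max(range(n), key=lambda j: abs(x_hat[j])))
```

With one active unit-norm column and λ = 0.1, the exact minimizer is 0.9 at the active index and zero elsewhere, so the recovered vector shows a single dominant entry. A practical large-scale solver would typically add acceleration (for example FISTA) and structured sparsity penalties; the plain form is kept here for readability.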

Conference presentation Peer reviewed

Improving the perceptual quality of ideal binary masked speech

by L Lightburn, Enzo De Sena, A Moore, PA Naylor and M Brookes

Published 19/06/2017

Proceedings of ICASSP 2017

ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 05/03/2017–09/03/2017, New Orleans, USA

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
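The abstract describes using the binary mask as prior information on speech presence rather than as a direct gain. One way to picture this, which is an illustrative assumption and not the paper's method, is to let each mask cell set the prior speech-absence probability q in the posterior speech-presence probability used by OM-LSA-type enhancers, and then blend a speech-present gain with a gain floor G_min. The mapping values (q = 0.1 for a mask value of 1, q = 0.9 for 0), the Wiener gain substituted for the log-spectral-amplitude gain and the -20 dB floor are hypothetical choices; the a priori SNR ξ and a posteriori SNR γ are taken as given.

```python
import math


def posterior_spp(gamma, xi, q):
    """Posterior speech presence probability under Gaussian models (OM-LSA form).

    gamma: a posteriori SNR, xi: a priori SNR, q: prior speech ABSENCE probability.
    """
    if q <= 0.0:
        return 1.0
    if q >= 1.0:
        return 0.0
    v = gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * math.exp(-v))


def mask_to_absence_prior(mask_bit, q_speech=0.1, q_noise=0.9):
    """Hypothetical mapping from a binary mask cell to a prior speech-absence probability."""
    return q_speech if mask_bit else q_noise


def soft_decision_gain(gamma, xi, mask_bit, g_min_db=-20.0):
    """Blend a speech-present gain with a floor, weighted by the mask-informed SPP."""
    g_h1 = xi / (1.0 + xi)  # Wiener gain, used here in place of the LSA gain
    g_min = 10.0 ** (g_min_db / 20.0)
    p = posterior_spp(gamma, xi, mask_to_absence_prior(mask_bit))
    return (g_h1 ** p) * (g_min ** (1.0 - p))  # geometric combination as in OM-LSA


# Same time-frequency SNRs, different mask values.
print(soft_decision_gain(2.0, 1.0, 1), soft_decision_gain(2.0, 1.0, 0))
```

With identical SNR values, a cell marked as speech receives a gain near the speech-present gain, while a cell marked as noise is pushed towards the floor, so the mask steers the enhancer without being applied as a hard 0/1 gain.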
