Enzo De Sena

Professor, Music and Media, School of Arts, Humanities & Creative Industries, Faculty of Arts, Business and Social Sciences, University of Surrey

Conference proceeding Open access Peer reviewed

White-box Differentiable Model of Perceived Localisation

by Antoine Robert Souchaud, Pedro Lladó, Annika Neidhardt, Zoran Cvetkovic and Enzo De Sena

Published 21/09/2025

2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP), In Press

The 27th IEEE International Workshop on Multimedia Signal Processing (MMSP 2025), 21/09/2025–23/09/2025, Beijing, China

Auditory models are useful tools for estimating perceptual attributes of a sound field. Integrating such auditory models in the optimisation of immersive sound systems is a promising strategy when listeners' perception is central to the application. To that end, differentiability is key to allowing the perceptual model to be included in gradient-based optimisation loops. Existing differentiable models, however, are black-box deep-learning models, which limits their interpretability. In this paper, we propose an analytical white-box differentiable model of auditory localisation based on an existing non-differentiable model. Our evaluations show that the model produces outputs that are highly correlated with the outputs of the non-differentiable model and with data collected in subjective listening tests. The proposed model also enables optimisation of amplitude panning laws in stereophonic spatial sound field rendering through gradient descent. This study therefore demonstrates, more generally, the feasibility of designing and optimising immersive sound systems using white-box differentiable models of auditory perception.
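As a rough illustration of what placing a differentiable localisation predictor inside a gradient-descent loop can look like, the sketch below optimises a single stereo pan parameter. It does not reproduce the model from the paper: the predictor is a stand-in (the direction of the energy vector for two loudspeakers), and the loudspeaker angle, step size and all function names are assumptions made for this example.

```python
import math

SPEAKER_ANGLE = math.radians(30.0)  # assumed +/-30 degree stereo setup


def predicted_angle(phi):
    """Toy differentiable predictor: direction of the energy vector (rad).

    phi is a pan parameter with gains g_left = cos(phi), g_right = sin(phi),
    so g_left**2 - g_right**2 = cos(2 * phi) and the gains have unit energy.
    Positive angles point towards the left loudspeaker.
    """
    return math.atan(math.cos(2.0 * phi) * math.tan(SPEAKER_ANGLE))


def predicted_angle_grad(phi):
    """Analytic derivative of predicted_angle with respect to phi."""
    u = math.cos(2.0 * phi) * math.tan(SPEAKER_ANGLE)
    return -2.0 * math.sin(2.0 * phi) * math.tan(SPEAKER_ANGLE) / (1.0 + u * u)


def optimise_pan(target_angle, phi=math.pi / 4.0, step=0.3, iterations=200):
    """Gradient descent on the squared error between predicted and target angle."""
    for _ in range(iterations):
        error = predicted_angle(phi) - target_angle
        phi -= step * 2.0 * error * predicted_angle_grad(phi)
    return phi


# Find the gains whose predicted direction is 10 degrees to the left.
phi = optimise_pan(math.radians(10.0))
g_left, g_right = math.cos(phi), math.sin(phi)
```

Because the predictor has a closed-form derivative, no automatic differentiation library is needed here; a richer perceptual model would typically rely on one.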

Conference proceeding Peer reviewed

Differentiable scattering delay networks for artificial reverberation

by Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena and Alberto Bernardini

Published 02/09/2025

pp. 1-8

28th International Conference on Digital Audio Effects (DAFx25), 02/09/2025–05/09/2025, Ancona, Italy

Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.

Conference proceeding Peer reviewed

Past, Present, and Future of Spatial Audio and Room Acoustics

by Shoichi Koyama, Enzo De Sena, Prasanga Samarasinghe, Mark R. P. Thomas and Fabio Antonacci

Availability date 06/08/2025

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 06/04/2025–11/04/2025, Hyderabad, India

The study of spatial audio and room acoustics aims to create immersive audio experiences by modeling the physics and psychoacoustics of how sound behaves in space. In the long history of this research area, various key technologies have been developed based both on theoretical advancements and practical innovations. We highlight historical achievements, initiative activities, recent advancements, and future outlooks in the research area of spatial audio recording and reproduction, and room acoustic simulation, modeling, analysis, and control.

Conference proceeding Open access

Perceptually-driven panning for an extended listening area

by Pedro Lladó, Annika Neidhardt, Antoine Robert Souchaud, Zoran Cvetkovic and Enzo De Sena

Accepted for publication 02/07/2025

2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2025)

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025, 12/10/2025–15/10/2025, Granlibakken Tahoe, Tahoe City, CA, USA

In loudspeaker-based sound field reproduction, the perceived sound quality deteriorates significantly when listeners move outside of the sweet spot. Although a substantial increase in the number of loudspeakers enables rendering methods that can mitigate this issue, such a solution is not feasible for most real-life applications. This study aims to extend the listening area by finding a panning strategy that optimises an objective function reflecting the localisation and localisation uncertainty over a listening area. To that end we first introduce a psychoacoustic localisation model that outperforms existing models in the context of multichannel loudspeaker setups. Leveraging this model and an existing model of localisation uncertainty, we optimise inter-channel time and level differences for a stereophonic system. The outcome is a new panning approach that depends on the listening area and the most suitable trade-off between localisation and localisation uncertainty.

Conference proceeding Open access Peer reviewed

Subjective Evaluation of the First Incoming Reflection - Revisiting and Extending Barron's Study

by Annika Neidhardt, Tatiana Surdu, Pedro Llado Gonzalez and Enzo De Sena

Published Summer 2025

Proceedings of Forum Acusticum / Euronoise 2025, the 11th EAA Annual European Conference on Acoustics and Noise Control Engineering, joint with the XLVI Spanish Acoustic Conference TECNIACUSTICA 2025

Forum Acusticum / Euronoise 2025, 23/06/2025–26/06/2025, Málaga, Spain

In 1971, Barron published a study on 'The subjective effects of first reflections in concert halls', comprising a lead/lag paradigm experiment with two loudspeakers set up in an anechoic room. As a result, he presented the determined audibility threshold, as well as a figure showing the audible effects caused by the first reflection (lag) depending on its delay and level relative to the direct sound (lead). This study gave an inspiring first insight into prominent perceptual effects like spatial impression, colouration, image shift, and 'disturbance'. However, the diagram was created based on the responses of only two listeners, evaluating the various attributes of a single item of programme material. To assess the reproducibility and generalisability of the results, we repeated and extended Barron's experiment with a larger panel of participants and a slightly revised test method. Besides ensemble music, a solo piece played by an electronic bass guitar was considered. The analysis confirmed a signal dependency of the estimated thresholds. Furthermore, despite intense training, mapping the specific attributes to the perceptual effects remained challenging for the complex signals. Considerable individual differences were observed. We present an updated version of Barron's graph as a result of our study.

Conference proceeding Peer reviewed

Evaluation of room simulation approximations for AR Audio considering a flipped loudspeaker scenario

by Annika Neidhardt, Boyd Thwaite, Joshua John Mannall and Enzo De Sena

Accepted for publication 01/06/2025

AES International Conference on Headphone Technology, 27/08/2025–29/08/2025, Espoo, Finland

Spatial Audio for Augmented and Extended Reality (AR/XR) is still limited in its suitability for everyday use due to the perceptual and computational demands of such applications. A number of recent studies have highlighted the potential of simplified room acoustic modelling and its perceptual optimisation for making AR/XR applications more accessible with affordable, mobile hardware. This paper presents a perceptual evaluation of real-time acoustic modelling based on feedback delay networks (FDN), scattering delay networks (SDN), and a hybrid of the image source method (ISM) combined with FDN. The acoustics were modelled using a shoebox-approximation of the original room's geometry, two different approximations of the original loudspeaker directivity, and generic HRTFs. We chose a flipped loudspeaker scenario as a critical test case. In the listening experiment, we assessed the Audiovisual Plausibility, Externalisation and Naturalness of the modelled sound fields. The SDN implementation with the subcardioid directivity was perceived as similarly externalised and audiovisually plausible as the measured reference. The hybrid ISM-FDN method and the SDN, both with the directivity of a different loudspeaker, were perceived as natural as the reference. The tested FDN cases exhibit significantly lower ratings than the measurement for the three attributes. The chosen simplifications were not perceptually sufficient in the tested scenario, matching some attributes only. More research on efficient perceptual matching of modelled room acoustics for AR/XR is needed.

Conference proceeding Peer reviewed

RoomAcoustiC++: An open-source room acoustic model for real-time audio simulations

by Joshua John Mannall, Lauri Savioja, Annika Neidhardt, Russell David Mason and Enzo De Sena

Accepted for publication 15/05/2025

2025 AES International Conference on Headphone Technology, 27/08/2025–29/08/2025, Espoo, Finland

RoomAcoustiC++: Real-time Acoustics Library, an open-source C++ library for real-time room acoustic modelling, is introduced. It implements a hybrid geometric acoustic and feedback delay network model. The geometric acoustic component uses the image edge model to locate geometry-dependent early specular reflections and edge diffraction. The feedback delay network models late reverberation for a target reverberation time. The model is capable of a dynamic simulation including moving geometry, sources and listener and changing wall absorption, with binaural spatialisation over headphones and customisable head-related transfer functions using the 3D Tune-In toolbox. A comparison with existing closed-source and open-source projects is presented. The comparison found that many state-of-the-art room acoustic models for real-time applications are closed-source, limiting reproducibility. RoomAcoustiC++ offers an improved room acoustic model compared to existing open-source projects. The library was validated against physical measurements from the Benchmark for Room Acoustic Simulations (BRAS) database. An analysis of the real-time performance shows that the software is capable of binaural rendering for scenes with occluding geometries and multiple sources.
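To make the idea of a feedback delay network tuned to a target reverberation time more concrete, the sketch below uses the common textbook rule of giving each delay line a gain that yields a 60 dB decay after the target time. It is written in Python for brevity and is not taken from RoomAcoustiC++; the function names, delay lengths and sample rate are assumptions, and the gains are frequency-independent, whereas a practical design would usually use absorption filters.

```python
import math


def fdn_delay_gains(delay_lengths, t60, sample_rate):
    """Per-delay-line gains giving a 60 dB decay after t60 seconds.

    A signal recirculating through a delay of m samples passes through it
    (sample_rate * t60 / m) times in t60 seconds, so the gain per pass
    must be 10 ** (-3 * m / (sample_rate * t60)).
    """
    return [10.0 ** (-3.0 * m / (sample_rate * t60)) for m in delay_lengths]


def decay_db_after(seconds, delay_length, gain, sample_rate):
    """Level change (dB) accumulated by recirculating through one delay line."""
    passes = seconds * sample_rate / delay_length
    return 20.0 * math.log10(gain) * passes


# Hypothetical delay lengths (in samples) and a 1.2 s target reverberation time.
DELAYS = [1009, 1511, 2003]
gains = fdn_delay_gains(DELAYS, t60=1.2, sample_rate=48000)
```

Longer delay lines receive smaller gains, so that every line decays at the same rate in decibels per second.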

Conference proceeding Open access Peer reviewed

Immersive Music Production Workflows: An Ethnographic Study of Current Practices

by Marcela Rada, Russell David Mason and Enzo De Sena

Published Spring 2025

158th Audio Engineering Society Convention

Audio Engineering Society Convention, 22/05/2025–24/05/2025, Warsaw, Poland

This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.

Conference proceeding Open access Peer reviewed

Spatial audio models' inventory to cover the attributes from the Spatial Audio Quality Inventory

by Pedro Lladó, Annika Neidhardt, Fabian Brinkmann and Enzo De Sena

Accepted for publication 27/04/2025

Forum Acusticum / Euronoise 2025 - 11th Convention of the European Acoustics Association, 23/06/2025–26/06/2025, Málaga, Spain

The Spatial Audio Quality Inventory (SAQI, Lindau et al. 2014 [1]) defines a comprehensive list of attributes for quality assessment of spatial audio. These attributes are traditionally used in perceptual experiments. However, automatic evaluation is a common alternative to assess spatial audio algorithms by means of acoustic recordings and numerical methods. This study aims at bridging the gap between perceptual evaluation and automatic assessment methods. We performed a focused literature review on available auditory models and proposed a list to cover the attributes in SAQI based on self-imposed selection criteria, such as binaural compatibility. The selected models are publicly available and ready to be used in automatic assessment methods. This Spatial Audio Models' Inventory (SAMI) could serve as relevant metrics to train and/or optimise machine-learning and deep-learning algorithms when the objective is to improve the perceived quality of reproduction in spatial audio applications. Moreover, SAMI composes a benchmark to challenge novel models.

Conference proceeding Open access

A Common-Slopes Late Reverberation Model Based on Acoustic Radiance Transfer

by Matteo Scerbo, Sebastian J Schlecht, Randall Ali, Lauri Savioja and Enzo De Sena

Date presented 05/09/2024

Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24)

Digital Audio Effects, 03/09/2024–06/09/2024, Guildford, UK

In rooms with complex geometry and uneven distribution of energy losses, late reverberation depends on the positions of sound sources and listeners. More precisely, the decay of energy is characterised by a sum of exponential curves with position-dependent amplitudes and position-independent decay rates (hence the name common slopes). The amplitude of different energy decay components is a particularly important perceptual aspect that requires efficient modeling in applications such as virtual reality and video games. Acoustic Radiance Transfer (ART) is a room acoustics model focused on late reverberation, which uses a pre-computed acoustic transfer matrix based on the room geometry and materials, and allows interactive changes to source and listener positions. In this work, we present an efficient common-slopes approximation of the ART model. Our technique extracts common slopes from ART using modal decomposition, retaining only the non-oscillating energy modes. Leveraging the structure of ART, changes to the positions of sound sources and listeners only require minimal processing. Experimental results show that even very few slopes are sufficient to capture the positional dependency of late reverberation, reducing model complexity substantially.
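The common-slopes description in this abstract (a sum of exponentials whose decay rates are shared across the room while the amplitudes depend on position) can be written down in a few lines. The sketch below only illustrates that decay model, not the ART-based extraction proposed in the paper; the decay times, amplitudes and function names are hypothetical.

```python
import math


def common_slopes_energy(t, amplitudes, t60s):
    """Energy at time t (s) as a sum of exponentials with shared decay rates.

    amplitudes: position-dependent weights, one per slope
    t60s: position-independent decay times (s), one per slope
    """
    total = 0.0
    for a, t60 in zip(amplitudes, t60s):
        # Energy drops by 60 dB (a factor of 1e-6) after t60 seconds.
        decay_rate = math.log(1e6) / t60
        total += a * math.exp(-decay_rate * t)
    return total


def energy_db(t, amplitudes, t60s):
    """Same decay curve expressed in decibels."""
    return 10.0 * math.log10(common_slopes_energy(t, amplitudes, t60s))


# Two slopes shared by the whole room; only the amplitudes change with position.
T60S = [0.5, 2.0]
NEAR_SOURCE = [1.0, 0.1]      # hypothetical amplitudes at one listener position
COUPLED_VOLUME = [0.2, 0.6]   # hypothetical amplitudes at another position

late_level_near = energy_db(1.0, NEAR_SOURCE, T60S)
late_level_coupled = energy_db(1.0, COUPLED_VOLUME, T60S)
```

Moving the listener changes only the amplitude list, which is why updating source and listener positions can be cheap once the shared slopes are known.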

