Enzo De Sena

Professor, Music and Media, School of Arts, Humanities & Creative Industries, Faculty of Arts, Business and Social Sciences, University of Surrey

Journal article Open access Peer reviewed

Active acoustic enhancement systems: A review

by William John Cassidy, Gian Marco De Bortoli, Karolina Prawda, Philip Coleman, Russell David Mason, Tapio Lokki, Sebastian J. Schlecht and Enzo De Sena

First online publication 22/04/2026

The Journal of the Acoustical Society of America, 159, 4, 3533 - 3557

Active acoustic enhancement systems (AAESs) use microphones, loudspeakers and electronic processing to modify the reverberation of a space, offering flexible and cost-effective alternatives to passive variable acoustics. These systems can extend the reverberation time of a space and modify perceived characteristics such as wall distance, diffuseness and intimacy. In this article, the current literature is discussed, and common conditions of AAESs are demonstrated using simulations to help researchers to establish a comprehensive understanding of the field. A general model is first defined to approximate any AAES as a linear, time-invariant system of transfer functions. This is used to analyse the general stability condition, which is valuable for system tuning and prediction. The three main topologies of AAESs are presented, namely, in-line, regenerative and hybrid systems, describing the fundamental differences as well as the nuances of commercial implementations with a focus on signal processing techniques. Articles investigating AAESs have been summarised to allow readers to gauge the coverage of experimental research to date. The simulated contribution serves as an exploratory environment to compare AAES conditions, where code and audio examples are available online. Promising future trajectories are identified involving machine learning, artefact perception and expressive performance.
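Because the AAES is modelled as a linear, time-invariant system of transfer functions, the stability of a regenerative (feedback) topology can be checked on the open-loop matrix. This matrix is formed by the room paths from loudspeakers to microphones and the processing from microphones back to loudspeakers. The sketch below applies one sufficient condition, the small-gain test: the largest singular value of the loop matrix must stay below 0 dB at every frequency, assuming the open loop itself is stable. It is not necessarily the general condition derived in the article, and the array layout and function names are assumptions.

```python
import numpy as np

def max_loop_gain_db(H, G):
    """Worst-case open-loop gain of a multichannel regenerative AAES (small-gain sketch).

    H: (F, M, N) complex room transfer functions, loudspeakers -> microphones,
       sampled at F frequencies (assumed measured or simulated).
    G: (F, N, M) complex processing matrix, microphones -> loudspeakers.
    Returns the maximum over frequency of the largest singular value of H(w) G(w), in dB.
    """
    loop = H @ G  # (F, M, M) open-loop matrix at each frequency
    sigma_max = np.linalg.svd(loop, compute_uv=False)[:, 0]  # singular values are sorted descending
    return 20.0 * np.log10(np.max(sigma_max))

def small_gain_stable(H, G):
    """Sufficient (not necessary) stability test: loop gain below 0 dB at all frequencies."""
    return max_loop_gain_db(H, G) < 0.0
```

The returned value also reads as a margin. A result of -6 dB, for example, means the processing gain could be raised by up to 6 dB before this particular sufficient condition stops guaranteeing stability.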

Journal article Open access Peer reviewed

Perceptual effects of modified late reverberation and reverberation time in auditory augmented reality in two rooms

by Christian Schneiderwind, Enzo De Sena and Annika Neidhardt

Accepted for publication 04/02/2026

Acta Acustica, 10, Forthcoming article, 17

This paper presents two experiments investigating perceptual tolerances regarding deviations in the late reverberation of a room in augmented reality (AR) audio rendering. The study is based on binaural room impulse responses (BRIRs) measured with a KEMAR head-and-torso simulator in two seminar rooms with reverberation times (RTs) of about 0.4 s and 1.1 s. We implemented an algorithm to modify the RT while maintaining the spectral profile of the room’s reverberation. In a single-stimulus listening test design, participants had to rate externalization, audiovisual plausibility, and room perception for different RT scalings. Differentiating between audiovisual plausibility for source and room helped capture the different perceptual phenomena. In this context, the concept of room acoustic signature preservation has also been proposed. The results indicate that in the reverberant room, RT deviations of 0.1 s already reveal that the acoustics of the room are different. However, plausible illusions in AR can be maintained despite significant perceptible deviations in RT, considering the original early room response. For originally short RT, audiovisual source plausibility is even robust towards larger RT modifications.
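One simple way to change the RT while leaving the early response untouched is to reshape the exponential envelope of the late tail. The sketch below assumes a broadband, purely exponential decay beyond a hypothetical onset time `t_start`. It is not the authors' algorithm, which the abstract says preserves the room's spectral profile; a band-wise variant would apply the same gain in each frequency band. The factor follows from the 60 dB definition of RT: an amplitude envelope exp(-δt) falls 60 dB after RT seconds when δ = 3 ln 10 / RT.

```python
import numpy as np

DECAY_60DB = 3.0 * np.log(10.0)  # delta * RT for a 60 dB amplitude decay: ln(10**3)

def rescale_rt(h, fs, rt_orig, rt_new, t_start):
    """Change the decay rate of the tail of impulse response h from rt_orig to rt_new (seconds).

    Samples up to t_start (seconds) are left untouched; after it, the tail is multiplied
    by an exponential that swaps decay rate 3 ln10 / rt_orig for 3 ln10 / rt_new.
    Broadband simplification; lengthening the RT cannot extend h beyond its last sample.
    """
    t = np.arange(len(h)) / fs
    delta_change = DECAY_60DB * (1.0 / rt_new - 1.0 / rt_orig)
    gain = np.exp(-delta_change * np.clip(t - t_start, 0.0, None))
    return h * gain
```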

Journal article Open access Peer reviewed

Diffraction perception in L-shaped rooms using virtual reality

by Joshua Mannall, Annika Neidhardt, Paul Calamia, Lauri Savioja, Russell Mason and Enzo De Sena

Published 2026

EURASIP Journal on Audio Speech and Music Processing, 2026, 1, 7

Outside of shoebox rooms, acoustic diffraction phenomena are present and can influence important aspects of auditory perception, such as localisation. A simple extension of a shoebox room is an L-shaped room as it introduces a single diffracting edge. This paper presents two experiments carried out in L-shaped rooms in virtual reality. The first investigated whether the inclusion of diffraction modelling influences the perceived plausibility of the acoustic simulation, and the second investigated to what extent newly developed efficient IIR filter diffraction models are as plausible as the physically accurate Biot-Tolstoy-Medwin-Svensson (BTMS) model. The study compared diffraction of only the direct sound and diffraction of both direct and reflected sound. The results show that the inclusion of diffraction increased the perceived plausibility of the acoustic simulation. A statistically significant increase in plausibility was found by the addition of diffracted reflection paths, but only in the so-called shadow zone. The second experiment determined that the IIR filter diffraction models were similarly plausible to BTMS in 14 of 18 cases with a threshold of 0.5 on a 6-point Likert scale.
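The shadow zone is the region where the edge blocks the direct line of sight between source and receiver. For an L-shaped room this reduces to a segment test against the missing quadrant. The sketch below assumes hypothetical coordinates, with the diffracting edge at the origin and the solid region at x > 0, y > 0. It only classifies positions and does not compute the diffracted field, which is what the BTMS and IIR models provide.

```python
def positive_interval(f0, f1):
    """Sub-interval of [0, 1] where f(t) = f0 + (f1 - f0) * t is positive, or None."""
    if f0 > 0 and f1 > 0:
        return (0.0, 1.0)
    if f0 <= 0 and f1 <= 0:
        return None
    t_cross = f0 / (f0 - f1)  # zero crossing of the linear function
    return (0.0, t_cross) if f0 > 0 else (t_cross, 1.0)


def in_shadow_zone(source, receiver):
    """True if the straight path source -> receiver passes through the solid notch.

    Hypothetical L-shaped room: the diffracting edge sits at the origin and the
    missing (solid) quadrant is x > 0, y > 0. Source and receiver are assumed to
    lie inside the room, i.e. not in that quadrant.
    """
    ix = positive_interval(source[0], receiver[0])
    iy = positive_interval(source[1], receiver[1])
    if ix is None or iy is None:
        return False
    # Blocked if some part of the path has x > 0 and y > 0 at the same time.
    return max(ix[0], iy[0]) < min(ix[1], iy[1])
```

Because the room's bounding rectangle is convex, a path between two points in the room can only leave the room through the notch, so testing against the whole quadrant is enough.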

Journal article Open access Peer reviewed

Efficient Multichannel Auralization Based on the Modal Decomposition of Acoustic Radiance Transfer (MoD-ART)

by Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja and Enzo De Sena

First online publication 05/11/2025

IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access

In complex acoustic environments such as multiple connected rooms, reverberation is highly dependent on the positions of sound sources and listeners, not only in terms of early reflections, but in terms of late reverberation as well. Modeling this positional dependency accurately is important for immersive, interactive applications such as virtual reality, augmented reality, and video games, where reverberation needs to be adapted in real time as sound sources and listeners move. The recently proposed modal decomposition of acoustic radiance transfer (MoD-ART) method can evaluate position-dependent late reverberation characteristics in real time, based on physical properties of the modeled environment, and it is specifically designed for complex acoustic environments. The reverberation characteristics' auralization (i.e. their application to audio signals) can be accomplished either with convolution or with delay-based reverberators. In this paper, we propose a method to auralize late reverberation efficiently in the presence of multiple sound sources and listeners, based on the MoD-ART model. The proposed method inherits the favorable complexity scaling of MoD-ART's modeling and extends it to the aspect of auralization, enabling the rendering of late reverberation in scenarios with hundreds of interactive sound sources and listeners. Furthermore, the proposed method can model fully dynamic scenarios (meaning both sources and listeners may move) correctly and with no rendering latency.
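The favorable scaling can be illustrated with a linearity argument. If every source-listener late-reverberation response is a weighted sum of a few shared per-mode responses, then the sources can be mixed into mode channels, each channel filtered once, and the result mixed out to the listeners. The number of convolutions then scales with the number of modes rather than with sources times listeners. This is a minimal sketch under stated assumptions: the gain matrices `B` and `C`, the per-mode responses `r` and the function names are hypothetical. It uses static gains and plain convolution, and does not model MoD-ART's time-varying weights for moving sources or its delay-based reverberator option.

```python
import numpy as np

def render_naive(x, B, C, r):
    """One convolution per source-listener pair (S * L convolutions).

    x: (S, N) source signals; B: (K, S) hypothetical source-to-mode gains;
    C: (L, K) hypothetical mode-to-listener gains; r: (K, M) per-mode impulse responses.
    """
    S, N = x.shape
    L = C.shape[0]
    out = np.zeros((L, N + r.shape[1] - 1))
    for l in range(L):
        for s in range(S):
            h = (C[l] * B[:, s]) @ r  # combined late-reverb IR for this pair, shape (M,)
            out[l] += np.convolve(x[s], h)
    return out

def render_modal(x, B, C, r):
    """Mix into K mode channels, filter each once, mix out (K convolutions)."""
    mode_inputs = B @ x  # (K, N)
    mode_outputs = np.stack([np.convolve(mode_inputs[k], r[k]) for k in range(r.shape[0])])
    return C @ mode_outputs  # (L, N + M - 1)
```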

Conference proceeding Open access Peer reviewed

White-box Differentiable Model of Perceived Localisation

by Antoine Robert Souchaud, Pedro Lladó, Annika Neidhardt, Zoran Cvetkovic and Enzo De Sena

Published 21/09/2025

2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP), In Press

The 27th IEEE International Workshop on Multimedia Signal Processing (MMSP 2025), 21/09/2025–23/09/2025, Beijing, China

Auditory models are useful tools for estimating perceptual attributes of a sound field. Integrating such auditory models in the optimisation of immersive sound systems is a promising strategy when listeners' perception is central to the application. To that end, differentiability is key to allowing the perceptual model to be included in gradient-based optimisation loops. Existing differentiable models, however, are black-box deep-learning based, which limits their interpretability. In this paper, we propose an analytical white-box differentiable model of auditory localisation based on an existing non-differentiable model. Our evaluations show that the model produces outputs that are highly correlated with the outputs of the non-differentiable model and with data collected in subjective listening tests. The proposed model also enables optimisation of amplitude panning laws in a stereophonic spatial sound field rendering through gradient descent. This study therefore demonstrates, more generally, the feasibility of designing and optimising immersive sound systems using white-box differentiable models of auditory perception.
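The key requirement here is a localisation predictor that is differentiable, so that panning gains can be tuned by gradient descent. As a minimal sketch of that idea, the block below substitutes a much simpler white-box predictor, the classic velocity-vector direction, for the paper's model. It assumes a standard stereo pair at ±30° and fits a constant-power panning parameter to a target direction using the analytic gradient. This is not the authors' model, which the abstract describes as derived from an existing non-differentiable localisation model, and the function names are hypothetical.

```python
import numpy as np

SPK_LEFT = np.radians(30.0)    # assumed loudspeaker azimuths (standard stereo)
SPK_RIGHT = np.radians(-30.0)

def _velocity_vector(p):
    """Velocity-vector components for constant-power gains g_L = cos p, g_R = sin p."""
    x = np.cos(p) * np.cos(SPK_LEFT) + np.sin(p) * np.cos(SPK_RIGHT)
    y = np.cos(p) * np.sin(SPK_LEFT) + np.sin(p) * np.sin(SPK_RIGHT)
    return x, y

def predicted_azimuth(p):
    x, y = _velocity_vector(p)
    return np.arctan2(y, x)

def azimuth_gradient(p):
    """Analytic d(azimuth)/dp using d atan2(y, x) = (x dy - y dx) / (x^2 + y^2)."""
    x, y = _velocity_vector(p)
    dx = -np.sin(p) * np.cos(SPK_LEFT) + np.cos(p) * np.cos(SPK_RIGHT)
    dy = -np.sin(p) * np.sin(SPK_LEFT) + np.cos(p) * np.sin(SPK_RIGHT)
    return (x * dy - y * dx) / (x ** 2 + y ** 2)

def optimise_panning(target_deg, lr=0.5, steps=200):
    """Gradient descent on the squared azimuth error; returns (g_left, g_right)."""
    target = np.radians(target_deg)
    p = np.pi / 4  # start from equal gains (phantom centre)
    for _ in range(steps):
        err = predicted_azimuth(p) - target
        p -= lr * 2.0 * err * azimuth_gradient(p)  # d(err^2)/dp = 2 err d(azimuth)/dp
        p = np.clip(p, 0.0, np.pi / 2)  # keep both gains non-negative
    return np.cos(p), np.sin(p)
```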

Conference proceeding Peer reviewed

Differentiable scattering delay networks for artificial reverberation

by Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena and Alberto Bernardini

Published 02/09/2025

1 - 8

28th International Conference on Digital Audio Effects (DAFx25), 02/09/2025–05/09/2025, Ancona, Italy

Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
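Energy decay and reverberation time are commonly computed from an impulse response by Schroeder backward integration, followed by a line fit to the resulting energy decay curve (EDC). The NumPy sketch below shows only the forward computation of such features and a simple EDC-matching loss. In a differentiable SDN the same operations would be written in an automatic-differentiation framework so that gradients can flow back to the scattering and absorption parameters. The -5 to -35 dB fit range (T30), the -60 dB floor and the function names are assumptions, not the paper's loss.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Schroeder backward integration of h**2, normalised to 0 dB at t = 0.

    Assumes h has no trailing exact zeros (their log would give -inf).
    """
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def t30(h, fs):
    """Reverberation time extrapolated from a line fitted to the EDC between -5 and -35 dB."""
    edc = energy_decay_curve_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= -5.0) & (edc >= -35.0)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second (negative)
    return -60.0 / slope

def edc_loss(h_model, h_target, floor_db=-60.0):
    """Mean squared EDC difference in dB, clipped at floor_db to ignore the deep tail."""
    n = min(len(h_model), len(h_target))
    a = np.maximum(energy_decay_curve_db(h_model[:n]), floor_db)
    b = np.maximum(energy_decay_curve_db(h_target[:n]), floor_db)
    return float(np.mean((a - b) ** 2))
```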

Conference proceeding Peer reviewed

Past, Present, and Future of Spatial Audio and Room Acoustics

by Shoichi Koyama, Enzo De Sena, Prasanga Samarasinghe, Mark R. P. Thomas and Fabio Antonacci

Availability date 06/08/2025

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 06/04/2025–11/04/2025, Hyderabad, India

The study of spatial audio and room acoustics aims to create immersive audio experiences by modeling the physics and psychoacoustics of how sound behaves in space. In the long history of this research area, various key technologies have been developed based both on theoretical advancements and practical innovations. We highlight historical achievements, initiative activities, recent advancements, and future outlooks in the research area of spatial audio recording and reproduction, and room acoustic simulation, modeling, analysis, and control.

Journal article Open access

Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)

by Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja and Enzo De Sena

Published 23/07/2025

IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access, 1 - 13

Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
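The idea of extracting energy decay modes can be illustrated with a simplified discrete-time energy model e[n+1] = A e[n], where A transfers energy between surface patches in one time step. Diagonalising A = V diag(λ) V^-1 turns the listener's energy response into a sum of exponentials, one per mode, whose weights depend separately on the source and listener positions. This is a minimal sketch assuming a diagonalisable A. It ignores the patch-to-patch propagation delays that acoustic radiance transfer actually models, and `A`, `s`, `l` and the function name are hypothetical.

```python
import numpy as np

def modal_energy_response(A, s, l, n_steps):
    """Listener energy l^T A^n s for n = 0..n_steps-1, computed as a sum of decay modes.

    A: (P, P) hypothetical energy-transfer matrix between patches per time step
       (non-negative, spectral radius < 1, assumed diagonalisable).
    s: (P,) energy injected by the source into each patch.
    l: (P,) weights collecting each patch's energy at the listener.
    """
    eigvals, V = np.linalg.eig(A)      # columns of V are right eigenvectors
    W = np.linalg.inv(V)               # rows of W are the matching left eigenvectors
    source_weights = W @ s             # how strongly the source excites each mode
    listener_weights = l @ V           # how strongly each mode reaches the listener
    n = np.arange(n_steps)
    decays = eigvals[:, None] ** n     # (modes, n_steps): one exponential per mode
    return np.real((listener_weights * source_weights) @ decays)
```

In the two-patch example used for checking, a listener patch weakly coupled to a more reverberant source patch sees two modes (eigenvalues of about 0.90 and 0.60 per step), i.e. a double-slope decay. When sources or listeners move, only `source_weights` and `listener_weights` need recomputing, while the modes stay fixed for a given geometry.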

Conference proceeding Open access

Perceptually-driven panning for an extended listening area

by Pedro Lladó, Annika Neidhardt, Antoine Robert Souchaud, Zoran Cvetkovic and Enzo De Sena

Accepted for publication 02/07/2025

2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2025)

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025, 12/10/2025–15/10/2025, Granlibakken Tahoe, Tahoe City, CA, USA

In loudspeaker-based sound field reproduction, the perceived sound quality deteriorates significantly when listeners move outside of the sweet spot. Although a substantial increase in the number of loudspeakers enables rendering methods that can mitigate this issue, such a solution is not feasible for most real-life applications. This study aims to extend the listening area by finding a panning strategy that optimises an objective function reflecting the localisation and localisation uncertainty over a listening area. To that end, we first introduce a psychoacoustic localisation model that outperforms existing models in the context of multichannel loudspeaker setups. Leveraging this model and an existing model of localisation uncertainty, we optimise inter-channel time and level differences for a stereophonic system. The outcome is a new panning approach that depends on the listening area and the most suitable trade-off between localisation and localisation uncertainty.
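Moving off the sweet spot changes the inter-channel time and level differences that reach the listener, which is why these differences are optimised over a listening area. The helper below only computes that geometry for a hypothetical stereo pair at ±30° and 2 m. It assumes free-field point sources with 1/r spreading and a speed of sound of 343 m/s. It is a starting point for evaluating a panning law at several positions, not the localisation or uncertainty models used in the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def interchannel_differences(listener, left_spk, right_spk):
    """Arrival-time and level differences (left minus right) at a listener position.

    Free-field point sources with 1/r spreading (no room, no head) are assumed.
    Returns (time difference in ms, level difference in dB).
    """
    d_left = np.linalg.norm(np.subtract(left_spk, listener))
    d_right = np.linalg.norm(np.subtract(right_spk, listener))
    dt_ms = 1000.0 * (d_left - d_right) / SPEED_OF_SOUND  # > 0: left arrives later
    dl_db = 20.0 * np.log10(d_right / d_left)              # > 0: left is louder
    return dt_ms, dl_db
```

With the loudspeakers at (-1, √3) and (1, √3) m, a listener 0.3 m to the right of centre receives the left signal roughly 0.87 ms later and about 1.3 dB quieter. This is enough to pull a centre-panned phantom image towards the right loudspeaker.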

Conference proceeding Open access Peer reviewed

Subjective evaluation of the first incoming reflection - revisiting and extending Barron's study

by Annika Neidhardt, Tatiana Surdu, Pedro Llado Gonzalez and Enzo De Sena

Published Summer 2025

Proceedings of Forum Acusticum / Euronoise 2025, the 11th EAA Annual European Conference on Acoustics and Noise Control Engineering, joint with the XLVI Spanish Acoustic Conference TECNIACUSTICA 2025

Forum Acusticum / Euronoise 2025, 23/06/2025–26/06/2025, Málaga, Spain

In 1971, Barron published a study on 'The subjective effects of first reflections in concert halls', comprising a lead/lag paradigm experiment with two loudspeakers set up in an anechoic room. As a result, he presented the determined audibility threshold, as well as a figure showing the audible effects caused by the first reflection (lag) depending on its delay and level relative to the direct sound (lead). This study gave an inspiring first insight into prominent perceptual effects like spatial impression, colouration, image shift, and 'disturbance'. However, the diagram was created based on the responses of only two listeners, evaluating the various attributes of a single item of programme material. To assess the reproducibility and generalisability of the results, we repeated and extended Barron's experiment with a larger panel of participants and a slightly revised test method. Besides ensemble music, a solo piece played by an electronic bass guitar was considered. The analysis confirmed a signal dependency of the estimated thresholds. Furthermore, despite intense training, mapping the specific attributes to the perceptual effects remained challenging for the complex signals. Considerable individual differences were observed. We present an updated version of Barron's graph as a result of our study.
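The lead/lag paradigm reduces to two signals: the lead, and a copy that is delayed and attenuated relative to it. Each point of Barron's diagram is therefore one (delay, relative level) pair. The sketch below generates such a stimulus, assuming the lead and lag are routed to separate loudspeakers as in the two-loudspeaker setup and that delays are rounded to whole samples. The function name and interface are hypothetical, and the study's actual stimulus pipeline is not described in the abstract.

```python
import numpy as np

def lead_lag_stimulus(x, fs, delay_ms, level_db):
    """Two-channel lead/lag signal: row 0 = lead (direct sound), row 1 = lag (reflection).

    delay_ms and level_db are the lag's delay and level relative to the lead,
    i.e. the two axes of Barron's diagram. The delay is rounded to whole samples.
    """
    delay = int(round(delay_ms * 1e-3 * fs))
    gain = 10.0 ** (level_db / 20.0)
    out = np.zeros((2, len(x) + delay))
    out[0, :len(x)] = x
    out[1, delay:] = gain * x
    return out
```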
