Enzo De Sena

Professor, Music and Media, School of Arts, Humanities & Creative Industries, Faculty of Arts, Business and Social Sciences, University of Surrey

Journal article Open access Peer reviewed

Active acoustic enhancement systems: A review

by William John Cassidy, Gian Marco De Bortoli, Karolina Prawda, Philip Coleman, Russell David Mason, Tapio Lokki, Sebastian J. Schlecht and Enzo De Sena

First online publication 22/04/2026

The Journal of the Acoustical Society of America, 159, 4, 3533 - 3557

Active acoustic enhancement systems (AAESs) use microphones, loudspeakers and electronic processing to modify the reverberation of a space, offering flexible and cost-effective alternatives to passive variable acoustics. These systems can extend the reverberation time of a space and modify perceived characteristics such as wall distance, diffuseness and intimacy. In this article, the current literature is discussed, and common conditions of AAESs are demonstrated using simulations to help researchers to establish a comprehensive understanding of the field. A general model is first defined to approximate any AAES as a linear, time-invariant system of transfer functions. This is used to analyse the general stability condition, which is valuable for system tuning and prediction. The three main topologies of AAESs are presented, namely, in-line, regenerative and hybrid systems, describing the fundamental differences as well as the nuances of commercial implementations with a focus on signal processing techniques. Articles investigating AAESs have been summarised to allow readers to gauge the coverage of experimental research to date. The simulated contribution serves as an exploratory environment to compare AAES conditions, where code and audio examples are available online. Promising future trajectories are identified involving machine learning, artefact perception and expressive performance.
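As a rough illustration of what a stability check on a linear, time-invariant feedback loop can look like, the sketch below treats a single microphone-loudspeaker channel and applies the small-gain bound: if the open-loop magnitude stays below unity at every frequency, the loop cannot become unstable. This is a sufficient condition only and is not taken from the article; the function names, the toy feedback impulse response and the frequency-grid size are all assumptions made for illustration.

```python
import cmath
import math


def loop_gain_peak(feedback_ir, gain, n_freqs=512):
    """Return the peak of |gain * H(e^{jw})| on a grid of frequencies in [0, pi].

    feedback_ir is a hypothetical loudspeaker-to-microphone impulse response
    (list of floats); gain is the broadband electronic gain in the loop.
    """
    peak = 0.0
    for k in range(n_freqs):
        w = math.pi * k / (n_freqs - 1)  # normalised angular frequency
        # Evaluate the feedback path's frequency response at w (direct DTFT sum).
        h_w = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(feedback_ir))
        peak = max(peak, abs(gain * h_w))
    return peak


def headroom_db(feedback_ir, gain, n_freqs=512):
    """Gain increase (dB) available before the peak loop gain reaches unity."""
    return -20.0 * math.log10(loop_gain_peak(feedback_ir, gain, n_freqs))


# Toy feedback path: two reflections arriving 1 and 3 samples after emission.
toy_ir = [0.0, 0.5, 0.0, 0.25]
print("peak loop gain:", loop_gain_peak(toy_ir, 1.0))
print("headroom (dB):", headroom_db(toy_ir, 1.0))
```

A loop that fails this test is not necessarily unstable, because the phase of the loop response also matters; a multichannel system would need the same idea applied to the loop transfer matrix rather than a scalar.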

Journal article Open access Peer reviewed

Perceptual effects of modified late reverberation and reverberation time in auditory augmented reality in two rooms

by Christian Schneiderwind, Enzo De Sena and Annika Neidhardt

Accepted for publication 04/02/2026

Acta Acustica, 10, Forthcoming article, 17

This paper presents two experiments investigating perceptual tolerances regarding deviations in the late reverberation of a room in augmented reality (AR) audio rendering. The study is based on binaural room impulse responses (BRIRs) measured with a KEMAR head-and-torso simulator in two seminar rooms with reverberation times (RTs) of about 0.4 s and 1.1 s. We implemented an algorithm to modify the RT while maintaining the spectral profile of the room’s reverberation. In a single stimulus listening test design, participants had to rate externalization, audiovisual plausibility, and room perception for different RT scalings. Differentiating between audiovisual plausibility for source and room helped capture the different perceptual phenomena. In this context, the concept of room acoustic signature preservation has also been proposed. The results indicate that in the reverberant room, RT deviations of 0.1 s already reveal that the acoustics of the room are different. However, plausible illusions in AR can be maintained despite significant perceptible deviations in RT, considering the original early room response. For originally short RT, audiovisual source plausibility is even robust towards larger RT modifications.

Journal article Open access Peer reviewed

Diffraction perception in L-shaped rooms using virtual reality

by Joshua Mannall, Annika Neidhardt, Paul Calamia, Lauri Savioja, Russell Mason and Enzo De Sena

Published 2026

EURASIP Journal on Audio Speech and Music Processing, 2026, 1, 7

Outside of shoebox rooms, acoustic diffraction phenomena are present and can influence important aspects of auditory perception, such as localisation. A simple extension of a shoebox room is an L-shaped room as it introduces a single diffracting edge. This paper presents two experiments carried out in L-shaped rooms in virtual reality. The first investigated whether the inclusion of diffraction modelling influences the perceived plausibility of the acoustic simulation, and the second to what extent newly developed efficient IIR filter diffraction models are equally plausible to the physically accurate Biot-Tolstoy-Medwin-Svensson (BTMS) model. The study compared diffraction of only the direct sound and diffraction of both direct and reflected sound. The results show that the inclusion of diffraction increased the perceived plausibility of the acoustic simulation. A statistically significant increase in plausibility was found by the addition of diffracted reflection paths, but only in the so-called shadow zone. The second experiment determined that the IIR filter diffraction models were similarly plausible to BTMS in 14 of 18 cases with a threshold of 0.5 on a 6-point Likert scale.

Journal article Open access Peer reviewed

Efficient Multichannel Auralization Based on the Modal Decomposition of Acoustic Radiance Transfer (MoD-ART)

by Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja and Enzo De Sena

First online publication 05/11/2025

IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access

In complex acoustic environments such as multiple connected rooms, reverberation is highly dependent on the positions of sound sources and listeners, not only in terms of early reflections, but late reverberation as well. Modeling this positional dependency accurately is important for immersive, interactive applications such as virtual reality, augmented reality, and video games, where reverberation needs to be adapted in real time as sound sources and listeners move. The recently proposed modal decomposition of acoustic radiance transfer (MoD-ART) method can evaluate position-dependent late reverberation characteristics in real time, based on physical properties of the modeled environment, and it is specifically designed for complex acoustic environments. The reverberation characteristics' auralization (i.e. their application to audio signals) can be accomplished either with convolution or with delay-based reverberators. In this paper, we propose a method to auralize late reverberation efficiently in the presence of multiple sound sources and listeners, based on the MoD-ART model. The proposed method inherits the favorable complexity scaling of MoD-ART's modeling and extends it to the aspect of auralization, enabling the rendering of late reverberation in scenarios with hundreds of interactive sound sources and listeners. Furthermore, the proposed method can model fully dynamic scenarios (meaning both sources and listeners may move) correctly and with no rendering latency.

Journal article Open access

Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)

by Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja and Enzo De Sena

Published 23/07/2025

IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access, 1 - 13

Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.

Journal article Peer reviewed

Scalable-Complexity Steered Response Power based on Low-Rank and Sparse Interpolation

by Thomas Dietzen, Enzo De Sena and Toon van Waterschoot

Published 09/11/2024

IEEE/ACM transactions on audio, speech, and language processing, 32, 1 - 16

The steered response power (SRP) is a popular approach to compute a map of the acoustic scene, typically used for acoustic source localization. The SRP map is obtained as the frequency-weighted output power of a beamformer steered towards a grid of candidate locations. Due to the exhaustive search over a fine grid at all frequency bins, conventional frequency domain-based SRP (conv. FD-SRP) results in a high computational complexity. Time domain-based SRP (conv. TD-SRP) implementations reduce computational complexity at the cost of accuracy using the inverse fast Fourier transform (iFFT). In this paper, to enable a more favourable complexity-performance trade-off as compared to conv. FD-SRP and conv. TD-SRP, we consider the problem of constructing a fine SRP map over the entire search space at scalable computational cost. We propose two approaches to this problem. Expressing the conv. FD-SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While sampling can be implemented by the iFFT, we propose to use optimal low-rank or sparse approximations of the interpolation matrix for complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in various localization scenarios with speech as source signals and compared to the state-of-the-art. The results indicate that SSPI-SRP performs better if large array apertures are used, while SLRI-SRP performs better at small array apertures or a large number of microphones. In comparison to conv. FD-SRP, two to three orders of magnitude of complexity reduction can be achieved, oftentimes enabling a more favourable complexity-performance trade-off as compared to conv. TD-SRP. A MATLAB implementation is available online.
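To make the baseline concrete, the sketch below evaluates a conventional frequency-domain SRP for a single microphone pair: the cross-spectrum of the two signals is phase-steered towards each candidate delay and summed over all bins. It is a simplified, unweighted illustration of the exhaustive approach the abstract describes as costly, not the proposed SLRI-SRP or SSPI-SRP methods and not the authors' MATLAB code. The function names, the naive DFT and the single-pair restriction are assumptions made for brevity.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fd_srp_pair(x1, x2, candidate_delays):
    """Unweighted frequency-domain SRP for one microphone pair.

    candidate_delays holds, for each candidate location, the hypothesised
    delay (in samples) of x2 relative to x1. Returns one SRP value per entry.
    """
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    # Cross-spectrum per frequency bin (an unweighted GCC in the frequency domain).
    cross = [spec2[k] * spec1[k].conjugate() for k in range(n)]
    # Signed bin indices so that fractional delays are steered consistently.
    signed_bins = [k if k <= n // 2 else k - n for k in range(n)]
    srp = []
    for tau in candidate_delays:
        # Compensate the hypothesised delay at every bin and accumulate.
        value = sum((cross[k] * cmath.exp(2j * math.pi * signed_bins[k] * tau / n)).real
                    for k in range(n))
        srp.append(value)
    return srp


# Toy example: the second microphone receives the same impulse 3 samples later.
n_samples = 16
mic1 = [1.0] + [0.0] * (n_samples - 1)
mic2 = [0.0] * 3 + [1.0] + [0.0] * (n_samples - 4)
delays = list(range(-2, 6))
srp_values = fd_srp_pair(mic1, mic2, delays)
best_delay = delays[max(range(len(delays)), key=srp_values.__getitem__)]
print("estimated delay (samples):", best_delay)
```

The cost of this baseline grows with the product of the number of candidates and the number of frequency bins, for every microphone pair, which is the burden that motivates cheaper approximations of the map.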

Journal article Open access Peer reviewed

Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines

by Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena and Alberto Bernardini

Published 08/10/2024

EURASIP journal on audio, speech, and music processing, 2024, 1, 51 - 20

Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a feedback delay network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics and outperforms existing methods based on genetic algorithms and analytical FDN design.
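For readers unfamiliar with the structure whose parameters are being learned, the sketch below computes the impulse response of a plain feedback delay network: a set of delay lines whose outputs are mixed by a feedback matrix and fed back to their inputs, with input gains, output gains and a direct path. It is a non-differentiable, frequency-independent illustration only and does not reproduce the training procedure or loss described in the abstract. The function name, argument names and example values are assumptions.

```python
def fdn_impulse_response(delays, feedback, in_gains, out_gains, direct, n_samples):
    """Impulse response of a feedback delay network (FDN).

    delays: delay-line lengths in samples (positive integers).
    feedback: square matrix (list of rows) mixing delay-line outputs.
    in_gains / out_gains: per-line input and output gains.
    direct: gain of the direct (undelayed) path.
    """
    n_lines = len(delays)
    buffers = [[0.0] * m for m in delays]  # one circular buffer per delay line
    pos = [0] * n_lines
    response = []
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read the sample that entered each line delays[i] steps ago.
        line_out = [buffers[i][pos[i]] for i in range(n_lines)]
        response.append(direct * x
                        + sum(out_gains[i] * line_out[i] for i in range(n_lines)))
        # Write the new line inputs: scaled input plus matrix-mixed feedback.
        for i in range(n_lines):
            mixed = sum(feedback[i][j] * line_out[j] for j in range(n_lines))
            buffers[i][pos[i]] = in_gains[i] * x + mixed
            pos[i] = (pos[i] + 1) % delays[i]
    return response


# Two-line example: a rotation-like matrix scaled by 0.9 so the response decays.
example = fdn_impulse_response(
    delays=[3, 5],
    feedback=[[0.0, 0.9], [-0.9, 0.0]],
    in_gains=[1.0, 1.0],
    out_gains=[0.5, 0.5],
    direct=0.0,
    n_samples=32,
)
print(example[:12])
```

In a data-driven setting, the delay lengths, matrix entries and gains above are the quantities an optimiser would adjust so that the response matches attributes of a measured room impulse response.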

Journal article Open access Peer reviewed

Room Acoustic Rendering Networks With Control of Scattering and Early Reflections

by Matteo Scerbo, Lauri Savioja and Enzo De Sena

Published 02/08/2024

IEEE/ACM transactions on audio, speech, and language processing, 32, 3745 - 3758

Room acoustic synthesis can be used in virtual reality (VR), augmented reality (AR) and gaming applications to enhance listeners' sense of immersion, realism and externalisation. A common approach is to use geometrical acoustics (GA) models to compute impulse responses at interactive speed, and fast convolution methods to apply said responses in real time. Alternatively, delay-network-based models are capable of modeling certain aspects of room acoustics, but with a significantly lower computational cost. In order to bridge the gap between these classes of models, recent work introduced delay network designs that approximate Acoustic Radiance Transfer (ART), a GA model that simulates the transfer of acoustic energy between discrete surface patches in an environment. This paper presents two key extensions of such designs. The first extension involves a new physically-based and stability-preserving design of the feedback matrices, enabling more accurate control of scattering and, more in general, of late reverberation properties. The second extension allows an arbitrary number of early reflections to be modeled with high accuracy, meaning the network can be scaled at will between computational cost and early reverberation precision. The proposed extensions are compared to the baseline ART-approximating delay network as well as two reference GA models. The evaluation is based on objective measures of perceptually-relevant features, including frequency-dependent reverberation times, echo density build-up, and early decay time. Results show how the proposed extensions result in a significant improvement over the baseline model, especially for the case of non-convex geometries or the case of unevenly distributed wall absorption, both scenarios of broad practical interest.

Journal article Peer reviewed

Modal Excitation in Feedback Delay Networks

by Sebastian J. Schlecht, Matteo Scerbo, Enzo De Sena and Vesa Valimaki

Published 2024

IEEE signal processing letters, 31, 2690 - 2694

Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the computational complexity of the proposed analysis method does not depend on the delay-line lengths and is thus suitable for large FDNs, such as artificial reverberators. We show the relation between the compact generalized eigenvectors in the delay state space and the spatially extended modal shapes in the state space. We illustrate this method with an example FDN in which the suggested modal excitation control does not increase the computational cost. The modal shapes can help optimize input and output gains. This letter teaches how selecting the input and output points along the delay lines of an FDN adjusts the spectral shape of the system output.

Journal article Open access Peer reviewed

Efficient diffraction modelling using neural networks and infinite impulse response filters

by Joshua Mannall, Lauri Savioja, Paul Calamia, Russell David Mason and Enzo De Sena

Published 13/09/2023

Journal of the Audio Engineering Society. [electronic resource], 71, 9, 566 - 576

Creating plausible geometric acoustic simulations in complex scenes requires the inclusion of diffraction modelling. Current real-time diffraction implementations use the Uniform Theory of Diffraction (UTD) which assumes all edges are infinitely long. We utilise recent advances in machine learning to create an efficient infinite impulse response model trained on data generated using the physically accurate Biot-Tolstoy-Medwin model. We propose an approach to data generation that allows our model to be applied to higher-order diffraction. We show that our model is able to approximate the Biot-Tolstoy-Medwin model with a mean absolute level difference of 1.0 dB for 1st-order diffraction while maintaining a higher computational efficiency than the current state of the art using UTD.
