Output list
Journal article
Active acoustic enhancement systems: A review
First online publication 22/04/2026
The Journal of the Acoustical Society of America, 159, 4, 3533 - 3557
Active acoustic enhancement systems (AAESs) use microphones, loudspeakers and electronic processing to modify the reverberation of a space, offering flexible and cost-effective alternatives to passive variable acoustics. These systems can extend the reverberation time of a space and modify perceived characteristics such as wall distance, diffuseness and intimacy. In this article, the current literature is discussed, and common conditions of AAESs are demonstrated using simulations to help researchers to establish a comprehensive understanding of the field. A general model is first defined to approximate any AAES as a linear, time-invariant system of transfer functions. This is used to analyse the general stability condition, which is valuable for system tuning and prediction. The three main topologies of AAESs are presented, namely, in-line, regenerative and hybrid systems, describing the fundamental differences as well as the nuances of commercial implementations with a focus on signal processing techniques. Articles investigating AAESs have been summarised to allow readers to gauge the coverage of experimental research to date. The simulated contribution serves as an exploratory environment to compare AAES conditions, where code and audio examples are available online. Promising future trajectories are identified involving machine learning, artefact perception and expressive performance.
Journal article
Accepted for publication 04/02/2026
Acta Acustica, 10, Forthcoming article, 17
This paper presents two experiments investigating perceptual tolerances regarding deviations in the late reverberation of a room in augmented reality (AR) audio rendering. The study is based on binaural room impulse responses (BRIRs) measured with a KEMAR head-and-torso simulator in two seminar rooms with reverberation times (RTs) of about 0.4 s and 1.1 s. We implemented an algorithm to modify the RT while maintaining the spectral profile of the room’s reverberation. In a single stimulus listening test design, participants had to rate externalization, audiovisual plausibility, and room perception for different RT scalings. Differentiating between audiovisual plausibility for source and room helped capture the different perceptual phenomena. In this context, the concept of room acoustic signature preservation has also been proposed. The results indicate that in the reverberant room, an RT deviation of 0.1 s already reveals that the acoustics of the room are different. However, plausible illusions in AR can be maintained despite significant perceptible deviations in RT, considering the original early room response. For originally short RT, audiovisual source plausibility is even robust towards larger RT modifications.
Journal article
Diffraction perception in L-shaped rooms using virtual reality
Published 2026
EURASIP Journal on Audio Speech and Music Processing, 2026, 1, 7
Outside of shoebox rooms, acoustic diffraction phenomena are present and can influence important aspects of auditory perception, such as localisation. A simple extension of a shoebox room is an L-shaped room as it introduces a single diffracting edge. This paper presents two experiments carried out in L-shaped rooms in virtual reality. The first investigated whether the inclusion of diffraction modelling influences the perceived plausibility of the acoustic simulation, and the second to what extent newly developed efficient IIR filter diffraction models are equally plausible to the physically accurate Biot-Tolstoy-Medwin-Svensson (BTMS) model. The study compared diffraction of only the direct sound and diffraction of both direct and reflected sound. The results show that the inclusion of diffraction increased the perceived plausibility of the acoustic simulation. A statistically significant increase in plausibility was found by the addition of diffracted reflection paths, but only in the so-called shadow zone. The second experiment determined that the IIR filter diffraction models were similarly plausible to BTMS in 14 of 18 cases with a threshold of 0.5 on a 6-point Likert scale.
Journal article
First online publication 05/11/2025
IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access
In complex acoustic environments such as multiple connected rooms, reverberation is highly dependent on the positions of sound sources and listeners, not only in terms of early reflections, but late reverberation as well. Modeling this positional dependency accurately is important for immersive, interactive applications such as virtual reality, augmented reality, and video games, where reverberation needs to be adapted in real time as sound sources and listeners move. The recently proposed modal decomposition of acoustic radiance transfer (MoD-ART) method can evaluate position-dependent late reverberation characteristics in real time, based on physical properties of the modeled environment, and it is specifically designed for complex acoustic environments. The reverberation characteristics' auralization (i.e. their application to audio signals) can be accomplished either with convolution or with delay-based reverberators. In this paper, we propose a method to auralize late reverberation efficiently in the presence of multiple sound sources and listeners, based on the MoD-ART model. The proposed method inherits the favorable complexity scaling of MoD-ART's modeling and extends it to the aspect of auralization, enabling the rendering of late reverberation in scenarios with hundreds of interactive sound sources and listeners. Furthermore, the proposed method can model fully dynamic scenarios (meaning both sources and listeners may move) correctly and with no rendering latency.
Journal article
Published 23/07/2025
IEEE Transactions on Audio, Speech and Language Processing, 33, Early Access, 1 - 13
Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
Journal article
Scalable-Complexity Steered Response Power based on Low-Rank and Sparse Interpolation
Published 09/11/2024
IEEE/ACM transactions on audio, speech, and language processing, 32, 1 - 16
The steered response power (SRP) is a popular approach to compute a map of the acoustic scene, typically used for acoustic source localization. The SRP map is obtained as the frequency-weighted output power of a beamformer steered towards a grid of candidate locations. Due to the exhaustive search over a fine grid at all frequency bins, conventional frequency domain-based SRP (conv. FD-SRP) results in a high computational complexity. Time domain-based SRP (conv. TD-SRP) implementations reduce computational complexity at the cost of accuracy using the inverse fast Fourier transform (iFFT). In this paper, to enable a more favourable complexity-performance trade-off as compared to conv. FD-SRP and conv. TD-SRP, we consider the problem of constructing a fine SRP map over the entire search space at scalable computational cost. We propose two approaches to this problem. Expressing the conv. FD-SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While sampling can be implemented by the iFFT, we propose to use optimal low-rank or sparse approximations of the interpolation matrix for complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in various localization scenarios with speech as source signals and compared to the state-of-the-art. The results indicate that SSPI-SRP performs better if large array apertures are used, while SLRI-SRP performs better at small array apertures or a large number of microphones. In comparison to conv. FD-SRP, two to three orders of magnitude of complexity reduction can be achieved, oftentimes enabling a more favourable complexity-performance trade-off as compared to conv. TD-SRP. A MATLAB implementation is available online.
Journal article
Published 08/10/2024
EURASIP journal on audio, speech, and music processing, 2024, 1, 51 - 20
Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a feedback delay network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics and outperforms existing methods based on genetic algorithms and analytical FDN design.
Journal article
Room Acoustic Rendering Networks With Control of Scattering and Early Reflections
Published 02/08/2024
IEEE/ACM transactions on audio, speech, and language processing, 32, 3745 - 3758
Room acoustic synthesis can be used in virtual reality (VR), augmented reality (AR) and gaming applications to enhance listeners' sense of immersion, realism and externalisation. A common approach is to use geometrical acoustics (GA) models to compute impulse responses at interactive speed, and fast convolution methods to apply said responses in real time. Alternatively, delay-network-based models are capable of modeling certain aspects of room acoustics, but with a significantly lower computational cost. In order to bridge the gap between these classes of models, recent work introduced delay network designs that approximate Acoustic Radiance Transfer (ART), a GA model that simulates the transfer of acoustic energy between discrete surface patches in an environment. This paper presents two key extensions of such designs. The first extension involves a new physically-based and stability-preserving design of the feedback matrices, enabling more accurate control of scattering and, more generally, of late reverberation properties. The second extension allows an arbitrary number of early reflections to be modeled with high accuracy, meaning the network can be scaled at will between computational cost and early reverberation precision. The proposed extensions are compared to the baseline ART-approximating delay network as well as two reference GA models. The evaluation is based on objective measures of perceptually-relevant features, including frequency-dependent reverberation times, echo density build-up, and early decay time. Results show how the proposed extensions result in a significant improvement over the baseline model, especially for the case of non-convex geometries or the case of unevenly distributed wall absorption, both scenarios of broad practical interest.
Journal article
Modal Excitation in Feedback Delay Networks
Published 2024
IEEE signal processing letters, 31, 2690 - 2694
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the computational complexity of the proposed analysis method does not depend on the delay-line lengths and is thus suitable for large FDNs, such as artificial reverberators. We show the relation between the compact generalized eigenvectors in the delay state space and the spatially extended modal shapes in the state space. We illustrate this method with an example FDN in which the suggested modal excitation control does not increase the computational cost. The modal shapes can help optimize input and output gains. This letter teaches how selecting the input and output points along the delay lines of an FDN adjusts the spectral shape of the system output.
Journal article
Efficient diffraction modelling using neural networks and infinite impulse response filters
Published 13/09/2023
Journal of the Audio Engineering Society. [electronic resource], 71, 9, 566 - 576
Creating plausible geometric acoustic simulations in complex scenes requires the inclusion of diffraction modelling. Current real-time diffraction implementations use the Uniform Theory of Diffraction (UTD) which assumes all edges are infinitely long. We utilise recent advances in machine learning to create an efficient infinite impulse response model trained on data generated using the physically accurate Biot-Tolstoy-Medwin model. We propose an approach to data generation that allows our model to be applied to higher-order diffraction. We show that our model is able to approximate the Biot-Tolstoy-Medwin model with a mean absolute level difference of 1.0 dB for 1st-order diffraction while maintaining a higher computational efficiency than the current state of the art using UTD.