Predicting binaural colouration using VGGish embeddings

Thomas McKenzie; Alec Wright; Daniel Turner; Pedro Lladó

Conference proceeding

Peer reviewed

AES International Conference on Artificial Intelligence and Machine Learning for Audio 2025 (London, UK, 08/09/2025–10/09/2025)

04/06/2025

Abstract

Keywords: colouration; auditory modelling; machine learning

An initial feasibility study is presented that explores the use of a pre-trained feature extractor, designed for large-scale audio classification, for the task of predicting the colouration between binaural signals. A multilayer perceptron (MLP) is trained to predict binaural colouration using feature embeddings obtained from the VGGish network and data from five previously conducted listening tests. The evaluation compares seven versions of the network, each trained using different data augmentation methods, with three existing signal processing methods: basic spectral difference (BSD), log spectral distance (LSD) and an auditory model for predicting binaural colouration (PBC-2). Results show that, while the MLP networks are comparable to BSD and LSD, further work is needed to compete with the more accurate PBC-2, for example by using audio features specifically relevant to colouration.
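As an illustration of the simplest kind of baseline named in the abstract, the sketch below computes a log spectral distance between two magnitude spectra. It is only a common textbook formulation (root-mean-square of per-bin decibel differences); this record does not state the exact LSD or BSD definitions used in the paper, so the function name `log_spectral_distance`, the `eps` guard and the choice of equal-length magnitude inputs are assumptions made for this example.

```python
import math


def log_spectral_distance(mag_ref, mag_test, eps=1e-12):
    """Return the RMS of per-bin dB differences between two magnitude spectra.

    Assumed formulation (not taken from the paper): each bin contributes
    20*log10(|ref| / |test|), and the result is the root mean square of
    those values across bins. `eps` avoids taking the log of zero.
    """
    if len(mag_ref) != len(mag_test) or len(mag_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")

    sum_sq = 0.0
    for ref_bin, test_bin in zip(mag_ref, mag_test):
        diff_db = 20.0 * math.log10((ref_bin + eps) / (test_bin + eps))
        sum_sq += diff_db ** 2
    return math.sqrt(sum_sq / len(mag_ref))


# Hypothetical magnitude spectra: the test signal is boosted tenfold
# (20 dB) in every bin relative to the reference.
reference = [1.0, 0.5, 0.25]
boosted = [10.0, 5.0, 2.5]
lsd_db = log_spectral_distance(reference, boosted)
```

A value of zero indicates identical magnitude spectra. A purely spectral measure like this ignores auditory weighting, which may be one reason a perceptually motivated model would be expected to perform differently.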

Files and links (2)

pdf

McKenzie2025_AESAIMLA_PredictingBinauralColourationUsingVGGishEmbeddings (216.77 kB)

Author's Accepted Manuscript; Embargoed Access (embargo ends: 08/09/2025)

url

https://aes2.org/events-calendar/2025-aes-international-conference-on-artificial-intelligence-and-machine-learning-for-audio/

Event Website: Conference website

Details

Title: Predicting binaural colouration using VGGish embeddings
Creators: Thomas McKenzie (Corresponding Author) - University of Edinburgh
Alec Wright (Author) - University of Edinburgh
Daniel Turner (Author) - University of Edinburgh
Pedro Lladó (Author) - University of Surrey, Department of Music and Media
Conference: AES International Conference on Artificial Intelligence and Machine Learning for Audio 2025 (London, UK, 08/09/2025–10/09/2025)
Publisher: Audio Engineering Society (AES)
Date accepted for publication: 04/06/2025
Grants: Challenges in Immersive Audio Technology, EP/X032914/1, Engineering and Physical Sciences Research Council (United Kingdom, Swindon) - EPSRC
Grant note: This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant no. EP/X032914/1, project Challenges in Immersive Audio Technology.
Identifiers: 991018566002346
Academic Unit: Department of Music and Media
Language: English
Resource Type: Conference proceeding
