Abstract
Recent progress in Virtual Reality (VR) and Augmented Reality (AR) allows us to experience various VR/AR applications in our daily lives. To maximise user immersion in VR/AR environments, plausible spatial audio reproduction synchronised with visual information is essential. In this paper, we propose a simple and efficient system to estimate room acoustics for plausible reproduction of spatial audio using 360° cameras for VR/AR applications. A pair of 360° images is used to estimate the room geometry and acoustic properties. A simplified 3D geometric model of the scene is obtained by depth estimation from the captured images and semantic labelling using a convolutional neural network (CNN). The acoustics of the real environment are characterised by frequency-dependent acoustic properties predicted for the scene. Spatially synchronised audio is reproduced based on the estimated geometric and acoustic properties of the scene. The reconstructed scenes are rendered with the synthesised spatial audio as VR/AR content. The estimated room geometry and simulated spatial audio are evaluated against actual measurements and audio rendered from ground-truth Room Impulse Responses (RIRs) recorded in the rooms.
Details about the data underlying this work, along with the terms for data access, are available from: http://dx.doi.org/10.15126/surreydata.00812228