Abstract
Spatial audio is essential for immersive experiences, yet novel-view acoustic synthesis (NVAS) remains challenging due to complex physical phenomena such as reflection, diffraction, and material absorption. Existing methods based on single-view or panoramic inputs improve spatial fidelity but fail to capture global geometry and semantic cues such as object layout and material properties. To address this, we propose Phys-NVAS, the first physics-aware NVAS framework that integrates spatial geometry modeling with vision–language semantic priors. A global 3D acoustic environment is reconstructed from multi-view images and depth maps to estimate room size and shape, enhancing spatial awareness of sound propagation. Meanwhile, a vision–language model extracts physics-aware priors of objects, layouts, and materials, capturing absorption and reflection properties beyond what geometry alone conveys. An acoustic feature fusion adapter unifies these cues into a physics-aware representation for binaural generation. Experiments on the RWAVS dataset demonstrate that Phys-NVAS produces binaural audio with improved realism and physical consistency.
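As a rough illustration of the fusion step summarized above, the PyTorch-style sketch below shows one way an acoustic feature fusion adapter could merge geometric room features with vision–language semantic priors into a single physics-aware conditioning vector for a binaural generator. The module name, dimensions, and cross-attention design are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AcousticFusionAdapter(nn.Module):
    """Illustrative sketch (assumed design): fuse geometric room features
    (e.g., from multi-view reconstruction) with vision-language priors on
    objects, layout, and materials into one physics-aware conditioning
    vector for a binaural audio generator."""

    def __init__(self, geo_dim=256, sem_dim=512, fused_dim=256, heads=4):
        super().__init__()
        self.geo_proj = nn.Linear(geo_dim, fused_dim)   # room size/shape features
        self.sem_proj = nn.Linear(sem_dim, fused_dim)   # object/layout/material priors
        # Geometry queries attend to semantic tokens (e.g., per-object material cues).
        self.cross_attn = nn.MultiheadAttention(fused_dim, heads, batch_first=True)
        self.out = nn.Sequential(nn.LayerNorm(fused_dim),
                                 nn.Linear(fused_dim, fused_dim), nn.GELU())

    def forward(self, geo_feats, sem_feats):
        # geo_feats: (B, Ng, geo_dim); sem_feats: (B, Ns, sem_dim)
        q = self.geo_proj(geo_feats)
        kv = self.sem_proj(sem_feats)
        fused, _ = self.cross_attn(q, kv, kv)
        # Pool to a single conditioning vector for the binaural decoder.
        return self.out(fused + q).mean(dim=1)


if __name__ == "__main__":
    adapter = AcousticFusionAdapter()
    geo = torch.randn(2, 8, 256)    # hypothetical room-geometry tokens
    sem = torch.randn(2, 16, 512)   # hypothetical vision-language tokens
    print(adapter(geo, sem).shape)  # torch.Size([2, 256])
```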