Abstract
Stereoscopic media have garnered attention due to their widespread applications in 3D visualization, the metaverse, virtual reality, and immersive media. By leveraging the disparity between views, stereoscopic media facilitate 3D structure construction and enhance spatial perception. However, because the two views of a stereo pair are strongly correlated, transmitting both introduces redundancy. To address this, we propose a novel dual-view deep joint source-channel coding scheme that efficiently extracts and transmits semantic features representing inter-view correlations, thereby reducing redundancy while improving reconstruction quality. Furthermore, because the quality of the transmission channel varies, we design an adaptive semantic rate control method for stereoscopic media based on a self-attention model, which uses inter-view correlations, feature importance, and channel conditions to generate an adaptive mask that controls the volume of transmitted data. This approach maintains reconstruction quality under varying channel conditions and improves peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) compared with benchmarks. Finally, leveraging 3D rendering, we demonstrate the practical effectiveness of the proposed schemes for immersive media transmission in real-world scenarios.