Abstract
In this paper we propose a pipeline for estimating 3D room layout with object and material attribute prediction using a spherical stereo image pair. We assume that the room and objects can be represented as cuboids aligned to the main axes of the room coordinate (Manhattan world). A spherical stereo alignment algorithm is proposed to align two spherical images to the global world coordinate sys- tem. Depth information of the scene is estimated by stereo matching between images. Cubic projection images of the spherical RGB and estimated depth are used for object and material attribute detection. A single Convolutional Neu- ral Network is designed to assign object and attribute la- bels to geometrical elements built from the spherical image. Finally simplified room layout is reconstructed by cuboid fitting. The reconstructed cuboid-based model shows the structure of the scene with object information and material attributes.