Spherical Based Human Tracking and 3D Pose Estimation for Immersive Entertainment Production

MATTHEW JAMES SHERE

doi:10.15126/thesis.900146

Human tracking and 3D pose estimation are two core activities of computer vision: the former identifies and follows an individual within a scene, and the latter produces a three-dimensional estimate of an individual's body pose and configuration. This combination of processes can be applied to scenarios such as entertainment production, home health monitoring or sports analysis, where there is a relatively low person count and a requirement to know more than just the approximate three-dimensional position of a person. However, such systems generally require non-complementary camera configurations, so two separate but overlapping camera rigs must be established. An ideal solution would be to combine these camera configurations, something which can readily be achieved using wide-angle, panoramic or 360° cameras. Through careful placement of these cameras, we simultaneously view the entire scene and produce multiple views of an individual to inform our pose estimate. However, such cameras bring their own representation problems, hampering the performance of existing solutions or preventing them from operating entirely. Therefore, we explore this facet of the problem, producing tracking and pose estimation solutions that function natively on 360° imagery. To facilitate this, we first contribute a tracker and pose estimation system operating from a pair of horizontally disjoint 360° cameras. We use provided person segmentation masks to create descriptors suitable for use at differing resolutions, while the specific camera configuration allows us to share these descriptors between cameras and combine them with spatial information to track an individual regardless of their distance from either camera. With a person isolated, we then create a joint-wise pose estimate directly from the spherical coordinate space, eliminating the need for either reprojection operations or intrinsic calibration information.
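To illustrate why a joint can be located from spherical coordinates without intrinsic calibration, the following is a minimal sketch and not the method of the thesis. It assumes equirectangular images, a particular longitude/latitude convention, known camera centres, and axes shared by both cameras; the function names `pixel_to_ray` and `triangulate_midpoint` are hypothetical. Each pixel maps directly to a viewing direction, and a joint seen by both cameras is placed at the midpoint of the shortest segment between the two rays.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel to a unit direction vector.

    Assumed convention (not taken from the source): longitude runs from
    -pi to pi left to right, latitude from pi/2 to -pi/2 top to bottom,
    with y pointing up and z pointing at the image centre.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not recoverable")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```

No focal length or principal point appears anywhere in the sketch: under the equirectangular assumption, only the image size and the camera positions are needed.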
Our second contribution reconfigures these cameras to a low, vertical baseline configuration. We simultaneously track each individual in the scene using only two-dimensional joint location estimates, exploiting the camera arrangement to assume an epipolar relationship. A temporally consistent 3D human pose estimate is then constructed, first as a coarse Principal Component Analysis (PCA) model, then refined in a joint-wise fashion over successive iterations, smoothing out any unrealistic jumps in motion. Having established tracking in a local area, our final contribution moves beyond the confines of a single room and tracks individuals as they move throughout a scene comprising multiple rooms or regions. We perform this with no prior knowledge of the scene layout or content, using only camera extrinsics and person movements to iteratively build tracks for each individual simultaneously, with each stage informing the next. Overall, we demonstrate that 360° imagery presents many advantages that can be exploited in both human tracking and three-dimensional human pose estimation. We enable tracking in a variety of situations where traditional methods are impractical or impossible, and position our methods to provide training data for the next generation of multi-camera, 360°-capable, deep-learning-based tracking approaches. We also produce pose estimates that bridge the gap between multi-view systems and monocular systems.
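The epipolar relationship offered by a vertical baseline can be sketched as follows; this is an illustration of the geometry only, not the tracker described in the thesis. It assumes two equirectangular cameras stacked on a common vertical axis with the same heading, so that the epipoles lie at the image poles and a scene point appears at the same longitude (image column) in both views. The function names `azimuth_gap` and `match_detections`, and the 10° tolerance, are hypothetical.

```python
import math


def azimuth_gap(u_top, u_bottom, width):
    """Smallest angular difference (radians) between two image columns,
    accounting for the wrap-around at the left/right image border."""
    lon_top = 2.0 * math.pi * u_top / width
    lon_bottom = 2.0 * math.pi * u_bottom / width
    diff = abs(lon_top - lon_bottom) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


def match_detections(top_cols, bottom_cols, width, max_gap=math.radians(10)):
    """Greedy one-to-one pairing of detections by longitude.

    top_cols / bottom_cols hold one representative column per detected
    person (for example the mean column of their 2D joints).
    Returns a sorted list of (top_index, bottom_index) pairs.
    """
    candidates = sorted(
        (azimuth_gap(ut, ub, width), i, j)
        for i, ut in enumerate(top_cols)
        for j, ub in enumerate(bottom_cols)
    )
    used_top, used_bottom, pairs = set(), set(), []
    for gap, i, j in candidates:
        if gap > max_gap:
            break  # remaining candidates are even further apart
        if i in used_top or j in used_bottom:
            continue
        used_top.add(i)
        used_bottom.add(j)
        pairs.append((i, j))
    return sorted(pairs)
```

A greedy pairing is used here purely for brevity; it can fail when two people stand at a similar bearing from the cameras, which is one reason a practical tracker would also need temporal or appearance cues.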
