Abstract
We introduce an automatic, end-to-end method for recovering the 3D pose and
shape of dogs from monocular internet images. The large variation in shape
between dog breeds, significant occlusion and low quality of internet images
make this a challenging problem. We learn a richer prior over shapes than
previous work, which helps regularize parameter estimation. We demonstrate
results on the Stanford Dog dataset, an 'in the wild' dataset of 20,580 dog
images for which we have collected 2D joint and silhouette annotations, which we split
for training and evaluation. In order to capture the large shape variety of
dogs, we show that the natural variation in the 2D dataset is enough to learn a
detailed 3D prior through expectation maximization (EM). As a by-product of
training, we generate a new parameterized model (including limb scaling), SMBLD,
which we release alongside our new annotation dataset, StanfordExtra, to the
research community.
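
For concreteness, the EM-based prior learning mentioned above can be thought of as alternating between fitting per-image shape parameters under the current prior (E-step) and re-estimating that prior from the fits (M-step). The sketch below illustrates only the M-step and the resulting regularization term, assuming a multivariate Gaussian prior over shape and limb-scale parameters; the function names, parameter dimension, and ridge term are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def m_step_gaussian_prior(betas):
    """Re-estimate a multivariate Gaussian shape prior from per-image
    shape estimates (one row per training image).

    Mirrors the M-step of an EM-style alternation: fit each image's
    shape parameters under the current prior (E-step, not shown), then
    update the prior's mean and covariance from those fits.
    """
    mean = betas.mean(axis=0)
    centered = betas - mean
    # Small ridge keeps the covariance invertible when data are scarce.
    cov = centered.T @ centered / len(betas) + 1e-6 * np.eye(betas.shape[1])
    return mean, cov

def prior_neg_log_likelihood(beta, mean, cov):
    """Mahalanobis-style penalty that regularizes a per-image shape fit."""
    diff = beta - mean
    return 0.5 * diff @ np.linalg.solve(cov, diff)

# Toy usage: 100 hypothetical per-image shape vectors; the dimension (26)
# is an arbitrary stand-in for shape coefficients plus limb scales.
rng = np.random.default_rng(0)
betas = rng.normal(size=(100, 26))
mean, cov = m_step_gaussian_prior(betas)
print(prior_neg_log_likelihood(betas[0], mean, cov))
```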