Abstract
3D hand pose estimation from images has seen
considerable interest in the literature, with new methods
steadily improving overall 3D accuracy. One open challenge is
hand-to-hand interaction, where self-occlusions and
finger articulation make estimation significantly harder.
Little prior work has applied physical constraints to minimize
the hand-to-hand intersections that arise from noisy estimates.
This work addresses hand intersections by exploiting
an occupancy network that represents a hand's volume as a
continuous manifold, allowing us to model the probability
that a given point lies inside the hand. We design an
intersection loss that minimizes the likelihood of
hand-to-point intersections. Moreover, we propose a new hand mesh
parameterization that is superior to the commonly used MANO
model in several respects, including lower mesh complexity,
direct extraction of the underlying 3D skeleton, and watertightness. On the
benchmark InterHand2.6M dataset, models trained with
our intersection loss outperform the state of the art,
significantly reducing the number of hand intersections
while also lowering the mean per-joint position error. Additionally,
we demonstrate superior performance for 3D hand uplift
on the Re:InterHand and SMILE datasets and show reduced
hand-to-hand intersections in complex domains such as
sign-language pose estimation.
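As a rough illustration of the idea, the following is a minimal sketch of such an intersection penalty in PyTorch. The network OccupancyNet, the function intersection_loss, and the simple mean-inside-probability penalty are illustrative assumptions for exposition, not the paper's exact architecture or formulation.

```python
# Minimal illustrative sketch (assumed names, not the paper's exact method):
# penalize vertices of one hand that an occupancy network judges to be
# inside the other hand's volume.
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    """Hypothetical occupancy network: maps 3D points to an
    inside-probability for the hand it was trained on."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (N, 3) points in the hand's canonical frame.
        return torch.sigmoid(self.mlp(pts)).squeeze(-1)  # (N,) in [0, 1]

def intersection_loss(occ_net: OccupancyNet,
                      query_pts: torch.Tensor) -> torch.Tensor:
    """Mean inside-probability of the other hand's vertices.

    query_pts: (N, 3) vertices of hand B expressed in hand A's
    canonical frame. Driving this toward zero discourages hand B
    from penetrating hand A's volume.
    """
    return occ_net(query_pts).mean()

# Usage: add the penalty to the usual pose-estimation objective.
occ = OccupancyNet()
verts_b_in_a = torch.randn(778, 3)           # placeholder vertex positions
loss = intersection_loss(occ, verts_b_in_a)  # combine with joint-error loss
loss.backward()
```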