Abstract
This work addresses 3D human pose reconstruction from single images. We present
a method that combines Forward Kinematics (FK) with neural networks to ensure
fast and valid 3D pose predictions. Pose is represented as a hierarchical tree
whose nodes correspond to human joints and model their physical limits. Given
2D keypoints detected in the image, we lift the skeleton to 3D using neural
networks that predict both joint rotations and bone lengths.
These predictions are then combined with skeletal constraints by an FK layer
implemented as a network layer in PyTorch. The result is a fast and accurate
approach to 3D skeletal pose estimation. Through quantitative and qualitative
evaluation, we demonstrate that the method is significantly more accurate than
MediaPipe in terms of both per-joint positional error and visual appearance.
We also demonstrate that it generalizes across different datasets.
The PyTorch implementation takes 100 to 200 milliseconds per image (including
CNN detection) using only a CPU.
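To make the FK step concrete, the following is a minimal sketch of what such a layer computes. It is not the paper's implementation. It assumes a hypothetical skeleton given as a parent list, unit rest-pose bone directions, Euler-angle joint rotations, and per-joint angle limits enforced by simple clamping. All of these names and parameterizations are illustrative assumptions. It is written with NumPy for readability; the same operations on torch tensors would make it differentiable. Each joint's global rotation is its parent's global rotation times its own local rotation. Each joint is placed one predicted bone length along its rest direction, rotated into the parent's global frame, so bone lengths and joint limits hold by construction.

```python
import numpy as np


def rot_xyz(angles):
    """Rotation matrix for Euler angles (radians), composed as Rx @ Ry @ Rz."""
    ax, ay, az = angles
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rx @ ry @ rz


def forward_kinematics(parents, rest_dirs, bone_lengths, angles, lower, upper):
    """Map predicted joint angles and bone lengths to 3D joint positions.

    Hypothetical inputs (not taken from the paper):
      parents[j]   -- index of joint j's parent, -1 for the root; every
                      parent must appear before its children.
      rest_dirs    -- (J, 3) unit bone directions in the parent frame at rest.
      bone_lengths -- (J,) predicted bone lengths (root entry is unused).
      angles       -- (J, 3) predicted Euler angles per joint.
      lower, upper -- (J, 3) per-joint angle limits.
    Returns a (J, 3) array of root-relative joint positions.
    """
    # Enforce physical joint limits by clamping the predicted angles.
    angles = np.clip(angles, lower, upper)
    num_joints = len(parents)
    glob_rot = np.zeros((num_joints, 3, 3))
    pos = np.zeros((num_joints, 3))
    for j, p in enumerate(parents):
        local = rot_xyz(angles[j])
        if p < 0:
            # Root joint: placed at the origin, oriented by its own rotation.
            glob_rot[j] = local
        else:
            # Bone from parent to child, rotated into the parent's global frame.
            pos[j] = pos[p] + glob_rot[p] @ (bone_lengths[j] * rest_dirs[j])
            # Accumulate rotations down the kinematic tree.
            glob_rot[j] = glob_rot[p] @ local
    return pos


# Toy three-joint chain (root -> elbow -> wrist), all bones pointing up at rest.
parents = [-1, 0, 1]
rest_dirs = np.array([[0.0, 1.0, 0.0]] * 3)
lengths = np.array([0.0, 1.0, 0.5])
lower = np.full((3, 3), -np.pi / 2)
upper = np.full((3, 3), np.pi / 2)
angles = np.zeros((3, 3))
angles[1, 2] = 3.0  # elbow z-angle beyond its limit; clamped to pi/2
pose = forward_kinematics(parents, rest_dirs, lengths, angles, lower, upper)
```

In the toy chain, the elbow's out-of-range angle is clamped to 90 degrees, so the wrist lands at approximately (-0.5, 1, 0) instead of continuing straight up. Clamping is the simplest way to respect limits, but its gradient is zero outside the allowed range. A smooth mapping such as a scaled tanh into [lower, upper] is a common alternative when training end to end.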