Abstract
Minimally invasive surgery (MIS) has many documented advantages, but the
surgeon's limited visual contact with the scene can be problematic. Hence,
systems that can help surgeons navigate, such as a method that produces a 3D
semantic map of the scene, can compensate for this limitation. In theory, we can borrow
3D semantic mapping techniques developed for robotics, but this requires
finding solutions to the following challenges in MIS: 1) semantic segmentation,
2) depth estimation, and 3) pose estimation. In this paper, we propose the
first 3D semantic mapping system for knee arthroscopy that addresses the three
challenges above. Using out-of-distribution non-human datasets, where pose
could be labeled, we jointly train depth+pose estimators using self-supervised
and supervised losses. Using an in-distribution human knee dataset, we train a
fully-supervised semantic segmentation system to label arthroscopic image
pixels as femur, ACL, or meniscus. Taking test images from human knees,
we combine the results from these two systems to automatically create 3D
semantic maps of the human knee. This work opens the pathway to intraoperative
3D semantic mapping, registration with pre-operative data, and robotic-assisted
arthroscopy.