Abstract
We present a generalised self-supervised learning approach for monocular
estimation of real depth across scenes with diverse depth ranges, from
1 to 100s of meters. Existing supervised methods for monocular depth estimation
require accurate depth measurements for training. This limitation has led to
the introduction of self-supervised methods that are trained on stereo image
pairs with a fixed camera baseline to estimate disparity, which is then
converted to depth using the known calibration. Self-supervised approaches have demonstrated
impressive results but do not generalise to scenes with different depth ranges
or camera baselines. In this paper, we introduce RealMonoDepth, a
self-supervised monocular depth estimation approach that learns to estimate
the real scene depth for a diverse range of indoor and outdoor scenes. We
propose a novel loss function, defined with respect to the true scene depth and
based on relative depth scaling and warping. This allows self-supervised
training of a single network on scenes with diverse depth ranges, using both
stereo-pair and in-the-wild moving-camera data sets. A comprehensive
performance evaluation across five benchmark data sets demonstrates that
RealMonoDepth provides a single trained network which generalises depth
estimation across indoor and outdoor scenes, consistently outperforming
previous self-supervised approaches.
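
As a brief illustration of the disparity-to-depth conversion mentioned above, the following minimal sketch uses the standard rectified-stereo relation Z = f * B / d. It is not the RealMonoDepth loss; the function name, parameter names, and numeric values are assumptions chosen for illustration only.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d.

    All names and units here are illustrative assumptions:
    disparity and focal length in pixels, baseline in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical calibration values, not taken from any specific data set.
depth_narrow = disparity_to_depth(36.0, focal_length_px=720.0, baseline_m=0.54)
depth_wide = disparity_to_depth(36.0, focal_length_px=720.0, baseline_m=1.08)
print(depth_narrow, depth_wide)
```

The same disparity maps to a different depth when the baseline changes, which is one way to see why a network trained to predict disparity for a fixed baseline does not transfer directly to other camera setups.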