Abstract
This paper presents an open and comprehensive framework to systematically
evaluate state-of-the-art contributions to self-supervised monocular depth
estimation. These contributions span pretraining, backbone, architectural design choices
and loss functions. Many papers in this field claim novelty in either
architecture design or loss formulation. However, simply updating the backbone
of historical systems results in relative improvements of 25%, allowing them to
outperform the majority of existing systems. A systematic evaluation of papers
in this field was not straightforward. Because each paper must compare like-with-like
against previous work, longstanding errors in the evaluation protocol have become
ubiquitous in the field. It is likely that many papers were not only optimized
for particular datasets, but also for errors in the data and evaluation
criteria. To aid future research in this area, we release a modular codebase
(https://github.com/jspenmar/monodepth_benchmark), allowing for easy evaluation
of alternative design decisions against corrected data and evaluation criteria.
We re-implement, validate and re-evaluate 16 state-of-the-art contributions and
introduce a new dataset (SYNS-Patches) containing dense outdoor depth maps in a
variety of both natural and urban scenes. This allows for the computation of
informative metrics in complex regions such as depth boundaries.