Abstract
In the maritime environment, navigation and localisation are primarily driven by systems such as GPS. However, when GPS is unavailable, e.g., because it is jammed or the satellite connection is lost, navigators can fall back on visual methods that rely on surrounding land masses and other permanent features within perceptual range. To enable autonomous navigation, and specifically localisation, a vessel must determine its position by extracting the contours of its surrounding environment and matching them against an elevation model. The contours of interest are the true horizon line, the visible horizon line and the shoreline. Extracting these contours is commonly approached with computational methods such as edge detection or pixel clustering, which are not robust and rely on weak priors. To this end, we propose the first learning-based framework that explores the fusion of inertial data into an encoder-decoder model to extract these contours. In addition, extensive data augmentation is used to extend the MaSTr1325 dataset, introducing further robustness to the common environmental challenges faced by the sensors of unmanned surface vessels. For evaluation, we form a small curated dataset of 300 images, each annotated with six component segmentation masks and three further masks describing the true horizon contour, the visible horizon contour and the shoreline. We experiment extensively with popular segmentation models such as UNet, SegNet, DeepLabV3+ and TransUNet with various backbones for a quantitative comparison. The results show that, within a small margin of ten pixels in a high-resolution image, our system detects the three key contours used in navigation, namely the shoreline, the true horizon and the visible horizon, with accuracies of 63.79%, 68.94% and 89.75%, respectively.