Abstract
Maritime vision systems, whether deployed on unmanned surface vessels, aerial drones,
or coastal surveillance cameras, rely on accurate extraction of three sparse semantic
boundaries—the Visible Horizon Line (VHL), True Horizon Line (THL), and Shoreline
(SL)—to stabilise cameras, geo-register imagery to digital elevation models, and provide
last-resort localisation when GPS is jammed or denied. Yet these contours are challenging
to detect in scenes dominated by vast, low-texture sky-and-water regions, specular
reflections, wave clutter, haze, and rapidly changing weather; fewer than 1% of pixels
belong to the true boundaries, and horizon cues often blur into clouds or disappear
behind vessel wakes. Classical Canny–Hough or super-pixel heuristics collapse under
such conditions, while generic deep networks demand dense, costly annotation. To overcome
these obstacles, this thesis introduces a unified, data-efficient framework—built
from four complementary advances—that achieves state-of-the-art semantic-boundary
detection and multi-boundary estimation for robust, GPS-denied maritime navigation.
First, we build a framework on top of a streamlined, low-latency UNet-like architecture
that ingests RGB frames alongside inertial measurement unit (IMU) streams,
enabling cross-modal feature alignment. Evaluated on multiple benchmarks, our network
achieves highly accurate predictions for the key semantic boundaries while sustaining
real-time performance. This work is a first step towards extracting multiple
semantic boundaries from a single image with a simple framework. To the best of our
knowledge, it is the first attempt to extract all three semantic boundaries simultaneously
and within real-time constraints. Boundary extraction applies to many autonomous
platforms, including UAVs, USVs, and land-based vehicles, all of which must detect
surface edges and horizons. Better contour detection in turn improves downstream tasks
such as safe navigation in complex environments, obstacle detection, and collision
avoidance.
Second, recognising that even modest label noise severely degrades thin-boundary prediction,
we introduce a novel lightweight Edge Prior Module (EPM) that guides early
layers to suppress water-reflection artefacts and sky texture. The proposed framework
predicts multiple high-quality semantic boundaries in complex scenarios. The maritime
environment presents numerous obstacles and photometric distortions that make the task
particularly difficult for classical methods. We train a robust, deployable
model that copes with varying lighting conditions,
occlusions, and reflections from water surfaces.
Third, to address the pixel-level semantic class imbalance and accelerate convergence, we
devise a curriculum-learning scheme with Fourier Spectral Alignment (FSA). The FSA loss aligns
the predictions with the ground truth in the frequency domain, forcing the model to capture
both low-frequency global shape and high-frequency edge detail. Our method significantly
improves on prior semantic-segmentation results on the LaRS dataset, setting a new
state of the art.
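As an illustration only, a spectral alignment term of this kind might be sketched as follows. The abstract does not give the exact form of the FSA loss, so the function name, the radial low/high band split, the `cutoff` value, and the band weights are all assumptions made for this example, and NumPy stands in for a differentiable framework.

```python
import numpy as np

def spectral_alignment_loss(pred, target, cutoff=0.25,
                            low_weight=1.0, high_weight=1.0):
    """Illustrative spectral loss (hypothetical, not the thesis' FSA definition).

    Compares the 2-D Fourier spectra of a predicted boundary map and the
    ground truth, averaging the absolute difference separately over a
    low-frequency band (global shape) and a high-frequency band (edge detail).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.ndim != 2 or pred.shape != target.shape:
        raise ValueError("pred and target must be 2-D arrays of equal shape")

    # Orthonormal FFT keeps the magnitude scale independent of image size.
    diff = np.abs(np.fft.fft2(pred, norm="ortho")
                  - np.fft.fft2(target, norm="ortho"))

    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(pred.shape[0])[:, None]
    fx = np.fft.fftfreq(pred.shape[1])[None, :]
    low_band = np.sqrt(fy ** 2 + fx ** 2) <= cutoff  # assumed band split

    low_term = diff[low_band].mean()
    # Very small inputs may have no bins above the cutoff.
    high_term = diff[~low_band].mean() if (~low_band).any() else 0.0
    return low_weight * low_term + high_weight * high_term
```

In this sketch an exact prediction scores zero, while a horizon line displaced by a few rows is penalised in both bands because the displacement changes the phase of the spectrum.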
Fourth, to curb data and annotation costs, we introduce policy-driven curriculum distillation,
where image selection is framed as a Markov decision process solved by a
Deep Q-Network (DQN). At each epoch, the agent evaluates image embeddings, predicted
difficulty, and past validation gains to assign a utility score, admitting only
the most informative samples. The saliency-informed curation compresses the training
set while preserving key performance metrics relative to full-data training,
reducing annotation effort and GPU compute. Because the policy re-scores incoming
data in real time, it can continuously adapt the distilled subset to evolving operational
scenarios while retaining the model’s predictive strength.
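The selection step described above can be sketched roughly as follows. This is a simplified stand-in, not the thesis' agent: a linear Q-function replaces the Deep Q-Network, and the names `select_samples` and `td_update`, the two-action layout (skip, admit), and the `keep_fraction` parameter are hypothetical choices for illustration.

```python
import numpy as np

def select_samples(features, q_weights, keep_fraction=0.5):
    """Admit the highest-utility samples (illustrative, hypothetical helper).

    features:  (n_samples, n_features) per-image state, e.g. an embedding
               summary, predicted difficulty, and past validation gain.
    q_weights: (n_features, 2) linear Q-function; column 0 = skip, 1 = admit.
    Returns the sorted indices of admitted samples and all utility scores.
    """
    q_values = features @ q_weights
    utility = q_values[:, 1] - q_values[:, 0]  # advantage of admitting
    n_keep = max(1, int(round(keep_fraction * len(utility))))
    order = np.argsort(-utility, kind="stable")  # highest utility first
    return np.sort(order[:n_keep]), utility

def td_update(q_weights, state, action, reward, next_state,
              lr=0.01, gamma=0.9):
    """One Q-learning step for the linear stand-in Q-function."""
    target = reward + gamma * np.max(next_state @ q_weights)
    td_error = target - state @ q_weights[:, action]
    updated = q_weights.copy()
    updated[:, action] += lr * td_error * state
    return updated
```

In such a setup the reward could be the validation gain observed after training on the admitted subset, so that the utility scores are refreshed each epoch; that reward definition is likewise an assumption here.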
Collectively, these four contributions (multi-contour semantic segmentation, edge-prior
conditioning, spectral curriculum learning, and reinforcement-learning-guided dataset
distillation) compose a principled pipeline for maritime boundary detection. The resulting
system establishes new benchmarks across ReMaSTrED300, MaSTr1325, and
LaRS, achieves robust, real-time performance in GPS-denied conditions, and reduces
data requirements by more than half. Beyond maritime navigation, the proposed
techniques generalise to any sparse-label, edge-centric vision task, offering a scalable
blueprint for boundary-aware perception in autonomous robotic navigation.