Abstract
The future of space exploration relies on developing novel systems for autonomous operations and
onboard data handling. The use of deep learning can provide techniques and models capable of solving
both existing and foreseen challenges identified in autonomous space applications. The existing
research on deep learning has seen a strong focus on supervised learning performance in terms of
accuracy, but current trends have shifted the attention towards speed and efficiency to meet the
requirements of embedded systems. This shift calls for conceptual changes to the existing techniques
and frameworks. In this context of computational efficiency, two recent branches of work can be
distinguished. One tries to eliminate redundancy in data and computation through compression and
optimization techniques. The other exploits the fact that not all tasks require the same amount of
computation and explores selective, task-dependent execution of deep neural networks. However,
both lines of work focus on offline reconfiguration and optimization.
This research falls within the second category and pursues two aims. First, it aims to provide
hardware engineers with the tools to design, model, and deploy onboard convolutional neural networks
(CNNs). Second, it aims to create reconfigurable models and reconfiguration policies for runtime
applications on embedded systems, informed by the requirements of the space domain. Concretely, the
goal is to develop tools capable of generating Field Programmable Gate Array (FPGA) models of deep
neural networks and to explore introducing real-time dynamic changes to the network at runtime, in
order to satisfy the requirements of autonomy and overcome the risks of deploying static systems in
harsh, uncooperative environments.
A new automated tool for generating FPGA models from high-level convolutional neural networks
is proposed. The tool is accessible and easy to use: it follows a model-based, modular design approach
and thus requires no in-depth knowledge of FPGA design. The generated hardware models are
accompanied by performance estimation and design exploration steps, identified as necessary through
an exhaustive review of the literature on CNN-to-FPGA compilers. The Offline Design Exploration
(ODE) is carried out automatically using an analytical model and only requires running a few scripts
dedicated to performance estimation and a Multi-Objective Optimization algorithm. This process finds
the most efficient configurations of a model subject to hardware and user constraints.
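As a minimal illustration of the selection step in such a design exploration, the sketch below
filters a set of candidate configurations down to the Pareto-optimal ones under two minimized
objectives, latency and DSP usage. The candidate values, the field names, and the DSP budget are
hypothetical and are not outputs of the proposed tool, and the brute-force filter is only a stand-in
for the Multi-Objective Optimization algorithm, which the text above does not specify.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One candidate hardware configuration (all values are illustrative)."""
    name: str
    latency_ms: float   # estimated latency, lower is better
    dsp_slices: int     # estimated DSP usage, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.dsp_slices <= b.dsp_slices
    better = a.latency_ms < b.latency_ms or a.dsp_slices < b.dsp_slices
    return no_worse and better


def pareto_front(candidates: List[Config], dsp_budget: int) -> List[Config]:
    """Drop configurations over the DSP budget, then drop dominated ones."""
    feasible = [c for c in candidates if c.dsp_slices <= dsp_budget]
    return [c for c in feasible
            if not any(dominates(other, c) for other in feasible)]


# Hypothetical estimates, as an analytical performance model might produce.
CANDIDATES = [
    Config("serial", 12.0, 20),
    Config("partial", 3.0, 120),
    Config("wasteful", 4.0, 200),    # dominated by "partial"
    Config("parallel", 0.5, 900),
    Config("oversized", 0.4, 2500),  # exceeds the DSP budget below
]

front = pareto_front(CANDIDATES, dsp_budget=1000)
for cfg in front:
    print(cfg.name, cfg.latency_ms, cfg.dsp_slices)
```

The surviving configurations are the ones a designer would choose between: each is faster than
every cheaper alternative and cheaper than every faster one.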
Beyond the ability to quickly generate multiple hardware models, the need for runtime adaptivity is
tackled with a novel runtime reconfiguration methodology. Online Design Reconfiguration (ODR)
allows latency and power trade-offs to be attained quickly at runtime with minimal resource overhead.
To achieve this, a novel expandable training technique for runtime-reconfigurable and adaptive CNN
models is proposed and tested. Training results in a unified model composed of deployable
“sub-network” IP cores that perform the same task and share parameters but trade off accuracy for
speed and power consumption. This trade-off can be made according to changing mission or
environmental requirements.
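The idea of sub-networks that share parameters can be illustrated with a small sketch. It assumes a
generic nested-width scheme in which a narrower sub-network uses only the first `width` hidden units
of one shared weight set, and it uses dense layers rather than convolutions for brevity. The
functions, layer sizes, and random weights are hypothetical and do not represent the proposed
training technique or the generated IP cores.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_weights(rows: int, cols: int, seed: int) -> Matrix:
    """Random weights standing in for trained, shared parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def hidden_activations(x: List[float], hidden_w: Matrix, width: int) -> List[float]:
    """ReLU activations of the first `width` hidden units only."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in hidden_w[:width]]


def forward(x: List[float], hidden_w: Matrix, out_w: Matrix, width: int) -> List[float]:
    """Run the sub-network that uses the first `width` hidden units."""
    hidden = hidden_activations(x, hidden_w, width)
    return [sum(w * h for w, h in zip(row[:width], hidden)) for row in out_w]


def mac_count(n_in: int, n_out: int, width: int) -> int:
    """Multiply-accumulate operations needed by a sub-network of this width."""
    return n_in * width + width * n_out


N_IN, N_HIDDEN, N_OUT = 8, 16, 3
hidden_w = make_weights(N_HIDDEN, N_IN, seed=0)   # shared by all sub-networks
out_w = make_weights(N_OUT, N_HIDDEN, seed=1)     # shared by all sub-networks
x = [0.5] * N_IN

for width in (4, 8, 16):
    y = forward(x, hidden_w, out_w, width)
    print(width, mac_count(N_IN, N_OUT, width), [round(v, 3) for v in y])
```

Because every width reads from the same weight matrices, switching sub-networks changes only how
much of the stored model is evaluated. The narrower variants need fewer operations, at the cost of
the accuracy carried by the units they skip.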
Finally, to further consolidate the runtime adaptivity aspect, the automated compiler is significantly
expanded to include backpropagation and the ability to train convolutional neural networks online. A
novel pipeline architecture performs backpropagation directly on the FPGA while reusing most of the
forward-pass pipeline to minimize resource overhead, enabling online learning on FPGAs. The
streaming and pipelined architectures facilitate online and autonomous deployment, areas that are
overlooked in the literature. The approach is tested using a custom synthetic dataset, providing new
insights into the feasibility of implementing online learning on FPGAs for autonomous vision
applications, especially for close-proximity satellite applications.
The generated designs achieved 95x, 71x, and 18x throughput trade-offs with resources for the MNIST,
CIFAR-10, and SVHN architectures, respectively. In terms of resource utilization, measured in DSP
slices, the proposed workflow achieved trade-offs of 44x for MNIST, 52x for SVHN, and 24x for
CIFAR-10. These trade-offs allow designers to tailor implementations to their specific constraints
and objectives. The proposed Online Design Reconfiguration (ODR) policies achieved power reductions
of up to 25%, 28%, and 32%, with 13x, 14x, and 50x gains in latency, for accuracy losses of 0.7%, 2%,
and 4% in the MNIST, SVHN, and CIFAR-10 implementations, respectively. In the online learning
scenarios tested, the FPGA pipeline’s performance was comparable to that of the GPU, with a distinct
advantage in power consumption, and speedups of 2.8x, 5.8x, and 3x over the GPU were achieved on
three architectures trained on MNIST, SVHN, and CIFAR-10, respectively.