Online Reconfiguration of Convolutional Neural Networks for Onboard Vision Applications

ALAA EDDINE MAZOUZ

doi:10.15126/thesis.900422

The future of space exploration relies on developing novel systems for autonomous operations and onboard data handling. Deep learning can provide techniques and models capable of solving both existing and foreseen challenges in autonomous space applications. Research on deep learning has focused strongly on supervised learning performance in terms of accuracy, but current trends have shifted attention towards speed and efficiency to meet the requirements of embedded systems. This shift requires conceptual changes to existing techniques and frameworks. In this context of computational efficiency, two recent branches of work can be distinguished. The first tries to eliminate redundancy in data and computation through compression and optimization techniques. The second exploits the fact that not all tasks require the same amount of computation and explores selective, task-dependent execution of deep neural networks. Both lines of work, however, focus on offline reconfiguration and optimization. This research falls within the second category and has two goals. The first is to provide hardware engineers with the tools to design, model and deploy onboard convolutional neural networks (CNNs). The second is to create reconfigurable models and reconfiguration policies for runtime applications on embedded systems, informed by the requirements of the space domain. The aim is to develop tools capable of generating Field Programmable Gate Array (FPGA) models of deep neural networks, and to explore introducing real-time dynamic changes to the network during runtime to satisfy the requirements of autonomy and overcome the risks of deploying static systems in harsh, uncooperative environments. A new automated tool for generating FPGA models from high-level convolutional neural networks is proposed. The tool is accessible and easy to use, with a model-based, modular design approach, and thus requires no in-depth knowledge of FPGA design.
The generated hardware models are accompanied by performance estimation and design exploration steps, which an exhaustive review of the literature on CNN-to-FPGA compilers identified as necessary. Offline Design Exploration (ODE) is carried out automatically using an analytical model. It only requires running a few scripts dedicated to performance estimation and a multi-objective optimization algorithm, and it finds the most efficient configurations of a model under hardware and user constraints. Beyond the ability to quickly generate multiple hardware models, the need for runtime adaptivity is tackled with a novel runtime reconfiguration methodology. Online Design Reconfiguration (ODR) allows latency and power trade-offs to be attained quickly at runtime with minimal resource overhead. To achieve this, a novel expandable training technique for runtime-reconfigurable and adaptive CNN models is proposed and tested. Training results in a unified model composed of deployable “sub-network” IP cores that perform the same task and share parameters but trade off accuracy for speed and power consumption. The active sub-network can be switched according to changing mission or environmental requirements. Finally, to further consolidate the runtime adaptivity aspect, the automated compiler is significantly expanded to include backpropagation and the ability to train convolutional neural networks online. A novel pipeline architecture performs backpropagation directly on the FPGA while reusing most of the forward-pass pipeline to minimize resource overhead, enabling online learning on FPGAs. The streaming and pipelined architectures facilitate online and autonomous deployment, areas that are overlooked in the literature.
The online learning architecture is finally tested using a custom synthetic dataset, providing new insights into the feasibility of implementing online learning on FPGAs for autonomous vision applications, especially close-proximity satellite applications. The generated designs achieved throughput trade-offs against resources of 95x, 71x, and 18x for the MNIST, CIFAR-10, and SVHN architectures, respectively. In terms of DSP slice utilization, the proposed workflow achieved trade-offs of 44x for MNIST, 52x for SVHN, and 24x for CIFAR-10. These trade-offs allow designers to tailor implementations to their specific constraints and objectives. The proposed Online Design Reconfiguration (ODR) policies achieved power reductions of up to 25%, 28%, and 32% with 13x, 14x, and 50x gains in latency for accuracy losses of 0.7%, 2%, and 4% in the MNIST, SVHN, and CIFAR-10 implementations, respectively. In the online learning scenarios tested, the FPGA pipeline’s performance was comparable to that of the GPU, with a distinct advantage in power consumption. Speed-ups of 2.8x, 5.8x, and 3x over the GPU were achieved on three architectures trained on MNIST, SVHN, and CIFAR-10, respectively.
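The offline design exploration described in the abstract selects configurations that balance competing objectives under hardware constraints. As a rough illustration only, the sketch below filters a handful of candidate designs down to the Pareto-optimal ones for two objectives (latency and DSP slices) under a DSP budget. The `DesignPoint` type, the candidate names, and all numeric values are hypothetical and are not taken from the thesis. A brute-force dominance check stands in for the multi-objective optimization algorithm and analytical model, which the abstract does not specify.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """One candidate hardware configuration (all values are hypothetical)."""
    name: str
    latency_ms: float  # lower is better
    dsp_slices: int    # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.dsp_slices <= b.dsp_slices
    strictly_better = a.latency_ms < b.latency_ms or a.dsp_slices < b.dsp_slices
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint], max_dsp: int) -> List[DesignPoint]:
    """Keep feasible points that no other feasible point dominates."""
    feasible = [p for p in points if p.dsp_slices <= max_dsp]
    return [p for p in feasible if not any(dominates(q, p) for q in feasible)]


# Hypothetical candidates, from fully parallel (fast, large) to serial (slow, small).
candidates = [
    DesignPoint("fully_parallel", 0.5, 900),
    DesignPoint("balanced", 2.0, 200),
    DesignPoint("wasteful", 3.0, 250),
    DesignPoint("serial", 20.0, 20),
]

front = pareto_front(candidates, max_dsp=512)
print([p.name for p in front])  # ['balanced', 'serial']
```

With a budget of 512 DSP slices, the fully parallel design is infeasible and the "wasteful" design is dominated by the balanced one, so two designs remain for the engineer to choose between. The quadratic dominance check is adequate for a few dozen candidates; a larger design space would call for a dedicated multi-objective optimizer.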
