Using Reinforcement Learning to Design and Control Free-Flying Space Robots

Lucy Elaine Jackson

doi:10.15126/thesis.900598

Free-flying space robots have the potential to revolutionise space exploration by facilitating a range of on-orbit operations. Whilst there have been some successful demonstrations of space robot technologies, the systems remain large with mission concepts limited to rendezvous with co-operative targets. However, the recent surge in small satellite technologies is changing the economics of space and downsizing a space robot is now viewed as a technologically feasible aim. Even with these advances, the design and control of a free-flying space robot is very challenging. Without a fixed base, the robot experiences large dynamic coupling effects where every motion causes counter-motions in other parts of the robot. These are influenced most heavily by the space robot’s size and mass distribution as well as the employed control technique. As such, successful operation is heavily tied to both physical design and control. In fact, this coupling underpins all robotic systems since control and hardware design are intrinsically linked. This thesis therefore investigates the relationship between hardware and control, looking at how they impact each other and how this relationship can be exploited to improve performance. Although this work uses the On-Orbit Assembly (OOA) of a large aperture space telescope using free-flying space robots as a mission concept, the methods developed have applicability across the entire field of robotics. There exists no documented design technique for any space robot. In fact, the dimensions and final design of successful missions are hard to find and never substantiated. This makes it hard to understand how robotic design impacts performance. This thesis presents a transparent design approach specific to free-flying space robots. A combined engineering approach is proposed, considering the mechanics and dynamics of the integrated system in order to present a design best suited for the OOA of a large aperture telescope. 
The feasibility of this space robot is evaluated in simulation and in-depth results are presented. This analysis supports the hypothesis that a small space robot is a viable solution for OOA and should be used as a starting point for the design of future systems. As with the majority of robot design techniques, the initial hardware-centred design approach in this thesis treats the control as immutable. Instead, simultaneous optimisation of hardware and control parameters is likely to improve overall performance. This is demonstrated by reasoning over both control and design in a Reinforcement Learning (RL) pipeline. The proposed method forms a complex differential path through a trajectory rollout, allowing a vast amount of information that was previously lost in the ‘black-box’ environment to be used. This means refinements can be made to both the morphology and control parameters simultaneously. The result is an efficient and versatile approach to holistic robot design. Performance improvements are seen with the space robot and a number of benchmark tasks compared with performing hardware design optimisation alone. While it is possible to modify robotic design in conjunction with control to improve performance, in some instances the design may be changed with no option to modify the control algorithm. Data scarcity, brittle convergence and the gap between simulation and real-world environments mean that most common RL approaches are subject to overfitting and fail to generalise to unseen environments. This means the replacement of parts with non-identical components or the failure of sensors or joints will most likely render the control algorithm useless. Hardware-agnostic policies would mitigate this by allowing a single network to operate in a variety of test domains, where dynamics vary due to changes in robotic morphologies.
This thesis utilises the idea that learning to adapt a known and successful control policy is easier and more flexible than jointly learning numerous policies for different morphologies. It presents the idea of Hardware Agnostic RL. In this approach, two control policies are combined and varied embodiments are sampled using a novel adversarial loss function. This self-regulates morphologies based on their performance. The result is a final control policy that is robust to changes in the environment as well as degradation and failure of the robot.
