Abstract
Free-flying space robots have the potential to revolutionise space exploration by facilitating a
range of on-orbit operations. Whilst there have been some successful demonstrations of space
robot technologies, the systems remain large, and mission concepts are limited to rendezvous with
co-operative targets. However, the recent surge in small satellite technologies is changing the
economics of space, and downsizing a space robot is now viewed as a technologically feasible
aim. Even with these advances, the design and control of a free-flying space robot remain very
challenging. Without a fixed base, the robot experiences large dynamic coupling effects, in which
every motion induces counter-motions in other parts of the robot. These effects are influenced
most heavily by the space robot’s size and mass distribution, as well as by the control technique
employed. As such, successful operation depends heavily on both physical design and control.
More broadly, this interdependence underpins all robotic systems, since control and hardware
design are intrinsically linked.
This thesis therefore investigates the relationship between hardware and control. It examines
how each affects the other and how this relationship can be exploited to improve performance.
The mission concept used throughout is the On-Orbit Assembly (OOA) of a large-aperture space
telescope by free-flying space robots. However, the methods developed are applicable across the
wider field of robotics.
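A standard result for free-floating manipulators makes the dynamic coupling described above concrete. It is stated here as background, not as this thesis's model, and the symbols are introduced only for illustration. Assume rigid links, no external forces or thruster firings, and zero initial momentum. Conservation of linear and angular momentum then ties the base motion to the joint rates:

```latex
\mathbf{H}_b \begin{bmatrix} \mathbf{v}_b \\ \boldsymbol{\omega}_b \end{bmatrix}
  + \mathbf{H}_{bm}\,\dot{\mathbf{q}} = \mathbf{0}
\quad\Longrightarrow\quad
\begin{bmatrix} \mathbf{v}_b \\ \boldsymbol{\omega}_b \end{bmatrix}
  = -\mathbf{H}_b^{-1}\mathbf{H}_{bm}\,\dot{\mathbf{q}}
```

The symbols are defined as follows:

- $\mathbf{v}_b$ and $\boldsymbol{\omega}_b$ are the base's linear and angular velocities.
- $\dot{\mathbf{q}}$ is the vector of joint rates.
- $\mathbf{H}_b$ is the $6\times 6$ inertia of the whole system, with joints locked, referred to the base.
- $\mathbf{H}_{bm}$ is the base-manipulator coupling inertia.

Both matrices depend on the link masses, lengths and inertias. This is why size and mass distribution dominate the coupling: any joint motion generally moves the base unless $\mathbf{H}_{bm}\dot{\mathbf{q}} = \mathbf{0}$.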
No documented design technique exists for any space robot. Indeed, the dimensions and final
designs of successful missions are hard to find, and the design choices behind them are never
substantiated. This makes it difficult to understand how robotic design affects performance. This
thesis presents a transparent design approach specific to free-flying space robots. A combined
engineering approach is proposed that considers the mechanics and dynamics of the integrated
system together, in order to arrive at a design best suited to the OOA of a large-aperture
telescope. The feasibility of this space robot is evaluated in simulation, and in-depth results
are presented. The analysis supports the hypothesis that a small space robot is a viable
solution for OOA and that it should be used as a starting point for the design of future systems.
As with the majority of robot design techniques, the initial hardware-centred design approach in
this thesis treats the control as immutable. However, optimising hardware and control parameters
simultaneously is likely to improve overall performance. This is demonstrated by reasoning over
both control and design in a Reinforcement Learning (RL) pipeline. The proposed method
constructs a differentiable path through a trajectory rollout. This allows a large amount of
information that was previously lost in the ‘black-box’ environment to be exploited, so the
morphology and control parameters can be refined simultaneously. The result is an efficient and
versatile approach to holistic robot design. Compared with hardware design optimisation alone,
performance improves both for the space robot and on a number of benchmark tasks.
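A deliberately small example can illustrate the idea of refining morphology and control together by differentiating through a rollout. The sketch below is not the thesis's RL pipeline. It swaps in a hypothetical 1-D unit point mass driven towards a target by a proportional controller with gain `k` (the "control" parameter). The mass is restrained by a passive damper `b` (a stand-in "hardware" parameter). Forward-mode automatic differentiation with dual numbers carries exact derivatives of the rollout loss with respect to both parameters. Gradient descent with backtracking then updates the two parameters together. All names, dynamics and cost weights are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode dual number: a value plus its partial derivatives."""
    val: float
    grad: tuple  # partial derivatives w.r.t. each (hypothetical) parameter

    def _lift(self, other):
        # Treat plain floats as constants with zero derivative.
        if isinstance(other, Dual):
            return other
        return Dual(float(other), (0.0,) * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, tuple(a + c for a, c in zip(self.grad, o.grad)))

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, tuple(a - c for a, c in zip(self.grad, o.grad)))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        # Product rule applied to every partial derivative.
        o = self._lift(other)
        return Dual(self.val * o.val,
                    tuple(a * o.val + self.val * c for a, c in zip(self.grad, o.grad)))

    __rmul__ = __mul__


def rollout_loss(k, b, steps=200, dt=0.01, target=1.0, effort_weight=1e-3):
    """Toy rollout: unit mass, P-controller gain k, passive damper b.
    Works with plain floats or Dual numbers."""
    x, v, loss = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = k * (target - x)          # control force
        a = u - b * v                 # unit mass: acceleration equals net force
        v = v + a * dt                # semi-implicit Euler: velocity first,
        x = x + v * dt                # then position with the new velocity
        loss = loss + (target - x) * (target - x) * dt + effort_weight * u * u * dt
    return loss


def loss_and_grad(k, b):
    """Seed one derivative direction per parameter and run a single rollout."""
    out = rollout_loss(Dual(k, (1.0, 0.0)), Dual(b, (0.0, 1.0)))
    return out.val, out.grad


def co_design(k=1.0, b=0.1, lr=1.0, iters=30):
    """Jointly update control gain k and hardware parameter b by gradient descent."""
    loss, grad = loss_and_grad(k, b)
    for _ in range(iters):
        step = lr
        for _ in range(30):  # backtracking: halve the step until the loss decreases
            k_new, b_new = k - step * grad[0], b - step * grad[1]
            new_loss, new_grad = loss_and_grad(k_new, b_new)
            if new_loss < loss:
                k, b, loss, grad = k_new, b_new, new_loss, new_grad
                break
            step *= 0.5
    return k, b, loss
```

Calling `co_design()` returns the updated `(k, b, loss)`. The returned loss is never higher than the starting loss, because a step is only accepted when it reduces the loss. In a realistic setting, an automatic-differentiation library and a physics simulator would replace the hand-written `Dual` class and the toy dynamics.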
While it is possible to modify a robot’s design in conjunction with its control to improve
performance, in some instances the design may be changed with no option to modify the control
algorithm. Data scarcity, brittle convergence and the gap between simulated and real-world
environments mean that most common RL approaches are prone to overfitting and fail to generalise
to unseen environments. Consequently, replacing parts with non-identical components, or the
failure of sensors or joints, will most likely render the control algorithm useless.
Hardware-agnostic policies would mitigate this by allowing a single network to operate in a
variety of test domains in which the dynamics vary due to changes in robot morphology. This
thesis builds on the premise that adapting a known, successful control policy is easier and
more flexible than jointly learning numerous policies for different morphologies. It introduces
Hardware Agnostic RL, in which two control policies are combined. Varied embodiments are sampled
using a novel adversarial loss function, which self-regulates the sampled morphologies according
to their performance. The result is a final control policy that is robust to changes in the
environment, as well as to degradation and failure of the robot.