Abstract
The performance of any robotic system is dependent on the hardware of the agent, which is typically immutable during RL training. In this work, we present ORCHID (Optimisation of Robotic Control and Hardware In Design), which allows for truly simultaneous optimisation of
hardware and control parameters in an RL pipeline. We show
that by forming a differentiable path through a trajectory rollout, we can leverage a wealth of information from the
system that was previously lost in the ‘black-box’ environment.
Combining this with a novel hardware-conditioned critic network minimises variance during training and ensures stable updates. This allows the morphology and control parameters to be refined simultaneously.
The result is an efficient and versatile approach to holistic robot design that brings the final system closer to true optimality.
We demonstrate improved performance across four test environments with two different control algorithms; in every experiment, the maximum performance achieved with ORCHID is unattainable using policy updates alone on the default design. We also show that re-designing a robot with ORCHID in simulation transfers to a substantial improvement in the performance of a real-world robot.