Abstract
The use of reinforcement learning (RL) has led to major advances in robotics. However, data
scarcity, brittle convergence, and the gap between simulated and real-world environments mean that most common RL
approaches are prone to overfitting and fail to generalise to unseen environments. Hardware-agnostic policies would
mitigate this by allowing a single network to operate across a variety of test domains in which the dynamics vary due to changes in
robotic morphology or internal parameters. We build on the observation that learning to adapt a known, successful control policy is
easier and more flexible than jointly learning numerous control policies for different morphologies.
This paper introduces Hardware Agnostic Reinforcement Learning using Adversarial selection (HARL-A). In
this approach, training examples are sampled using a novel adversarial loss function designed to self-regulate
the sampled morphologies based on their learning potential. Simply applying our learning-potential-based loss function to the current state of
the art already yields an improvement of roughly 30% in performance, while experiments with the full implementation
of HARL-A report an average improvement of 70% over a standard RL baseline and 55% over the current state of the art.