Local Optima Networks for Reinforcement Learning - A Case Study: Coupled Inverted Pendulum Task
Conference proceeding

Yuyang Zhou, Alexander Turner and Ferrante Neri
2024 IEEE Conference on Artificial Intelligence (CAI), pp. 865-870
25 June 2024

Abstract

Keywords: Fitness landscape analysis; Inverted pendulum task; Perturbation methods; Predictive models; Reinforcement learning; Robot sensing systems; Robotics; Sampling methods; Sensitivity analysis; Training
Reinforcement Learning (RL) refers to a set of methods in which an agent learns directly from interaction, without explicitly constructing a model of the environment. In RL, the agent interacts with an environment, takes actions, receives feedback, and learns to make decisions that maximise cumulative reward over time. The primary goal is to find an optimal policy or value function that guides the agent's decision-making. Although RL can be formulated as an optimisation problem, it is rarely studied in depth from that perspective. Yet, as with any other optimisation task, an understanding of the problem's structure might help detect high-quality policies. This study uses Local Optima Networks (LONs) to analyse the fitness landscape associated with RL and adapts the LON sampling method to the coupled inverted pendulum tasks. Deep Deterministic Policy Gradient (DDPG) serves as the local search algorithm, refining the characterisation of the fitness landscape. Experimental results on the two pendulum tasks partly confirm and extend the conclusions of a study of the same problem carried out from a robotics and engineering standpoint. The proposed approach also identifies both known and previously unknown local optima. A sensitivity analysis of a key LON parameter, the perturbation strength, offers deeper insight into the fitness landscape. The constructed LON indicates that, for the coupled inverted pendulum task, some basins of attraction are much stronger than others.
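The LON construction summarised above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical toy fitness function on a discrete 40-by-40 grid in place of the return of a pendulum policy, and a simple best-improvement hill climber in place of DDPG as the local search. The names `fitness`, `hill_climb`, `perturb`, and `build_lon` are illustrative only. Nodes are local optima, edges record perturbation-plus-local-search transitions that do not worsen fitness (a monotonic LON), and `strength` plays the role of the perturbation strength whose sensitivity the paper examines.

```python
import math
import random
from collections import Counter

GRID = 40  # hypothetical search-space size per dimension


def fitness(p):
    """Toy multimodal fitness (to maximise); a stand-in for a policy's return."""
    x, y = p
    return math.sin(x / 3.0) * math.cos(y / 4.0) + 0.01 * (x + y)


def neighbours(p):
    """4-neighbourhood of a grid point, restricted to the grid."""
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID and 0 <= ny < GRID:
            yield (nx, ny)


def hill_climb(p):
    """Best-improvement local search (stand-in for DDPG); returns a local optimum."""
    while True:
        best = max(neighbours(p), key=fitness)
        if fitness(best) <= fitness(p):
            return p
        p = best


def perturb(p, strength, rng):
    """Random jump of up to `strength` cells per coordinate, clipped to the grid."""
    return tuple(min(GRID - 1, max(0, c + rng.randint(-strength, strength))) for c in p)


def build_lon(strength, runs=20, steps=50, seed=0):
    """Sample a monotonic LON with iterated local search.

    Returns (nodes, edges): visit counts per local optimum and
    transition counts per directed edge between distinct optima.
    """
    rng = random.Random(seed)
    nodes, edges = Counter(), Counter()
    for _ in range(runs):
        current = hill_climb((rng.randrange(GRID), rng.randrange(GRID)))
        nodes[current] += 1
        for _ in range(steps):
            candidate = hill_climb(perturb(current, strength, rng))
            # Monotonic acceptance: keep the new optimum only if it is not worse.
            if fitness(candidate) >= fitness(current):
                if candidate != current:
                    edges[(current, candidate)] += 1
                current = candidate
                nodes[current] += 1
    return nodes, edges
```

Calling `build_lon` for several `strength` values and comparing how visits concentrate on a few nodes (for example via `nodes.most_common()`) mirrors, in miniature, a perturbation-strength sensitivity analysis; heavily visited optima with many incoming edges correspond to strong basins of attraction. In this toy, a strength of 0 produces no edges, because the search never leaves its starting optimum.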
