Local Optima Networks for Reinforcement Learning - A Case Study: Coupled Inverted Pendulum Task

Yuyang Zhou; Alexander Turner; Ferrante Neri

doi:10.1109/CAI59869.2024.00118

Conference proceeding

2024 IEEE Conference on Artificial Intelligence (CAI), pp.865-870

25/06/2024

DOI: https://doi.org/10.1109/CAI59869.2024.00118

Keywords: Fitness landscape analysis; inverted pendulum task; Perturbation methods; Predictive models; Reinforcement learning; Robot sensing systems; robotics; Sampling methods; Sensitivity analysis; Training

Abstract

Reinforcement Learning (RL) refers to a set of methods in which an agent learns directly from interaction, without explicitly constructing a model of the environment. In RL, the agent acts in an environment, receives feedback, and learns to make decisions that maximise cumulative reward over time. The primary goal is to find an optimal policy or value function that guides the agent's decision-making. Although RL can be formulated as an optimisation problem, it is rarely analysed in depth as one. Yet, as with any other optimisation task, an understanding of the problem may help to detect high-quality policies. This study uses Local Optima Networks (LONs) to analyse the fitness landscape associated with RL, and modifies the sampling method for the case of the coupled inverted pendulum tasks. Deep Deterministic Policy Gradient (DDPG) serves as the local search algorithm to refine the characterisation of the fitness landscape. Experimental results on the two pendulum tasks partly confirm and extend the conclusions of a study on the same problem carried out from a robotics and engineering standpoint. However, the proposed approach uniquely identifies both known and previously unknown local optima. A sensitivity analysis of a key LON parameter, the perturbation strength, offers deeper insight into the fitness landscape. The constructed LON indicates that, for the coupled inverted pendulum task, some basins of attraction are much stronger than others.
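
As a rough illustration of how a LON can be sampled, the sketch below builds one for a toy one-dimensional landscape by repeatedly perturbing a local optimum and re-running a local search. It is not the paper's method: this record does not describe the modified sampling scheme, a simple hill climber stands in for Deep Deterministic Policy Gradient (DDPG), and the objective, the grid size `GRID`, and the function names are all hypothetical choices made for the example.

```python
import math
import random
from collections import defaultdict

GRID = 401  # assumed resolution of the toy search space (not from the paper)


def fitness(i):
    # Hypothetical multimodal objective to maximise; NOT the pendulum reward.
    x = 4.0 * i / (GRID - 1)
    return math.sin(3.0 * x) + 0.5 * math.sin(7.0 * x)


def local_search(i):
    # Hill climber on the integer grid, standing in for the DDPG refinement.
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < GRID]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(i):
            return i  # no strictly better neighbour: i is a local optimum
        i = best


def build_lon(n_starts, n_kicks, strength, seed=0):
    # Nodes are local optima; a directed edge (a, b) counts how often a
    # perturbation of size <= strength applied to a led local search to b.
    rng = random.Random(seed)
    nodes = {}
    edges = defaultdict(int)
    for _ in range(n_starts):
        current = local_search(rng.randrange(GRID))
        nodes[current] = fitness(current)
        for _ in range(n_kicks):
            kick = rng.randint(-strength, strength)
            kicked = min(GRID - 1, max(0, current + kick))
            neighbour = local_search(kicked)
            nodes[neighbour] = fitness(neighbour)
            edges[(current, neighbour)] += 1
            # Move on only if the new optimum is at least as good.
            if fitness(neighbour) >= fitness(current):
                current = neighbour
    return nodes, dict(edges)


if __name__ == "__main__":
    for strength in (5, 40, 120):
        nodes, edges = build_lon(n_starts=5, n_kicks=20, strength=strength)
        print(strength, len(nodes), len(edges))
```

Re-running `build_lon` with different `strength` values mimics, in miniature, a sensitivity analysis of the perturbation strength. In this toy setting, the total weight of edges arriving at a node gives one informal indication of how strongly its basin attracts the search.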

Metrics

1 Record Views

1 Times Cited - Web of Science

Details

Title: Local Optima Networks for Reinforcement Learning - A Case Study: Coupled Inverted Pendulum Task
Creators: Yuyang Zhou - University of Nottingham Ningbo China
Alexander Turner - University of Nottingham
Ferrante Neri - University of Surrey
Publication Details: 2024 IEEE Conference on Artificial Intelligence (CAI), pp.865-870
Publisher: IEEE
Number of pages: 6
Publication Date: 25/06/2024
Identifiers: 991111776802346; WOS:001289387700150
Academic Unit: School of Computer Science & Electronic Engineering
Language: English
Resource Type: Conference proceeding