Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles
Journal article   Peer reviewed


Ruoqi Wen, Rongpeng Li, Xing Xu, Mahdi Boloursaz Mashhadi, Pei Xiao and Zhifeng Zhao
IEEE Transactions on Mobile Computing, Early Access
14/04/2026

Abstract

Keywords: Autonomous vehicle control; multi-agent model-based reinforcement learning; probably approximately correct guarantee

Deep Reinforcement Learning (DRL) holds significant promise for achieving human-like Autonomous Vehicle (AV) capabilities, but suffers from low sample efficiency and challenges in reward design. Model-Based Reinforcement Learning (MBRL) offers improved sample efficiency and generalizability compared to Model-Free Reinforcement Learning (MFRL) in various multi-agent decision-making scenarios. Nevertheless, MBRL faces critical difficulties in estimating uncertainty during the model learning phase, thereby limiting its scalability and applicability in real-world scenarios. Additionally, most studies on Connected Autonomous Vehicles (CAVs) focus on single-agent decision-making. In contrast, existing multi-agent MBRL solutions lack computationally tractable algorithms with Probably Approximately Correct (PAC) guarantees, a crucial factor for ensuring policy reliability with limited training data. To address these challenges, we propose MA-PMBRL, a novel Multi-Agent Pessimistic Model-Based Reinforcement Learning framework for CAVs, incorporating a max-min optimization approach to enhance robustness and decision-making. To mitigate the inherent subjectivity of uncertainty estimation in MBRL and avoid catastrophic failures in AVs, MA-PMBRL employs a pessimistic optimization framework combined with Projected Gradient Descent (PGD) for both model and policy learning. MA-PMBRL also employs general function approximations under partial dataset coverage to enhance learning efficiency and system-level performance. By bounding the suboptimality of the resulting policy under mild theoretical assumptions, we establish PAC guarantees for MA-PMBRL, demonstrating that the proposed framework represents a significant step toward scalable, efficient, and reliable multi-agent decision-making for CAVs.

Index Terms—Autonomous vehicle control, multi-agent model-based reinforcement learning, probably approximately correct guarantee.
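To give a flavor of the PGD component mentioned in the abstract: projected gradient descent alternates a gradient step with a projection back onto a constraint set. The sketch below is a generic, minimal illustration of that technique only (a Euclidean-ball constraint and a toy quadratic objective are assumptions of this example); it is not the paper's MA-PMBRL algorithm, whose constraint sets and objectives are defined in the full manuscript.

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    # Euclidean projection onto the L2 ball of the given radius.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_gradient_descent(grad, x0, step=0.1, iters=100, radius=1.0):
    # Generic PGD loop: take a gradient step, then project the
    # iterate back onto the feasible set.
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = project_to_ball(x - step * grad(x), radius)
    return x

# Toy example: minimize ||x - c||^2 over the unit ball with c outside it;
# the constrained minimizer is the projection of c onto the ball.
c = np.array([3.0, 4.0])
x_star = projected_gradient_descent(lambda x: 2 * (x - c), np.zeros(2))
# x_star is approximately c / ||c|| = [0.6, 0.8]
```

The same projection-after-step pattern applies when the feasible set is, e.g., a confidence set of plausible transition models, which is the kind of setting where pessimistic MBRL methods constrain their updates.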

PDF: TMC_final_main (2.68 MB)
Author's Accepted Manuscript. Restricted; access may be granted on request. This file will be open access upon publication. CC BY 4.0

