Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles

Ruoqi Wen; Rongpeng Li; Xing Xu; Mahdi Boloursaz Mashhadi; Pei Xiao; Zhifeng Zhao

doi:10.1109/TMC.2026.3683781

Journal article

Peer reviewed

IEEE Transactions on Mobile Computing, Early Access

14/04/2026

DOI: https://doi.org/10.1109/TMC.2026.3683781

Abstract

Autonomous vehicle control

multi-agent model-based reinforcement learning

probably approximately correct guarantee

Antennas

Radio broadcasting

Frequency modulation

Programmable logic arrays

Circuits

Logic arrays

Logic circuits

Programmable circuits

Communication systems

Integrated Circuits

Deep Reinforcement Learning (DRL) holds significant promise for achieving human-like Autonomous Vehicle (AV) capabilities, but suffers from low sample efficiency and challenges in reward design. Model-Based Reinforcement Learning (MBRL) offers improved sample efficiency and generalizability compared to Model-Free Reinforcement Learning (MFRL) in various multi-agent decision-making scenarios. Nevertheless, MBRL faces critical difficulties in estimating uncertainty during the model learning phase, thereby limiting its scalability and applicability in real-world scenarios. Additionally, most studies on Connected Autonomous Vehicles (CAVs) focus on single-agent decision-making. In contrast, existing multi-agent MBRL solutions lack computationally tractable algorithms with Probably Approximately Correct (PAC) guarantees, a crucial factor for ensuring policy reliability with limited training data. To address these challenges, we propose MA-PMBRL, a novel Multi-Agent Pessimistic Model-Based Reinforcement Learning framework for CAVs, incorporating a max-min optimization approach to enhance robustness and decision-making. To mitigate the inherent subjectivity of uncertainty estimation in MBRL and avoid incurring catastrophic failures in AV, MA-PMBRL employs a pessimistic optimization framework combined with Projected Gradient Descent (PGD) for both model and policy learning. MA-PMBRL also employs general function approximations under partial dataset coverage to enhance learning efficiency and system-level performance. By bounding the suboptimality of the resulting policy under mild theoretical assumptions, we successfully establish PAC guarantees for MA-PMBRL, demonstrating that the proposed framework represents a significant step toward scalable, efficient, and reliable multi-agent decision-making for CAVs.

Index Terms: Autonomous vehicle control, multi-agent model-based reinforcement learning, probably approximately correct guarantee.
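The abstract names a max-min objective, pessimism over model uncertainty, and Projected Gradient Descent, but it does not give the formulation. The sketch below is a generic toy illustration of those three ideas. It is not the MA-PMBRL algorithm. It assumes a single-state problem in which each action has an estimated reward `r_hat[i]` and the true reward lies within `eps[i]` of that estimate. The function names, the box-shaped uncertainty set, the step sizes, and the numbers in the example are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    running_sum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # the last k that passes gives the threshold
    return [max(x - theta, 0.0) for x in v]


def pessimistic_pgd(r_hat, eps, steps=500, lr_policy=0.1, lr_model=0.1):
    """Toy max-min: max over policies, min over rewards in a box around r_hat.

    Assumption (illustrative only): the uncertainty set is the box
    [r_hat[i] - eps[i], r_hat[i] + eps[i]] for each action i.
    """
    n = len(r_hat)
    policy = [1.0 / n] * n   # start from the uniform policy
    reward = list(r_hat)     # start from the nominal model
    for _ in range(steps):
        # Model player: gradient of (policy . reward) w.r.t. reward is the
        # policy itself, so step downhill and clip back into the box.
        reward = [min(max(r - lr_model * p, c - e), c + e)
                  for r, p, c, e in zip(reward, policy, r_hat, eps)]
        # Policy player: gradient w.r.t. policy is the current reward,
        # so step uphill and project back onto the simplex.
        policy = project_to_simplex(
            [p + lr_policy * r for p, r in zip(policy, reward)])
    return policy, reward


if __name__ == "__main__":
    # Hypothetical numbers: action 0 looks best but is the least certain.
    policy, reward = pessimistic_pgd([1.0, 0.8, 0.5], [0.6, 0.1, 0.1])
    print("policy:", policy)
    print("worst-case rewards seen by the policy:", reward)
```

With these hypothetical numbers, a greedy choice on the estimates would pick action 0 (estimate 1.0), whereas the worst-case rewards are 0.4, 0.7, and 0.4, so the pessimistic policy concentrates on action 1. This is the kind of conservative behaviour that pessimism is meant to produce when data coverage is uneven.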

Files and links (1)

TMC_final_main (pdf, 2.68 MB)

Author's Accepted Manuscript. Restricted. Access may be granted on request. This file will be open access upon publication. CC BY 4.0

Details

Title: Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles
Creators: Ruoqi Wen (Author) - Zhejiang University
Rongpeng Li (Author) - Zhejiang University
Xing Xu (Author) - State Grid Hebei Electric Power Company
Mahdi Boloursaz Mashhadi (Author) - University of Surrey, School of Computer Science & Electronic Engineering
Pei Xiao (Author) - University of Surrey, School of Computer Science & Electronic Engineering
Zhifeng Zhao (Author) - Zhejiang Lab
Publication Details: IEEE Transactions on Mobile Computing, Early Access
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
First online publication date: 14/04/2026
Date accepted for publication: 10/04/2026
Grant note: This work was supported in part by the National Key Research and Development Program of China under Grant 2024YFE0200600, in part by the Zhejiang Provincial Major Science and Technology Program (Jianbing Project) under Grant No. 2026C01034, and in part by Huawei Cooperation Project under Grant TC20240829036. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Identifiers: 991118195502346
Copyright: © 2026 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Academic Unit: School of Computer Science & Electronic Engineering
Language: English
Resource Type: Journal article
