YOLO-ACR: A new architecture for real-time object detection with advanced feature fusion and bounding box regression

Ferrante Neri; Mengchen Yang; Yu Xue

doi:10.1016/j.knosys.2025.114052

Journal article

Peer reviewed

Knowledge-based systems, Vol.326, p.114052

09/2025

Keywords: Real-time object detection; YOLO; Feature fusion; Attention mechanism; Bounding box regression; Deep learning

Abstract

In recent years, the YOLO series has achieved remarkable progress in the field of real-time object detection, striking a favorable balance between speed and accuracy. However, challenges still persist in feature representation and bounding box regression, especially in scenarios involving dense objects, overlapping instances, and small object detection. To enhance detection performance, this paper proposes an improved architecture based on YOLOv11, named YOLO-ACR, which introduces enhancements in feature fusion, structural design, and regression optimization. Specifically, YOLO-ACR incorporates the CASAFF module, which adaptively fuses channel and spatial features to effectively strengthen the model's multi-scale representation capability. The proposed C3k2-RV structure draws on the efficient design of RepVGG, achieving a balance between lightweight architecture and feature extraction capability. For bounding box localization, we introduce the IS-MPDIoU loss function, which combines the spatial distance between bounding boxes and the overlap ratio of internal regions, and incorporates a dynamic scaling and coordinate distribution modeling mechanism to significantly improve regression precision and model robustness. Experiments on the PASCAL VOC dataset demonstrate that YOLO-ACR consistently outperforms YOLOv11 across different model scales: achieving a 1.7% improvement in mAP50 for the large-scale model (approximately 59M parameters), a 2.7% improvement for the medium-scale model (approximately 38M parameters), and a 0.5% improvement for the small-scale model (approximately 9.5M parameters). Furthermore, the proposed method also achieves noticeable performance improvements on the COCO2017 dataset.
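
The abstract describes IS-MPDIoU only in words, so the following is a minimal sketch of the two ingredients it names, not the authors' loss. It assumes the "spatial distance between bounding boxes" follows the published MPDIoU formulation (squared distances between matching corners, normalised by the squared image diagonal) and that the "overlap ratio of internal regions" follows the Inner-IoU idea (IoU computed on centre-preserving, rescaled auxiliary boxes). The function names, the fixed `ratio` value, and the way the terms are combined are illustrative assumptions; the dynamic scaling and coordinate distribution modeling mentioned in the abstract are not reproduced here.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Shrink or grow a box around its own centre (auxiliary 'inner' box)."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=0.75):
    """Hypothetical combination: inner-box IoU minus corner-distance penalties.

    ratio is an assumed fixed scale factor; the paper describes a dynamic one.
    """
    inner_iou = iou(scale_box(pred, ratio), scale_box(target, ratio))
    norm = float(img_w ** 2 + img_h ** 2)  # squared image diagonal
    # Squared distance between top-left corners, then bottom-right corners.
    d_tl = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d_br = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - (inner_iou - d_tl / norm - d_br / norm)


if __name__ == "__main__":
    gt = (10.0, 10.0, 50.0, 50.0)
    for pred in [(10.0, 10.0, 50.0, 50.0), (15.0, 15.0, 55.0, 55.0),
                 (60.0, 60.0, 90.0, 90.0)]:
        print(pred, round(inner_mpdiou_loss(pred, gt, 100, 100), 4))
```

Under these assumptions, a perfect prediction gives a loss of zero, and the corner-distance terms keep the loss growing with displacement even when the boxes no longer overlap, which is where a plain IoU loss stops providing a useful signal.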

Details

Title: YOLO-ACR: A new architecture for real-time object detection with advanced feature fusion and bounding box regression
Creators: Ferrante Neri; Mengchen Yang; Yu Xue
Publication Details: Knowledge-based systems, Vol.326, p.114052
Publisher: ELSEVIER; AMSTERDAM
Number of pages: 9
Publication Date: 09/2025
Grant note: This work was supported by the National Natural Science Foundation of China (No. 62376127, 61876089, 61876185, 61902281, and 61403206), by the Natural Science Foundation of Jiangsu Province (No. BK20141005), by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 14KJB520025), and partially by the project of Distinguished Professors of Jiangsu Province.
Identifiers: 991007724202346; WOS:001529911000002
Academic Unit: School of Computer Science and Electronic Engineering
Language: English
Resource Type: Journal article
