Abstract
Augmented reality environments face significant challenges in achieving context and semantic awareness, both of which are essential for enhancing user experiences. Although machine learning has been extensively applied to areas such as object and action recognition, a knowledge gap remains in integrating these techniques to improve contextual understanding in AR environments. This study explores how machine learning enhances object and action recognition, improving context awareness in augmented reality environments. A systematic review of fifteen studies revealed advancements in deep learning, the semantic web, knowledge graphs, and multimodal interfaces, which collectively improved the capabilities of augmented reality systems. The findings include innovations in areas such as 3D hand pose estimation, semantic gaze analysis through eye tracking, and hybrid approaches that optimize real-time processing. These developments improve augmented and mixed reality environments by enabling more realistic, interactive, and responsive applications. However, persistent challenges such as latency, scalability, and environmental adaptation underscore the need for further research. These results offer a guide for developing highly immersive and efficient AR/MR systems, with the potential to improve user interaction and operational capabilities.
Article highlights
Advancements in 3D hand pose estimation and semantic gaze analysis address AR/MR challenges such as latency, accuracy, and adaptability to environmental conditions.
Hybrid processing models bridge the gap between high performance and accessibility, with lightweight models enabling real-time interactions on resource-constrained devices.
Future research should focus on the development of generalized datasets and the refinement of algorithms to ensure robust performance across varied industries.