Abstract
Satellite-terrestrial integrated networks (STINs) are an active research focus. These systems combine edge devices, cloud infrastructure, and low-Earth-orbit (LEO) satellites to provide cross-regional coverage and improve service reliability. A central challenge is deciding where each computational task should be executed. Network conditions vary rapidly, link quality between components fluctuates, and applications impose stringent latency requirements. Under these conditions, fixed rules and static allocation schemes have limited effectiveness.
This paper investigates a learning-based method for improving offloading decisions in multi-layer networks. A model is constructed to describe end-to-end latency and resource constraints across the edge, terrestrial cloud, and satellite layers. The offloading problem is formulated as a Markov decision process (MDP), which allows a deep Q-network (DQN) agent to learn adaptive strategies through interaction with the system. The method incorporates two stabilization mechanisms, a target network and experience replay, to support reliable convergence.
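As a rough illustration of the mechanisms named above, the following minimal sketch trains an offloading policy with experience replay and a periodically synchronized target copy. It is not the paper's implementation: a tabular Q-function stands in for the deep network, and the two-state congestion environment, the latency values, and all names (LAYERS, LATENCY_MS, step, train, greedy_policy) are hypothetical assumptions chosen only to keep the example self-contained.

```python
import random
from collections import deque

# Hypothetical action set: the layer a task is offloaded to.
LAYERS = ["edge", "terrestrial_cloud", "satellite"]

# Assumed latency in ms per (edge congestion state, layer); illustrative only.
LATENCY_MS = {
    0: [10.0, 40.0, 80.0],   # edge uncongested: edge is fastest
    1: [120.0, 40.0, 80.0],  # edge congested: terrestrial cloud is fastest
}


def step(state, action, rng):
    """Toy environment: reward is negative latency; congestion changes at random."""
    reward = -LATENCY_MS[state][action]
    next_state = rng.choice([0, 1])
    return reward, next_state


def train(steps=5000, gamma=0.9, lr=0.1, eps=0.2, batch=16, sync_every=50, seed=0):
    rng = random.Random(seed)
    n_actions = len(LAYERS)
    q = {s: [0.0] * n_actions for s in LATENCY_MS}   # online estimates
    q_target = {s: list(v) for s, v in q.items()}    # frozen target copy
    replay = deque(maxlen=1000)                      # experience replay buffer
    state = 0
    for t in range(1, steps + 1):
        # Epsilon-greedy action selection on the online estimates.
        if rng.random() < eps:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        reward, next_state = step(state, action, rng)
        replay.append((state, action, reward, next_state))
        state = next_state
        # Learn from a random minibatch of stored transitions.
        if len(replay) >= batch:
            for s, a, r, s2 in rng.sample(list(replay), batch):
                td_target = r + gamma * max(q_target[s2])  # bootstrap from target copy
                q[s][a] += lr * (td_target - q[s][a])
        # Periodically copy the online estimates into the target copy.
        if t % sync_every == 0:
            q_target = {s: list(v) for s, v in q.items()}
    return q


def greedy_policy(q):
    """Map each congestion state to the layer with the highest estimated value."""
    return {s: LAYERS[max(range(len(LAYERS)), key=lambda a: q[s][a])] for s in q}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In a deep Q-learning setting, the table would be replaced by a neural network over a richer state (for example link quality and queue lengths), while the replay buffer and the delayed target copy play the same stabilizing role.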
Simulation results show that the learned strategy consistently reduces end-to-end latency. Compared with baseline methods representing current practice, it also improves the task deadline fulfilment rate and balances resource utilization more evenly. The approach is evaluated in two representative vehicular networking scenarios. The first is an urban environment in which convoy formation requires low latency. The second is a rural environment with large volumes of sensor data and limited ground infrastructure coverage. In each scenario, the deep Q-learning strategy adapts to the specific constraints, exploits the strengths of the different layers, and responds proactively to congestion.
These results indicate that reinforcement learning offers a flexible and robust way to manage computation in heterogeneous STIN environments, and that the approach has significant potential to support emerging intelligent transportation systems and networked applications.