Abstract
In this paper, we investigate a cellular-connected unmanned aerial vehicle (UAV) network in which multiple UAVs receive messages from base stations (BSs) in the downlink while the BSs simultaneously serve their paired ground user equipments (UEs). To manage the inter-cell interference (ICI) among UEs caused by intense reuse of time-frequency resource blocks (RBs), a first p-tier based RB coordination criterion is adopted. Then, to enhance the wireless transmission quality for UAVs while protecting terrestrial UEs from interference caused by ground-to-air (G2A) transmissions, a radio resource management (RRM) problem of joint dynamic RB coordination and time-varying beamforming design is formulated to minimize the UAVs' ergodic outage duration (EOD). Because conventional optimization techniques are inefficient at solving the formulated RRM problem, a deep reinforcement learning (DRL)-aided solution is proposed, in which a dueling double deep Q network (D3QN) and twin delayed deep deterministic policy gradient (TD3) are invoked to handle RB coordination in the discrete action domain and beamforming design in the continuous action domain, respectively. Numerical results illustrate the effectiveness of the proposed hybrid D3QN-TD3 algorithm compared with representative baselines.
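
For orientation, the following minimal sketch illustrates the textbook learning targets behind the two DRL components named above: the dueling aggregation and double-Q target used by D3QN for discrete actions, and the clipped double-Q target with target policy smoothing used by TD3 for continuous actions. It is not the algorithm of this paper. All function names, the noise parameters, and the action bounds are hypothetical choices made for illustration, and how the two agents are coupled for joint RB coordination and beamforming is not specified here.

```python
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double-Q target: the online net selects the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """TD3 clipped double-Q target: bootstrap from the smaller target-critic value."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


def smoothed_target_action(action, sigma=0.2, clip=0.5, low=-1.0, high=1.0, rng=random):
    """TD3 target policy smoothing: add clipped Gaussian noise, then clip to bounds.

    sigma, clip, low and high are illustrative defaults (assumptions).
    """
    noisy = []
    for a in action:
        eps = max(-clip, min(clip, rng.gauss(0.0, sigma)))
        noisy.append(max(low, min(high, a + eps)))
    return noisy


if __name__ == "__main__":
    # Discrete branch (e.g. choosing among candidate RB assignments).
    print(dueling_q(1.0, [0.0, 2.0, 4.0]))
    print(double_q_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0]))
    # Continuous branch (e.g. normalized beamforming parameters).
    print(td3_target(1.0, 0.5, 3.0, 2.0))
    print(smoothed_target_action([0.9, -0.9]))
```

In this sketch the discrete and continuous branches are shown independently; a hybrid scheme would additionally need a shared state and reward definition, which the sketch deliberately leaves out.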