Abstract
In this paper, the real-time deployment of unmanned aerial vehicles (UAVs) as flying base stations (BSs) to maximize the throughput of mobile users in UAV networks is investigated. This problem is formulated as a time-varying mixed-integer non-convex programming (MINP) problem, for which conventional optimization techniques struggle to find an optimal solution in a short time. Hence, we propose an actor-critic-based (AC-based) deep reinforcement learning (DRL) method to find near-optimal UAV positions at every moment. In the proposed method, the iterative search for a solution at a given moment is modeled as a Markov decision process (MDP). To handle the infinite state and action spaces and to improve the robustness of the decision process, two neural networks (NNs) are configured to evaluate the UAV position adjustments and to make decisions, respectively. Simulation results show that the proposed method outperforms heuristic, sequential least-squares programming, and fixed methods in terms of throughput at every moment in UAV networks.