Abstract
In this paper, we study a simultaneous wireless information and power transfer (SWIPT) cooperative system in which a source forwards information to a destination with the assistance of multiple relays. Each relay is equipped with a finite data buffer and a finite energy buffer that stores energy harvested from radio-frequency (RF) signals. We formulate a throughput maximization problem for this system that accounts for a strict delay constraint, dynamic channel conditions, time-varying discrete data buffer states, and time-varying continuous energy buffer states. The relay selection process is modeled as a discrete-time Markov decision process (MDP) that takes both the data buffer states and the energy buffer states into account. Two deep Q-network (DQN)-based methods, named invalid action penalty (IAP) and invalid action mask (IAM), are proposed. Simulation results show that the proposed IAM method achieves better convergence and higher throughput than the IAP method.
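To illustrate how the two DQN variants differ when an action is chosen, the Python sketch below contrasts an invalid action mask with an invalid action penalty for buffer-aided relay selection. It is a minimal sketch under stated assumptions, not the formulation used in this work: the action layout (2K actions, where each relay either receives from the source or forwards to the destination), the validity rule, the function names `valid_mask`, `select_iam` and `step_reward_iap`, and the penalty value are all hypothetical.

```python
import math
from typing import List, Sequence


def valid_mask(data_buf: Sequence[int], energy_buf: Sequence[float],
               buf_size: int, tx_energy: float) -> List[bool]:
    """Hypothetical validity rule over an assumed action space of size 2K.

    a in [0, K):   relay a receives a packet from the source
                   (valid only if its data buffer has free space)
    a in [K, 2K):  relay a-K forwards a packet to the destination
                   (valid only if it holds a packet and enough harvested energy)
    """
    k = len(data_buf)
    recv = [data_buf[i] < buf_size for i in range(k)]
    send = [data_buf[i] > 0 and energy_buf[i] >= tx_energy for i in range(k)]
    return recv + send


def select_iam(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Invalid action mask: invalid actions get Q = -inf, so the greedy
    argmax can only return a valid action (if at least one exists)."""
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


def step_reward_iap(action: int, mask: Sequence[bool],
                    throughput: float, penalty: float = -1.0) -> float:
    """Invalid action penalty: the agent may pick any action; an invalid one
    yields a negative reward (assumed value) instead of the throughput."""
    return throughput if mask[action] else penalty


# Example with K = 2 relays (all numbers are illustrative).
mask = valid_mask(data_buf=[4, 0], energy_buf=[0.2, 1.5],
                  buf_size=4, tx_energy=1.0)
q = [0.9, 0.1, 0.7, 0.5]           # unmasked argmax is action 0, which is invalid
iam_action = select_iam(q, mask)    # masking forces a valid choice (action 1)
iap_reward = step_reward_iap(0, mask, throughput=2.0)  # invalid pick is penalized
```

The design trade-off this sketch is meant to expose: with IAM the network never wastes transitions on infeasible relay choices, whereas with IAP it must learn from penalty rewards which actions are infeasible, which is one plausible reason masking could converge faster.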