Abstract
The Internet of Things (IoT) introduces diverse requirements and ubiquitous connections, necessitating efficient and affordable energy consumption as the ecosystem continues to grow. To address this challenge, we investigate a pure nonorthogonal multiple access (pure-NOMA) beamforming scheme that enhances system capacity by accommodating more IoT devices within the same spectrum. An energy efficiency (EE) maximization problem is formulated that jointly optimizes the beamforming matrix, power allocation, and device clustering. Because the transmission channel is dynamic and the problem is a coupled, non-convex mixed-integer nonlinear program, it is challenging to solve with conventional mathematical methods. Additionally, its high dimensionality and coupled non-convex structure pose significant challenges for traditional reinforcement learning (RL) methods. To overcome these issues, we propose a curiosity-driven approach that leverages intrinsic information from the base station (BS) to achieve energy-efficient resource allocation. Simulation results demonstrate that pure-NOMA offers up to a 25% improvement in EE compared to hybrid-NOMA, while the curiosity-driven learning method outperforms baseline techniques, including deep reinforcement learning (DRL), zero-forcing, and random methods, achieving a 14.78% reward gain over the DRL approach. The effectiveness of the proposed method is validated across various beam settings, device counts, quality-of-service requirements, and time consumption metrics, all while maintaining comparable computational complexity.