Abstract
We investigate resource management in a sparse code multiple access (SCMA)-enabled downlink low Earth orbit (LEO) satellite communication system with multi-beam capability. We jointly optimize the system's beamwidth, subcarrier, and SCMA codebook to minimize the average transmission delay while maximizing the coverage probability under dynamic user distributions. To tackle the exponentially large discrete action space, we propose a MAP-Elites enhanced Multi-Objective Reinforcement Learning (ME-MORL) framework that combines discrete-action Proximal Policy Optimization (PPO) with an evolutionary archive, which maintains diverse resource allocation policies and avoids premature convergence. Extensive experiments show that, in unseen scenarios, ME-MORL attains the target coverage with only 50% of the training iterations required by competing approaches while maintaining a 24% lower transmission delay, demonstrating superior sample efficiency and cross-scenario generalization.