Abstract
Open Radio Access Networks (Open-RAN) require cost- and energy-efficient solutions to enable deployment at scale. A significant concern in multiple-input multiple-output (MIMO) systems that employ traditional linear processing is the large number of radio frequency (RF) chains required at the base station (BS) to accurately decode spatially multiplexed streams. Recently, however, practical non-linear approaches that enable near-optimal, parallelizable tree searches have been implemented on real systems and have been shown to considerably reduce the number of required RF chains without affecting user performance. Just as linear systems use QR decomposition (QRD) to perform channel inversion, these non-linear approaches use a sorted QRD (SQRD) to reduce the search complexity. However, SQRD can become a significant bottleneck for general software-based non-linear solutions, preventing them from fully exploiting these gains. To address the latency limitations of SQRD, this work presents a high-throughput hardware accelerator that reformulates the underlying Modified Gram-Schmidt (MGS) process to extract more parallelism than previous designs. Implementations of the proposed architecture achieve at least a 2-fold improvement in throughput and processing latency over existing 4×4 and 8×8 field-programmable gate array (FPGA) implementations and can be scaled up to 16×16 MIMO systems. Furthermore, the proposed accelerator is integrated with a software framework, where it can considerably offload the processing burden for higher numbers of streams under strict latency constraints.
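To make the role of SQRD concrete, here is a minimal pure-Python sketch of a sorted QRD computed with Modified Gram-Schmidt. It assumes the common norm-based ordering, which pivots at each step on the remaining column with the smallest residual norm, and a full-column-rank complex channel matrix H with at least as many rows as columns. The function name `sqrd_mgs` and its list-based interface are hypothetical. The sketch does not reproduce the accelerator's reformulated MGS. It returns Q, R and a permutation `perm` such that H with its columns reordered by `perm` equals QR.

```python
def sqrd_mgs(H):
    """Hypothetical sketch: sorted QRD of H via Modified Gram-Schmidt.

    H is an m x n list of rows (m >= n, full column rank assumed).
    Returns (Q, R, perm) where Q is a list of n orthonormal columns
    (each of length m), R is an n x n upper-triangular list of rows, and
    column j of Q @ R equals column perm[j] of H.
    """
    m, n = len(H), len(H[0])
    # Work column-wise on a copy of H: cols[j] is the j-th column.
    cols = [[complex(H[r][c]) for r in range(m)] for c in range(n)]
    R = [[0j] * n for _ in range(n)]
    perm = list(range(n))
    # Squared residual norms of the columns, used only for sorting.
    norms = [sum(abs(x) ** 2 for x in col) for col in cols]

    for i in range(n):
        # Sorting step (assumed heuristic): smallest residual norm first.
        k = min(range(i, n), key=lambda l: norms[l])
        # Swap columns i and k in the working columns, norms, perm,
        # and in the rows of R that have already been computed.
        cols[i], cols[k] = cols[k], cols[i]
        norms[i], norms[k] = norms[k], norms[i]
        perm[i], perm[k] = perm[k], perm[i]
        for r in range(i):
            R[r][i], R[r][k] = R[r][k], R[r][i]
        # Normalize the pivot column (norm recomputed to avoid drift).
        R[i][i] = complex(sum(abs(x) ** 2 for x in cols[i]) ** 0.5)
        q = [x / R[i][i] for x in cols[i]]
        cols[i] = q
        # MGS step: remove the q-component from every remaining column.
        for l in range(i + 1, n):
            R[i][l] = sum(a.conjugate() * b for a, b in zip(q, cols[l]))
            cols[l] = [b - R[i][l] * a for a, b in zip(q, cols[l])]
            norms[l] -= abs(R[i][l]) ** 2
    return cols, R, perm
```

In this formulation the pivot chosen at step i depends on norms updated during step i−1, so the outer loop is inherently sequential. Only the inner projection loop over the remaining columns is directly parallel.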