Abstract
Sound event localization and detection (SELD) involves predicting active sound event classes over time while estimating their positions. The localization subtask in SELD is usually treated as a direction-of-arrival estimation (DOAE) problem, ignoring source distance. Only recently has SELD been extended to 3D by incorporating distance estimation, enabling the prediction of sound event positions in 3D space (3D SELD). However, existing methods lack input features specifically designed for distance estimation. We address this gap by introducing two novel reverberation-based feature formats: one using the direct-to-reverberant ratio (DRR) and another leveraging signal autocorrelation to capture early reflections. We evaluate and benchmark these features on the STARSS23 dataset, combining them with established SELD features for sound event detection (SED) and DOAE, and testing across different network architectures. Our proposed features, applicable to both the first-order Ambisonics (FOA) and microphone array (MIC) formats, achieve state-of-the-art distance estimation, enhancing overall 3D SELD performance.