Abstract
Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices like embedded systems. In this paper, we present a demonstration of our standalone hardware device designed for real-time recognition of sound events commonly known as audio tagging. Our system incorporates a real-time implementation of a CNN-based pre-trained audio neural networks (PANNs) on an embedded hardware device, Raspberry Pi. We refer to our standalone device as "PiSoundSensing" system, which makes sense of surrounding sounds using a Raspberry Pi based hardware. Users can interact with the system through a physical button or using an online web interface. The web interface allows users to remotely control the standalone device, and visualize sound events detected over time. We provide a detailed description of the hardware and software used to build PiSoundSensing device. Also, we highlight useful observations including hardware-based standalone device performance compared to that of the software-based performance.