Gen-SER: When the Generative Model Meets Speech Emotion Recognition
Conference paper

Taihui Wang, Jinzheng Zhao, Rilin Chen, Tong Lei, Wenwu Wang and Dong Yu
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
Institute of Electrical and Electronics Engineers (IEEE)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026) (Barcelona, Spain, 04/05/2026–08/05/2026)
17/01/2026

Abstract

Keywords: distribution transport; generative model; target matching; speech emotion recognition
Speech emotion recognition (SER) is crucial in speech understanding and generation. Most approaches are based on either classification models or large language models. Unlike previous methods, we propose Gen-SER, a novel approach that reformulates SER as a distribution shift problem via generative models. We project discrete class labels into a continuous space and obtain the terminal distribution via sinusoidal taxonomy encoding. A target-matching-based generative model then transforms the initial distribution into the terminal distribution efficiently. Classification is achieved by calculating the similarity between the generated terminal distribution and the ground-truth terminal distribution. Experimental results confirm the efficacy of the proposed method, demonstrating its extensibility to various speech-understanding tasks and suggesting its potential applicability to a broader range of classification tasks.
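The last step described in the abstract, classification by similarity to encoded label targets, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the abstract does not specify the form of the sinusoidal taxonomy encoding, the similarity measure, or the label set, so the sketch uses a Transformer-style sinusoidal code of the class index, cosine similarity, and a hypothetical four-emotion label list. The target-matching generative model is not implemented; a noisy copy of a target stands in for its output.

```python
import numpy as np

def sinusoidal_code(index: int, dim: int = 16) -> np.ndarray:
    """Encode an integer class index as a sin/cos vector (assumed form, not the paper's)."""
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = index * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def classify(generated: np.ndarray, targets: np.ndarray) -> int:
    """Return the row index of the target most cosine-similar to the generated vector."""
    g = generated / np.linalg.norm(generated)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return int(np.argmax(t @ g))

# Hypothetical label set; one continuous target vector per discrete class.
LABELS = ["angry", "happy", "neutral", "sad"]
TARGETS = np.stack([sinusoidal_code(i) for i in range(len(LABELS))])

# Stand-in for the generative model's output: the "neutral" target plus small noise.
rng = np.random.default_rng(0)
noisy = TARGETS[2] + 0.05 * rng.standard_normal(TARGETS.shape[1])
predicted = LABELS[classify(noisy, TARGETS)]
```

In this sketch the predicted label is the one whose target vector lies closest in angle to the generated vector, so small deviations in the generated output do not change the decision as long as the targets are well separated.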
PDF: GEN-SER (482.74 kB)
Author's Accepted Manuscript, CC BY V4.0. Restricted: access may be granted on request. This file will be open access upon publication.
Conference website: https://2026.ieeeicassp.org/View
