Abstract
Human communication relies on both verbal and nonverbal information; for example, facial expressions and intonation cue the speaker’s emotional state. Important speech features for emotion recognition are prosody (pitch, energy and duration) and voice quality (spectral energy, formants, MFCCs, jitter/shimmer). For facial expressions, features related to the forehead, eye region, cheeks and lips are important. Since both audio and visual modalities provide relevant cues, audio and visual features were extracted and combined to evaluate emotion recognition on a British English corpus. The database of 120 utterances was recorded from an actor with 60 markers painted on his face, reading sentences in seven emotions (N=7): anger, disgust, fear, happiness, neutral, sadness and surprise. Recordings consisted of 15 phonetically-balanced TIMIT sentences per emotion, with video of the face captured by a 3dMD system. A total of 106 utterance-level audio features (prosodic and spectral) and 240 visual features (2D marker coordinates) were extracted. Experiments were performed with audio, visual and audiovisual features. The top 40 features were selected by sequential forward-backward search using the Bhattacharyya distance criterion. PCA and LDA transformations, calculated on the training data, were then applied, and Gaussian classifiers were trained on the PCA and LDA features. The data were jack-knifed into 6 sets, with 5 sets used for training and 1 set for testing, and results were averaged over the 6 tests. Emotion recognition accuracy was higher for visual features than for audio features, with both PCA and LDA. Audiovisual results were close to those obtained with visual features alone. LDA achieved higher performance than PCA. The best recognition rate, 98%, was achieved with 6 LDA features (N-1) for both audiovisual and visual features, whereas audio LDA scored 53%. Maximum PCA results for audio, visual and audiovisual features were 41%, 97% and 88% respectively.
Future work involves experiments with more subjects and investigating the correlation between vocal and facial expressions of emotion.
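
The evaluation protocol summarised above (a transformation estimated on the training data only, a Gaussian classifier, and six-way jack-knifing with averaged results) can be illustrated with a minimal sketch. This is not the authors' code. It runs on synthetic data rather than the recorded corpus, it omits the feature selection step, it shows only the PCA variant, and the function names, the regularisation term `reg` and the fold shuffling are assumptions made for illustration.

```python
import numpy as np

def make_folds(n_samples, n_folds=6, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint test sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def fit_pca(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_gaussians(Z, y, reg=1e-3):
    """Fit one full-covariance Gaussian per class (reg is an assumed ridge term)."""
    models = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])
        models[c] = (Zc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return models

def predict(models, Z):
    """Assign each row to the class with the highest Gaussian log-likelihood."""
    classes = sorted(models)
    scores = []
    for c in classes:
        mu, prec, logdet = models[c]
        d = Z - mu
        # Log-likelihood up to a constant shared by all classes.
        scores.append(-0.5 * (np.einsum("ij,jk,ik->i", d, prec, d) + logdet))
    return np.array(classes)[np.argmax(scores, axis=0)]

def jackknife_accuracy(X, y, n_components=6, n_folds=6):
    """Train on n_folds - 1 sets, test on the held-out set, average the accuracies."""
    folds = make_folds(len(y), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # The PCA transform is estimated on the training portion only.
        mean, W = fit_pca(X[train], n_components)
        models = fit_gaussians((X[train] - mean) @ W, y[train])
        pred = predict(models, (X[test] - mean) @ W)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

# Synthetic stand-in data: 7 well-separated classes (sizes are arbitrary,
# not taken from the corpus described in the abstract).
rng = np.random.default_rng(1)
n_classes, per_class, n_features = 7, 18, 20
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), per_class)
X = centers[y] + rng.normal(size=(len(y), n_features))

acc = jackknife_accuracy(X, y)
print(f"mean accuracy over 6 folds: {acc:.2f}")
```

Fitting the transform inside the loop keeps the held-out set from influencing the projection. An LDA projection to N-1 dimensions could be substituted for `fit_pca` without changing the rest of the loop.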