VOLUME 3, ISSUE 5, MAY 2014
This work is licensed under a Creative Commons Attribution 4.0 International License.
Auditory Processing of Speech Signals for Speech Emotion Recognition
PRASHANT AHER, ALICE CHEERAN, Department of Electrical Engineering, Veermata Jijabai Technological Institute (VJTI), Mumbai, India
Abstract: Feature extraction is the most crucial step in automatic speech emotion recognition (SER). Cepstral features such as Mel-frequency cepstral coefficients (MFCC) perform well in clean environments, but their performance degrades when there is a mismatch between training and testing data. This paper presents an auditory-based feature extraction method for SER in noisy environments, used to recognize and classify emotions in the Berlin emotional speech database. The proposed model consists of a cochlear bandpass filterbank with zero-crossing detection for frequency estimation. Features extracted from the input speech samples are fed to a Support Vector Machine (SVM) classifier with an RBF kernel function for classification. In the speech emotion recognition task, MFCC and the proposed features achieve recognition accuracies of 81.9% and 89%, respectively, under clean testing conditions. When the SNR of the testing speech samples drops to 5 dB, the recognition accuracy of the MFCC features falls to 11% while the proposed features achieve 25%, which indicates the greater noise robustness of the proposed features.
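Two ideas in the abstract can be illustrated in a few lines: estimating frequency from zero crossings, and producing a noisy test signal at a chosen SNR such as 5 dB. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the 16 kHz sample rate, the 440 Hz test tone, and the use of white Gaussian noise are all hypothetical choices made for illustration. The cochlear bandpass filterbank and the SVM classifier are omitted.

```python
import math
import random


def zero_crossing_frequency(signal, sample_rate):
    """Estimate the dominant frequency (Hz) from the number of sign changes."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(signal) - 1) / sample_rate
    # A sinusoid crosses zero twice per period.
    return crossings / (2.0 * duration)


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that the mixture has the given SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR (dB) from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed test tone: 1 second of a 440 Hz sinusoid sampled at 16 kHz.
SAMPLE_RATE = 16000
TONE_HZ = 440.0
clean_tone = [
    math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE + 0.1)
    for n in range(SAMPLE_RATE)
]
noisy_tone = add_noise_at_snr(clean_tone, snr_db=5.0, rng=random.Random(0))

print("clean estimate (Hz):", zero_crossing_frequency(clean_tone, SAMPLE_RATE))
print("noisy estimate (Hz):", zero_crossing_frequency(noisy_tone, SAMPLE_RATE))
print("measured SNR (dB):", measured_snr_db(clean_tone, noisy_tone))
```

On the clean tone the zero-crossing count gives an estimate close to the true frequency. On the wideband noisy signal the count is inflated by noise-induced crossings, which is one reason to apply zero-crossing analysis to the output of each narrow bandpass channel instead of the raw waveform.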
Keywords: Cochlea, cochlear filterbank, mel filterbank, noise robustness, RBF kernel, speech emotion recognition, SVM
How to Cite:
[1] PRASHANT AHER, ALICE CHEERAN Department of Electrical Engineering, Veermata Jijabai Technological Institute (VJTI), Mumbai, India, “Auditory Processing of Speech Signals for Speech Emotion Recognition,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE)
