
IEICE Transactions on Information and Systems 2008 E91-D(4):1074-1081; doi:10.1093/ietisy/e91-d.4.1074

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- Human-computer Interaction

Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

Tsang-Long PAO1, Yu-Te CHEN1 and Jun-Heng YEH1

1 The authors are with TTU, Taipei, Taiwan. E-mail: d8906005{at}ms2.ttu.edu.tw

It is said that technology arises from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis of all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on different emotional speech corpora, which makes it difficult to compare and evaluate their performance. In this paper, we first compare several popular classification methods and evaluate their performance by applying them to a Mandarin speech corpus covering five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% on the 5-class emotion recognition task and outperforms the other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compare these classifiers on another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.

Key Words: emotion detection, Mandarin speech, performance comparison
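
As a rough illustration of the kind of classifier compared in the abstract, the sketch below implements a distance-weighted k-nearest-neighbor vote, assuming that the DW-KNN baseline denotes the classic rule of Dudani (1976). It is not the proposed WD-MKNN method, whose details are not given here. The function name `dw_knn_predict`, the two-dimensional toy vectors and the labels are hypothetical; in the paper's setting each vector would instead be built from MFCC, LPCC and LPC features.

```python
import math
from collections import defaultdict


def dw_knn_predict(train_x, train_y, query, k=3):
    """Classify `query` with a distance-weighted k-NN vote (Dudani-style weights).

    Hypothetical helper for illustration only; not the paper's WD-MKNN.
    """
    # Euclidean distance from the query to every training vector, nearest first.
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    neighbors = ranked[:k]
    d_near, d_far = neighbors[0][0], neighbors[-1][0]

    votes = defaultdict(float)
    for d, label in neighbors:
        # The nearest neighbor gets weight 1 and the k-th gets weight 0;
        # if all k distances are equal, every neighbor gets weight 1.
        w = 1.0 if d_far == d_near else (d_far - d) / (d_far - d_near)
        votes[label] += w
    # Return the label with the largest summed weight.
    return max(votes, key=votes.get)


# Toy data (assumed, not from the paper): 2-D stand-ins for acoustic features.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["sadness", "sadness", "anger", "anger", "anger"]

print(dw_knn_predict(train_x, train_y, (0.2, 0.1), k=3))
```

Unlike a plain majority vote, this weighting lets one very close neighbor outweigh several distant ones, which is the usual motivation for weighted KNN variants.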


Manuscript received July 2, 2007. Manuscript revised November 8, 2007.

