IEICE Transactions on Information and Systems 2008 E91-D(3):430-438; doi:10.1093/ietisy/e91-d.3.430

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Noisy Speech Recognition

Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

Wooil KIM1 and John H.L. HANSEN1

1 The authors are with the Center for Robust Speech Systems (CRSS) in the Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas, U.S.A. E-mail: John.Hansen{at}utdallas.edu

An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected from a number of speakers under actual driving conditions. The PCGMM-based feature compensation considered in this paper uses parallel model combination to generate a noise-corrupted speech model by combining the clean speech model and the noise model. To address unknown, time-varying background noise, an interpolation method over multiple environmental models is employed. To reduce the computational cost of using multiple models, an Environment Transition Model is employed, motivated by the Noise Language Model used in Environmental Sniffing. An environment-dependent mixture sharing scheme is proposed and shown to be more effective in reducing computational complexity. The environment transition model determines a smaller environmental model set for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that the feature compensation method is effective in improving speech recognition in real-life in-vehicle conditions. A 73.10% reduction in computational requirements was obtained by employing the environment-dependent mixture sharing scheme, with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics of the different environmental models, even when a large number of Gaussian components are selected for mixture sharing.
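
The two core ideas in the abstract, model combination and interpolation over multiple environmental models, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the common log-add approximation for combining clean speech and noise means in the log-spectral domain, and assumes that interpolation weights are obtained by normalizing per-environment log-likelihoods. The function names `log_add_combine`, `environment_weights`, and `interpolate` are hypothetical.

```python
import math


def log_add_combine(clean_mean, noise_mean):
    """Combine clean-speech and noise means given in the log-spectral domain.

    Assumption: the log-add approximation log(exp(s) + exp(n)), a common
    simplification of parallel model combination that ignores variances.
    """
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(clean_mean, noise_mean)]


def environment_weights(log_likelihoods):
    """Turn per-environment log-likelihoods into weights that sum to 1.

    Assumption: a softmax over log-likelihoods (uniform environment priors).
    The maximum is subtracted first for numerical stability.
    """
    peak = max(log_likelihoods)
    scores = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(scores)
    return [s / total for s in scores]


def interpolate(estimates, weights):
    """Weighted sum of the clean-feature estimates from each environment."""
    dim = len(estimates[0])
    return [sum(w * est[d] for w, est in zip(weights, estimates))
            for d in range(dim)]


# Two hypothetical environments, each yielding its own feature estimate.
weights = environment_weights([-10.0, -10.0])
combined = interpolate([[1.0, 2.0], [3.0, 4.0]], weights)
```

In this sketch, restricting the computation to a smaller set of likely environments, as the environment transition model does in the paper, would amount to passing fewer entries to `environment_weights` and `interpolate`.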

Key Words: speech recognition, in-vehicle condition, feature compensation, environment transition model, mixture sharing


Manuscript received July 9, 2007. Manuscript revised September 20, 2007.

