Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Noisy Speech Recognition
Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs
1 The author is with Nagoya University, Nagoya-shi, 464–8603 Japan. E-mail: kitaoka{at}nagoya-u.jp, 2 The authors are with Toyohashi University of Technology, Toyohashi-shi, 441–8580 Japan.
To achieve high recognition performance across a wide variety of noise types and a wide range of signal-to-noise ratios, this paper presents methods for integrating four noise reduction algorithms: spectral subtraction with smoothing in the time direction, temporal-domain SVD-based speech enhancement, GMM-based speech estimation, and KLT-based comb-filtering. We propose two types of methods for combining noise suppression algorithms: selection of the front-end processor and combination of the results from multiple recognition processes. Recognition results on the CENSREC-1 task show the effectiveness of the proposed methods.
Key Words: noisy speech recognition, noise suppression method selection, CENSREC-1
Manuscript received July 2, 2007. Manuscript revised September 18, 2007.
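The abstract does not spell out how a front-end processor is selected, so the following is only a minimal sketch of one plausible reading of "selection using noise GMMs": score a noise-only segment against one diagonal-covariance GMM per noise class, then look up the suppression method associated with the best-scoring class. The noise class names, the toy GMM parameters, the two-dimensional features, and the class-to-method table are all hypothetical and are not taken from the paper.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(logs)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return peak + math.log(sum(math.exp(l - peak) for l in logs))

def select_front_end(noise_frames, noise_gmms, method_for_noise):
    """Return the method mapped to the noise GMM with the highest average
    frame log-likelihood, together with all per-class scores."""
    scores = {
        name: sum(gmm_frame_loglik(f, gmm) for f in noise_frames) / len(noise_frames)
        for name, gmm in noise_gmms.items()
    }
    best = max(scores, key=scores.get)
    return method_for_noise[best], scores

# Hypothetical noise models over 2-dimensional features (illustration only).
noise_gmms = {
    "stationary": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "babble": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
# Hypothetical association between noise class and suppression method.
method_for_noise = {
    "stationary": "spectral_subtraction",
    "babble": "gmm_based_estimation",
}

quiet_frames = [[0.1, -0.2], [0.3, 0.1]]
busy_frames = [[4.2, 3.9], [5.8, 6.1]]
quiet_choice, quiet_scores = select_front_end(quiet_frames, noise_gmms, method_for_noise)
busy_choice, busy_scores = select_front_end(busy_frames, noise_gmms, method_for_noise)
```

In a real system the frames would come from a non-speech portion of the utterance and the GMMs would be trained on recorded noise; the same per-class scores could also serve as weights when combining the outputs of several recognition passes, although that use is likewise an assumption here.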