
IEICE Transactions on Information and Systems 2008 E91-D(3):411-421; doi:10.1093/ietisy/e91-d.3.411

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Noisy Speech Recognition

Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs

Norihide KITAOKA1, Souta HAMAGUCHI2 and Seiichi NAKAGAWA2

1 The author is with Nagoya University, Nagoya-shi, 464–8603 Japan. E-mail: kitaoka{at}nagoya-u.jp
2 The authors are with Toyohashi University of Technology, Toyohashi-shi, 441–8580 Japan.

To achieve high recognition performance across a wide variety of noise types and a wide range of signal-to-noise ratios, this paper presents methods for integrating four noise reduction algorithms: spectral subtraction with smoothing in the time direction, temporal-domain SVD-based speech enhancement, GMM-based speech estimation, and KLT-based comb-filtering. We propose two ways of combining these noise suppression algorithms: selecting a single front-end processor, and combining the results of multiple recognition processes. Recognition results on the CENSREC-1 task show the effectiveness of the proposed methods.
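The abstract does not spell out how the front-end processor is selected. The following is only a minimal sketch of one plausible reading of "selection using noise GMMs": score the non-speech frames of an utterance against one diagonal-covariance GMM per noise type, then look up the suppression method associated with the best-matching noise type. The function names, the toy GMM parameters, the noise labels, and the noise-to-method table are all hypothetical illustrations, not the authors' models or results.

```python
import math


def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def select_front_end(noise_frames, noise_gmms, method_for_noise):
    """Pick the suppression method tied to the best-scoring noise GMM.

    noise_frames: feature vectors taken from a non-speech segment.
    noise_gmms: {noise_name: (weights, means, variances)}.
    method_for_noise: {noise_name: method_name} (assumed lookup table).
    """
    scores = {
        name: sum(diag_gmm_loglik(f, *gmm) for f in noise_frames) / len(noise_frames)
        for name, gmm in noise_gmms.items()
    }
    best_noise = max(scores, key=scores.get)
    return method_for_noise[best_noise], best_noise, scores


# Toy single-component GMMs in a 2-dimensional feature space (hypothetical values).
NOISE_GMMS = {
    "subway": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "car": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}

# Hypothetical table; in practice it would be estimated on development data.
METHOD_FOR_NOISE = {
    "subway": "spectral_subtraction",
    "car": "klt_comb_filter",
}

if __name__ == "__main__":
    frames = [[4.9, 5.1], [5.2, 4.8]]
    method, noise, _ = select_front_end(frames, NOISE_GMMS, METHOD_FOR_NOISE)
    print(method, noise)
```

In this sketch the selection costs only a GMM evaluation per noise type, so a single recognition pass is needed afterwards; the second approach named in the abstract, combining the results of multiple recognition processes, would instead run the recognizer once per front end.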

Key Words: noisy speech recognition, noise suppression method selection, CENSREC-1


Manuscript received July 2, 2007. Manuscript revised September 18, 2007.


