Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Noisy Speech Recognition
Robust Speech Recognition by Model Adaptation and Normalization Using Pre-Observed Noise
The authors are with NTT Cyber Space Laboratories, NTT Corporation, Yokosuka-shi, 239–0847 Japan.
Users require speech recognition systems that offer rapid response and high accuracy at the same time. Recognition accuracy is degraded by additive noise, which comes from ambient sound, and by convolutional noise, which is created by the transfer characteristics of the space, especially in distant-talking situations. Existing model adaptation techniques achieve robustness against these two types of noise by using HMM-composition and CMN (cepstral mean normalization), respectively. Because they need a sample of the user's speech as well as an additive noise sample to generate the required models, they cannot achieve rapid response, although it may be possible to capture just the additive noise in a preceding step. The technique proposed herein uses just the additive noise, in that preceding step, to generate a model that is adapted and normalized against both types of noise. When the user's speech sample is captured, only online CMN needs to be performed before recognition processing starts, so the technique offers rapid response. In addition, to cover the unpredictable S/N values that can occur in real applications, the technique creates HMMs for several S/N values. Simulations using artificial speech data show that the proposed technique increased the character correct rate by 11.62% compared to CMN.
Key Words: noise robustness, distant-talking, spectral subtraction, HMM-composition, cepstral mean normalization
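To make the normalization step mentioned in the abstract concrete, the following is a minimal sketch of batch CMN and of one possible form of online CMN. It is not the authors' implementation: the function names `batch_cmn` and `online_cmn`, the exponential forgetting factor `alpha`, and the optional `initial_mean` are assumptions chosen for illustration, since the abstract does not specify which online CMN variant is used.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def batch_cmn(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Subtract the utterance-level cepstral mean from every frame.

    This needs the whole utterance before any frame can be normalized,
    which is why it delays the start of recognition.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def online_cmn(
    frames: Sequence[Sequence[float]],
    initial_mean: Optional[Sequence[float]] = None,
    alpha: float = 0.995,
) -> List[Frame]:
    """Frame-by-frame CMN with an exponentially updated running mean.

    `alpha` (forgetting factor) and `initial_mean` are hypothetical
    parameters for this sketch; they are not taken from the paper.
    """
    mean = list(initial_mean) if initial_mean is not None else None
    normalized: List[Frame] = []
    for frame in frames:
        if mean is None:
            # No prior estimate: start the running mean at the first frame.
            mean = list(frame)
        else:
            # Move the running mean a small step toward the current frame.
            mean = [alpha * m + (1.0 - alpha) * x for m, x in zip(mean, frame)]
        # Each frame is normalized as soon as it arrives.
        normalized.append([x - m for x, m in zip(frame, mean)])
    return normalized
```

Convolutional distortion appears as an approximately constant offset in the cepstral domain, so subtracting a mean estimate removes it. The online form trades some accuracy of the mean estimate for the ability to emit normalized frames immediately.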
Manuscript received July 4, 2007. Manuscript revised September 17, 2007.