
IEICE Transactions on Information and Systems 2008 E91-D(3):467-477; doi:10.1093/ietisy/e91-d.3.467

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Voice Activity Detection

Noise Robust Voice Activity Detection Based on Switching Kalman Filter

Masakiyo FUJIMOTO1 and Kentaro ISHIZUKA1

1 The authors are with NTT Communication Science Laboratories, NTT Corporation, Kyoto-fu, 619–0237 Japan. E-mail: masakiyo{at}cslab.kecl.ntt.co.jp; ishizuka{at}cslab.kecl.ntt.co.jp

This paper addresses voice activity detection (VAD) in noisy environments. The proposed VAD method takes a statistical model approach and estimates the statistical models sequentially, without a priori knowledge of the noise. Specifically, the method constructs a clean speech / silence state transition model beforehand and, as each signal is observed, sequentially adapts the model to the noisy environment using a switching Kalman filter. We carried out two evaluations. In the first, the proposed method significantly outperformed conventional methods in VAD accuracy in simulated noise environments. In the second, we evaluated the proposed method on CENSREC-1-C, a VAD evaluation framework; it significantly outperformed the CENSREC-1-C baseline in VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.

Key Words: voice activity detection, statistical model, switching Kalman filter, noisy environment, CENSREC-1-C
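
The idea summarized in the abstract, a two-state speech / silence transition model whose noise estimate is updated frame by frame by a switching Kalman filter, can be illustrated with a deliberately simplified sketch. This is not the authors' implementation. It assumes a scalar log-energy observation per frame, a random-walk noise level, a fixed speech offset above the noise, and a first frame that contains only noise. The function name skf_vad and all parameter values (speech_offset, q, r, stay) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, var):
    """Density of a zero-mean Gaussian with variance `var`, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def skf_vad(frames, speech_offset=5.0, q=0.01, r=(0.25, 1.0), stay=0.9):
    """Toy switching Kalman filter VAD on per-frame log-energies.

    State 0 = silence: y = noise + v0.  State 1 = speech: y = noise + offset + v1.
    The noise level follows a random walk with process variance q.
    All defaults are illustrative assumptions, not values from the paper.
    """
    if not frames:
        return []
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]  # state transition model
    offsets = (0.0, speech_offset)
    mean, var = frames[0], 1.0      # assumption: the first frame is noise only
    post = [0.5, 0.5]               # posterior over (silence, speech)
    decisions = []
    for y in frames:
        var_pred = var + q          # Kalman prediction for the random walk
        prior = [sum(post[i] * trans[i][j] for i in range(2)) for j in range(2)]
        means, variances, weights = [], [], []
        for j in range(2):          # one Kalman update per hypothesized state
            innov = y - (mean + offsets[j])
            s = var_pred + r[j]     # innovation variance
            gain = var_pred / s
            means.append(mean + gain * innov)
            variances.append((1.0 - gain) * var_pred)
            weights.append(prior[j] * gaussian_pdf(innov, s))
        total = sum(weights)
        post = [w / total for w in weights] if total > 0.0 else prior
        # Collapse the two-component mixture to one Gaussian (moment matching).
        mean = sum(post[j] * means[j] for j in range(2))
        var = sum(post[j] * (variances[j] + (means[j] - mean) ** 2)
                  for j in range(2))
        decisions.append(post[1] > 0.5)
    return decisions


if __name__ == "__main__":
    # Synthetic log-energies: silence, then a louder segment, then silence.
    demo = [0.0] * 20 + [6.0] * 20 + [0.0] * 20
    print(skf_vad(demo))
```

Collapsing the mixture after every frame keeps the cost constant per frame, at the price of an approximate posterior. The method described in the paper operates on statistical models of speech and silence rather than on a single energy value, so this sketch only conveys the overall structure.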


Manuscript received June 29, 2007. Manuscript revised September 12, 2007.

