Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Letters -- Speech and Hearing
Response Time Reduction of Speech Recognizers Using Single Gaussians
The authors are with the Faculty of Information and Communications University (ICU), Daejeon, 305-732 Korea. E-mail: sangbae{at}icu.ac.kr
In this paper, we propose an algorithm that reduces the response time of HMM-based speech recognizers by using single Gaussians to select promising HMM states. During recognition, the likelihood of every HMM state is first evaluated with its corresponding single Gaussian. The likelihoods from the original full Gaussians are then computed, and substituted for the single-Gaussian values, only for the HMM states whose likelihoods are relatively large. This significantly reduces the pattern-matching time for speech recognition without any noticeable loss in recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians, which further reduces the additional memory the method requires. In our 10,000-word Korean POI (point-of-interest) recognition task, the proposed algorithm reduces the response time by 35.57% relative to the baseline system, at the cost of a 10% degradation in WER.
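The two-pass evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how each single Gaussian is obtained or how "relatively large" is decided, so the sketch assumes moment matching of each state's mixture and a log-likelihood beam below the best single-Gaussian score. All function names and the `beam` parameter are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, mixture):
    """Log density of a diagonal GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in mixture]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def moment_matched_gaussian(mixture):
    """Assumed choice: one Gaussian with the mixture's overall mean and variance."""
    dim = len(mixture[0][1])
    mean = [sum(w * m[d] for w, m, _ in mixture) for d in range(dim)]
    var = [
        sum(w * (v[d] + m[d] ** 2) for w, m, v in mixture) - mean[d] ** 2
        for d in range(dim)
    ]
    return mean, var

def state_log_likelihoods(x, state_mixtures, state_singles, beam):
    """Score all states with single Gaussians, then rescore only the promising ones."""
    approx = [log_gaussian(x, m, v) for m, v in state_singles]
    best = max(approx)
    scores = list(approx)
    refined = []
    for s, a in enumerate(approx):
        if a >= best - beam:  # assumed criterion for "relatively large"
            scores[s] = log_gmm(x, state_mixtures[s])  # replace with full mixture score
            refined.append(s)
    return scores, refined

# Toy example: two one-dimensional states, each with a two-component mixture.
states = [
    [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    [(0.5, [10.0], [1.0]), (0.5, [11.0], [1.0])],
]
singles = [moment_matched_gaussian(m) for m in states]
scores, refined = state_log_likelihoods([0.4], states, singles, beam=5.0)
```

In this toy case only the first state falls inside the beam, so the full mixture is evaluated once instead of twice. The second state keeps its cheaper single-Gaussian score.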
Key Words: speech recognition, fast likelihood computation
Manuscript received August 2, 2006. Manuscript revised December 19, 2006.
References
[1] J. Fritsch and L. Rogina, "The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians," IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol.2, pp.837-840, Atlanta, GA, 1996.
[2] S. Ortmanns, T. Firzlaff, and H. Ney, "Fast likelihood computation methods for continuous mixture densities in large vocabulary speech recognition," Proc. EUROSPEECH-97: Eur. Conf. Speech Technology, pp.139-142, Rhodes, Greece, 1997.
[3] A. Lee, T. Kawahara, and K. Shikano, "Gaussian mixture selection using context-independent HMM," IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol.1, pp.69-72, 2001.
[4] L. Bahl, S. Gennaro, P. Gopalakrishnan, and R. Mercer, "A fast approximate acoustic match for large vocabulary speech recognition," IEEE Trans. Speech Audio Process., vol.1, no.1, pp.59-67, Jan. 1993.
[5] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[6] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, 1990.