Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Special Section on Robust Speech Processing in Realistic Environments -- Papers -- Speech Enhancement
Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation
1 The author is with the Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613. E-mail: hdtran{at}i2r.a-star.edu.sg. 2 The author is with the Graduate School of Information Science, Nagoya University, Nagoya-shi, 464–8601 Japan. 3 The author is with the Graduate School of Information Engineering, Meijo University, Nagoya-shi, 468–8502 Japan.
We present a multichannel speech enhancement method based on MAP estimation of the speech spectral magnitude, using a generalized gamma model of the speech prior distribution whose parameters are adapted from the actual noisy speech frame by frame. Using a more general prior distribution together with its online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multichannel information, in the form of cross-channel statistics, is shown to help adapt the prior distribution parameters to the actual observation, which improves the performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air conditioner noise, and an open window.
Key Words: multi-channel speech enhancement, speech recognition, generalized gamma distribution, moment matching
Manuscript received July 9, 2007. Manuscript revised September 14, 2007.
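To illustrate the idea of frame-by-frame moment matching named in the abstract and key words, here is a minimal sketch. It is not the authors' algorithm: it fits an ordinary two-parameter gamma density (not the generalized gamma) by matching its mean and variance to recursively smoothed moments of a positive quantity such as spectral power. The class name `OnlineGammaMoments`, the helper `gamma_from_moments`, and the forgetting factor are assumptions made for illustration only.

```python
from dataclasses import dataclass


def gamma_from_moments(mean: float, var: float):
    """Return (shape, scale) of the gamma pdf whose mean and variance match.

    Uses the standard identities mean = shape * scale, var = shape * scale**2.
    """
    if mean <= 0.0 or var <= 0.0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / var, var / mean


@dataclass
class OnlineGammaMoments:
    """Hypothetical helper: exponentially smoothed first and second moments."""
    forgetting: float = 0.98  # assumed smoothing factor, not from the paper
    m1: float = 0.0           # running estimate of E[x]
    m2: float = 0.0           # running estimate of E[x^2]
    initialized: bool = False

    def update(self, x: float) -> None:
        """Fold one new frame value into the running moments."""
        if not self.initialized:
            self.m1, self.m2, self.initialized = x, x * x, True
            return
        a = self.forgetting
        self.m1 = a * self.m1 + (1.0 - a) * x
        self.m2 = a * self.m2 + (1.0 - a) * x * x

    def gamma_params(self):
        """Moment-matched (shape, scale) for the current running moments."""
        return gamma_from_moments(self.m1, self.m2 - self.m1 ** 2)
```

In use, one tracker per frequency bin would call `update` on each new frame and read `gamma_params()` once at least two distinct values have been seen. How the paper maps cross-channel statistics onto the generalized gamma parameters is not stated in the abstract and is not reproduced here.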