Copyright © 2006 The Institute of Electronics, Information and Communication Engineers
Special Section on Statistical Modeling for Speech Processing -- Papers -- Speech Recognition
PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR
1 The authors are with the Graduate School of Engineering, Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan. E-mail: ghulam{at}vox.tutkie.tut.ac.jp
2 Presently, with Tokyo Research Laboratory, IBM Japan Ltd.
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA-based and MFCC-based features. In this paper, a non-linear adaptive threshold adjustment procedure is first introduced into the PS-ZCPA method to obtain optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, two auditory mechanisms are embedded into the PS-ZCPA method: auditory masking, a well-known phenomenon of auditory perception, and modulation enhancement, which reflects the strong relationship between the modulation spectrum and the intelligibility of speech. Finally, a Wiener-filter-based noise reduction procedure is integrated into the method to make it more noise-robust, and its performance is evaluated against ETSI ES202 (WI008), a standard front-end for distributed speech recognition. All experiments were carried out on the Aurora-2J database. The experimental results show that embedding auditory masking improves the performance of the PS-ZCPA method, and that modulation enhancement improves it slightly. The PS-ZCPA method with Wiener-filter-based noise reduction also performed better than ETSI ES202 (WI008).
Key Words: pitch synchronous analysis, ZCPA, auditory masking, modulation enhancement, Wiener filtering
Manuscript received July 11, 2005. Manuscript revised September 27, 2005.
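As a rough illustration of the zero-crossing and peak-amplitude idea that ZCPA-style features build on, the following Python sketch accumulates a frequency histogram for a single subband frame. It is a minimal sketch under stated assumptions, not the PS-ZCPA method of this paper: the function name `zcpa_histogram`, the bin edges, and the `log(1 + peak)` weighting are illustrative choices, and the pitch-synchronous framing, adaptive threshold, auditory masking, modulation enhancement, and Wiener filtering described in the abstract are all omitted.

```python
import numpy as np


def zcpa_histogram(x, fs, bin_edges_hz):
    """Accumulate a ZCPA-style frequency histogram for one subband frame.

    Hypothetical helper for illustration only. Each interval between
    successive upward zero crossings gives a frequency estimate
    (fs / interval length in samples); the peak amplitude inside that
    interval, compressed with log(1 + peak), is added to the matching bin.
    """
    x = np.asarray(x, dtype=float)
    hist = np.zeros(len(bin_edges_hz) - 1)

    # Sample indices where the signal rises from negative to non-negative.
    up = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0)) + 1

    for start, end in zip(up[:-1], up[1:]):
        freq = fs / (end - start)      # inverse of the zero-crossing interval
        peak = np.max(x[start:end])    # peak amplitude within the interval
        k = np.searchsorted(bin_edges_hz, freq, side="right") - 1
        if 0 <= k < len(hist):
            hist[k] += np.log1p(peak)  # assumed compressive weighting
    return hist


# Usage: a 500 Hz tone sampled at 8 kHz (phase offset avoids exact zeros).
fs = 8000
t = np.arange(800) / fs
tone = np.sin(2 * np.pi * 500 * t + 0.3)
edges = np.array([0.0, 250.0, 750.0, 2000.0, 4000.0])  # assumed bin edges
hist = zcpa_histogram(tone, fs, edges)
print(hist)
```

For this test tone every zero-crossing interval is 16 samples long, so all of the histogram mass falls into the 250 to 750 Hz bin. A full front-end would apply this per band of an auditory filterbank and per analysis frame.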
This article has been cited by other articles:
M. GHULAM, K. KATSURADA, J. HORIKAWA, and T. NITTA, "Pitch-Synchronous Peak-Amplitude (PS-PA)-Based Feature Extraction Method for Noise-Robust ASR," IEICE Trans D: Information, November 1, 2006; E89-D(11): 2766-2774.