Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Letters -- Speech and Hearing
Identification of ARMA Speech Models Using an Effective Representation of Voice Source
1 The author is with the Department of Computer Science and Engineering, Shah Jalal University of Science and Technology, Sylhet 3114, Bangladesh. 2 The author is with the Department of Information and Computer Sciences, Saitama University, Saitama-shi, 338-8570 Japan. E-mail: shima{at}sie.ics.saitama-u.ac.jp
A two-stage least-squares identification method is proposed for estimating ARMA (autoregressive moving average) coefficients from speech signals. A pulse-train-like input sequence is often employed to account for source effects when estimating the vocal tract parameters of voiced speech. Because of glottal and radiation effects, however, a pulse train does not represent the effective voice source. The authors have previously proposed a simple but effective voice source model for estimating AR (autoregressive) coefficients. This letter extends that approach to ARMA analysis so that it covers a wider variety of speech sounds, including nasal vowels and consonants. Analysis results for both synthetic and natural nasal speech are presented to demonstrate the capability of the method.
Key Words: ARMA modeling, linear prediction, least square identification, glottal waveform, effective voice source
Manuscript received July 7, 2006. Manuscript revised September 29, 2006.
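The abstract does not give the details of the two-stage procedure, so the following is only a generic sketch of a two-stage least-squares ARMA fit, not the authors' method. It assumes a white-noise input in place of the effective voice source model: stage 1 fits a long AR model to obtain a residual that stands in for the unobserved input, and stage 2 solves an ordinary least-squares problem for the AR and MA coefficients. The function names (`lagged`, `two_stage_ls_arma`, `simulate_arma`), the model orders, and the test signal are all illustrative assumptions.

```python
import numpy as np

def lagged(x, lags, start):
    """Matrix whose columns are x[n-k] for k in lags, rows n = start..len(x)-1."""
    n = np.arange(start, len(x))
    return np.column_stack([x[n - k] for k in lags])

def two_stage_ls_arma(y, p, q, long_order=20):
    """Generic two-stage LS fit of
       y[n] = -sum_k a[k] y[n-k] + e[n] + sum_j b[j] e[n-j],  p >= 1, q >= 1.
    Returns (a, b) without the leading 1 of each polynomial."""
    y = np.asarray(y, dtype=float)
    L = long_order
    # Stage 1: long AR fit; its residual approximates the unobserved input e[n].
    Phi = lagged(y, range(1, L + 1), L)
    c, *_ = np.linalg.lstsq(Phi, y[L:], rcond=None)
    e = np.zeros_like(y)
    e[L:] = y[L:] - Phi @ c
    # Stage 2: regress y[n] on -y[n-1..n-p] and the estimated e[n-1..n-q].
    start = L + max(p, q)
    X = np.hstack([-lagged(y, range(1, p + 1), start),
                   lagged(e, range(1, q + 1), start)])
    theta, *_ = np.linalg.lstsq(X, y[start:], rcond=None)
    return theta[:p], theta[p:]

def simulate_arma(a, b, e):
    """Generate y from input e with the same sign convention as above."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, 1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        for j, bj in enumerate(b, 1):
            if n - j >= 0:
                acc += bj * e[n - j]
        y[n] = acc
    return y

# Illustrative ARMA(2,1) system driven by seeded white noise (assumed values).
rng = np.random.default_rng(0)
a_true = np.array([-1.5, 0.7])   # poles at radius sqrt(0.7), stable
b_true = np.array([0.5])         # single zero at z = -0.5
y = simulate_arma(a_true, b_true, rng.standard_normal(20000))
a_hat, b_hat = two_stage_ls_arma(y, p=2, q=1)
print("a_hat:", a_hat, "b_hat:", b_hat)
```

For voiced speech, the stage 1 residual would be a poor stand-in for the input, which is the gap that a pulse train or, as in this letter, an effective voice source model is meant to fill.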
References
[1] J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol.63, no.4, pp.561-580, 1975.
[2] I.S. Konvalinka and M.R. Matausek, "Simultaneous estimation of poles and zeros in speech analysis and ITIF-Iterative inverse filtering algorithm," IEEE Trans. Acoust. Speech Signal Process., vol.27, no.5, pp.485-492, 1979.
[3] H. Morikawa and H. Fujisaki, "Adaptive analysis of speech based on a pole-zero representation," IEEE Trans. Acoust. Speech Signal Process., vol.30, no.1, pp.77-88, 1982.
[4] Y. Miyanaga, N. Miki, and N. Nagai, "Adaptive identification of a time-varying ARMA speech model," IEEE Trans. Acoust. Speech Signal Process., vol.34, no.3, pp.423-433, 1986.
[5] D.G. Childers, J.C. Principe, and Y.T. Ting, "Adaptive WRLS-VFF for speech analysis," IEEE Trans. Speech Audio Process., vol.3, no.3, pp.209-213, 1995.
[6] L. Mitiche, B. Derras, and A.B.H. Adamou-Mitiche, "Efficient low-order auto regressive moving average (ARMA) models for speech signals," Acoustics Research Letters Online, vol.5, no.2, pp.75-81, 2004.
[7] H. Fujisaki and M. Ljungqvist, "Estimation of voice source and vocal tract parameters based on ARMA analysis and a model for the glottal source waveform," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.12, pp.637-640, 1987.
[8] W. Ding, H. Kasuya, and S. Adachi, "Simultaneous estimation of vocal tract and voice source parameters based on an ARX model," IEICE Trans. Inf. & Syst., vol.E78-D, no.6, pp.738-743, June 1995.
[9] K. Funaki, Y. Miyanaga, and K. Tochinai, "Recursive ARMAX speech analysis based on a glottal source model with phase compensation," Signal Process., vol.74, no.3, pp.279-295, 1999.
[10] G. Fant, J. Liljencrants, and Q.G. Lin, "A four parameter model of glottal flow," Quart. Progress and Status Rep., Speech Transmission Lab, Royal Inst. Technol., pp.1-13, Oct.-Dec. 1985.
[11] M.S. Rahman and T. Shimamura, "Speech analysis based on modeling the effective voice source," IEICE Trans. Inf. & Syst., vol.E89-D, no.3, pp.1107-1115, March 2006.
[12] J.L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd ed., Springer-Verlag, New York, 1976.
[13] N.K. Sinha and B. Kuszta, Modeling and Identification of Dynamic Systems, Van Nostrand Reinhold Company, 1983.