
IEICE Transactions on Information and Systems 2007 E90-D(5):863-867; doi:10.1093/ietisy/e90-d.5.863

Copyright © 2007 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Letters -- Speech and Hearing

Identification of ARMA Speech Models Using an Effective Representation of Voice Source

M. Shahidur RAHMAN (1) and Tetsuya SHIMAMURA (2)

(1) The author is with the Department of Computer Science and Engineering, Shah Jalal University of Science and Technology, Sylhet 3114, Bangladesh.
(2) The author is with the Department of Information and Computer Sciences, Saitama University, Saitama-shi, 338–8570 Japan. E-mail: shima{at}sie.ics.saitama-u.ac.jp

A two-stage least-squares identification method is proposed for estimating ARMA (autoregressive moving average) coefficients from speech signals. When vocal tract parameters of voiced speech are estimated, a pulse-train-like input sequence is often used to account for source effects. Because of glottal and radiation effects, however, a pulse train does not represent the effective voice source. The authors previously proposed a simple but effective voice source model for estimating AR (autoregressive) coefficients. This letter extends that approach to ARMA analysis so that it covers a wider variety of speech sounds, including nasal vowels and consonants. Analysis results on both synthetic and natural nasal speech are presented to demonstrate the method's analysis capability.
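
As a rough illustration of what two-stage least-squares ARMA identification can look like, the Python sketch below uses a textbook Hannan-Rissanen-style procedure in which the unknown input is first approximated by the residual of a long AR fit. It is not the method of this letter, which relies on an effective voice source model that the abstract does not specify. The function names, the long AR order of 20, and the synthetic ARMA(2,1) test signal are all assumptions made for illustration.

```python
import numpy as np


def lagged(x, lags, start):
    """Matrix whose columns are x[n-k] for k in lags, rows n = start..len(x)-1."""
    return np.column_stack([x[start - k:len(x) - k] for k in lags])


def two_stage_ls_arma(y, p, q, long_order=20):
    """Estimate a (AR) and b (MA) in
        y[n] + sum_k a[k] y[n-k] = e[n] + sum_j b[j] e[n-j]
    by two least-squares fits. Assumes p >= 1 and q >= 1.
    """
    y = np.asarray(y, dtype=float)
    m = long_order

    # Stage 1: fit a long AR model; its residual serves as an
    # estimate of the unobserved input sequence e[n].
    X1 = lagged(y, range(1, m + 1), m)
    phi, *_ = np.linalg.lstsq(X1, y[m:], rcond=None)
    e = np.zeros_like(y)
    e[m:] = y[m:] - X1 @ phi

    # Stage 2: with e[n] treated as known, the ARMA model is linear
    # in (a, b), so ordinary least squares applies.
    start = m + max(p, q)
    X2 = np.hstack([
        -lagged(y, range(1, p + 1), start),
        lagged(e, range(1, q + 1), start),
    ])
    target = y[start:] - e[start:]
    theta, *_ = np.linalg.lstsq(X2, target, rcond=None)
    return theta[:p], theta[p:]


def simulate_arma(a, b, n, seed=0):
    """Generate n samples of the ARMA process above from white Gaussian noise."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(n):
        ar = sum(a[k] * y[i - 1 - k] for k in range(len(a)) if i - 1 - k >= 0)
        ma = sum(b[j] * e[i - 1 - j] for j in range(len(b)) if i - 1 - j >= 0)
        y[i] = -ar + e[i] + ma
    return y


if __name__ == "__main__":
    # Hypothetical test signal: stable poles, one invertible zero.
    y = simulate_arma(a=[-1.5, 0.7], b=[0.5], n=20000)
    a_hat, b_hat = two_stage_ls_arma(y, p=2, q=1)
    print("AR estimate:", a_hat)
    print("MA estimate:", b_hat)
```

For voiced speech, the white-noise residual used in stage 1 would be a poor stand-in for the excitation, which is the limitation the letter's voice source representation is meant to address.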

Key Words: ARMA modeling, linear prediction, least square identification, glottal waveform, effective voice source


Manuscript received July 7, 2006. Manuscript revised September 29, 2006.

References

[1] J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol.63, no.4, pp.561–580, 1975.

[2] I.S. Konvalinka and M.R. Matausek, "Simultaneous estimation of poles and zeros in speech analysis and ITIF-Iterative inverse filtering algorithm," IEEE Trans. Acoust. Speech Signal Process., vol.27, no.5, pp.485–492, 1979.

[3] H. Morikawa and H. Fujisaki, "Adaptive analysis of speech based on a pole-zero representation," IEEE Trans. Acoust. Speech Signal Process., vol.30, no.1, pp.77–88, 1982.

[4] Y. Miyanaga, N. Miki, and N. Nagai, "Adaptive identification of a time-varying ARMA speech model," IEEE Trans. Acoust. Speech Signal Process., vol.34, no.3, pp.423–433, 1986.

[5] D.G. Childers, J.C. Principe, and Y.T. Ting, "Adaptive WRLS-VFF for speech analysis," IEEE Trans. Speech Audio Process., vol.3, no.3, pp.209–213, 1995.

[6] L. Mitiche, B. Derras, and A.B.H. Adamou-Mitiche, "Efficient low-order auto regressive moving average (ARMA) models for speech signals," Acoustics Research Letters Online, vol.5, no.2, pp.75–81, 2004.

[7] H. Fujisaki and M. Ljungqvist, "Estimation of voice source and vocal tract parameters based on ARMA analysis and a model for the glottal source waveform," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.12, pp.637–640, 1987.

[8] W. Ding, H. Kasuya, and S. Adachi, "Simultaneous estimation of vocal tract and voice source parameters based on an ARX model," IEICE Trans. Inf. & Syst., vol.E78-D, no.6, pp.738–743, June 1995.

[9] K. Funaki, Y. Miyanaga, and K. Tochinai, "Recursive ARMAX speech analysis based on a glottal source model with phase compensation," Signal Process., vol.74, no.3, pp.279–295, 1999.

[10] G. Fant, J. Liljencrants, and Q.G. Lin, "A four parameter model of glottal flow," Quart. Progress and Status Rep., Speech Transmission Lab, Royal Inst. Technol., pp.1–13, Oct.–Dec. 1985.

[11] M.S. Rahman and T. Shimamura, "Speech analysis based on modeling the effective voice source," IEICE Trans. Inf. & Syst., vol.E89-D, no.3, pp.1107–1115, March 2006.

[12] J.L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd ed., Springer-Verlag, New York, 1976.

[13] N.K. Sinha and B. Kuszta, Modeling and Identification of Dynamic Systems, Van Nostrand Reinhold Company, 1983.

