Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section / Letters / Speech and Hearing
State Duration Modeling for HMM-Based Speech Synthesis
(1) The authors are with the Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya-shi, 466-8555 Japan. E-mail: zen{at}ics.nitech.ac.jp, tokuda{at}ics.nitech.ac.jp, kitamura{at}nitech.ac.jp
(2) The authors are with the Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama-shi, 226-8502 Japan. E-mail: takao.kobayashi{at}ip.titech.ac.jp
(3) Presently with the Corporate Research & Development Center, Toshiba Corporation.
(4) Presently with Toyota Central R&D Labs., Inc.
This paper describes the explicit modeling of the probability density function (PDF) of state durations in HMM-based speech synthesis. We redefine, in a statistically correct manner, the probability of occupying a state over a given time interval, which is the quantity used to estimate the state duration PDF, and demonstrate that this redefinition improves the durations of the synthesized speech.
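The key quantity in the abstract is the probability of occupying a state for exactly a given time interval. The following is a minimal sketch, assuming a standard discrete-time HMM with no non-emitting final state and with the per-frame output likelihoods b_i(o_t) already computed, of one way to obtain this probability from the forward and backward variables and then derive a Gaussian state duration mean and variance from it. The function names (`forward`, `backward`, `segment_prob`, `duration_stats`) and the toy model parameters are hypothetical, and the sketch is not presented as the exact definition used in this letter.

```python
def forward(pi, A, B):
    """alpha[t][i] = P(o_0..o_t, q_t = i); B[t][i] holds the precomputed b_i(o_t)."""
    T, N = len(B), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[0][i]
    for t in range(1, T):
        for i in range(N):
            alpha[t][i] = sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[t][i]
    return alpha


def backward(A, B):
    """beta[t][i] = P(o_{t+1}..o_{T-1} | q_t = i); assumes no final state (beta at T-1 is 1)."""
    T, N = len(B), len(A)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][k] * B[t + 1][k] * beta[t + 1][k] for k in range(N))
    return beta


def segment_prob(i, t0, t1, pi, A, B, alpha, beta, total):
    """Posterior probability that state i is entered at t0 and left right after t1."""
    N, T = len(pi), len(B)
    # Enter state i at t0 from a different state j (or start in i at t = 0).
    if t0 == 0:
        enter = pi[i]
    else:
        enter = sum(alpha[t0 - 1][j] * A[j][i] for j in range(N) if j != i)
    # Stay in state i for frames t0..t1 via self-transitions.
    stay = B[t0][i]
    for t in range(t0 + 1, t1 + 1):
        stay *= A[i][i] * B[t][i]
    # Leave state i for a different state k at t1 + 1 (or the utterance ends at t1).
    if t1 == T - 1:
        leave = 1.0
    else:
        leave = sum(A[i][k] * B[t1 + 1][k] * beta[t1 + 1][k] for k in range(N) if k != i)
    return enter * stay * leave / total


def duration_stats(i, pi, A, B):
    """Mean and variance of the duration (in frames) of state i, weighted by segment posteriors."""
    alpha = forward(pi, A, B)
    beta = backward(A, B)
    total = sum(alpha[-1])  # P(O | model)
    T = len(B)
    w = s1 = s2 = 0.0
    # Brute-force enumeration of every interval [t0, t1]; chosen for clarity, not speed.
    for t0 in range(T):
        for t1 in range(t0, T):
            c = segment_prob(i, t0, t1, pi, A, B, alpha, beta, total)
            d = t1 - t0 + 1
            w += c
            s1 += c * d
            s2 += c * d * d
    mean = s1 / w
    return mean, s2 / w - mean * mean


# Hypothetical 2-state toy model with 5 frames of precomputed output likelihoods.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.7], [0.2, 0.6], [0.6, 0.2]]
for i in range(len(pi)):
    mean, var = duration_stats(i, pi, A, B)
    print(f"state {i}: mean={mean:.3f} frames, variance={var:.3f}")
```

A useful sanity check for this formulation is that summing the segment probabilities over all intervals [t0, t1] containing frame t recovers the ordinary state occupancy probability of state i at frame t, because every state sequence that is in state i at time t belongs to exactly one maximal stay in that state.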
Key Words: duration modeling, speech synthesis, hidden Markov model
Manuscript received July 27, 2006.