Skip Navigation

IEICE Transactions on Information and Systems 2006 E89-D(5):1712-1719; doi:10.1093/ietisy/e89-d.5.1712
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Request Permissions
Google Scholar
Right arrow Articles by FATTAH, M. A.
Right arrow Articles by KUROIWA, S.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Copyright © 2006 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- Speech and Hearing

Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification

Mohamed Abdel FATTAH, Fuji REN and Shingo KUROIWA

The authors are with the University of Tokushima, Tokushima-shi, 770–8506 Japan. E-mail: mohafi{at}is.tokushima-u.ac.jp

In the European Telecommunication Standards Institute (ETSI), Distributed Speech Recognition (DSR) front-end, the distortion added due to feature compression on the front end side increases the variance flooring effect, which in turn increases the identification error rate. The penalty incurred in reducing the bit rate is the degradation in speaker recognition performance. In this paper, we present a nontraditional solution for the previously mentioned problem. To reduce the bit rate, a speech signal is segmented at the client, and the most effective phonemes (determined according to their type and frequency) for speaker recognition are selected and sent to the server. Speaker recognition occurs at the server. Applying this approach to YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average segment of 20.4% for a testing utterance in a speaker identification task. We also achieved an equal error rate (EER) of 0.42% using an average segment of 15.1% for a testing utterance in a speaker verification task.

Key Words: distributed speaker recognition, speaker recognition, speaker identification, speaker verification, wireless communications


Manuscript received May 2, 2005. Manuscript revised October 21, 2005.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.