IEICE Transactions on Information and Systems 2007 E90-D(4):759-765; doi:10.1093/ietisy/e90-d.4.759
Copyright © 2007 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- Speech and Hearing

Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification

Javier R. SAETA1 and Javier HERNANDO2

1 The author is with Biometric Technologies, S.L., Barcelona, 08007 Spain. 2 The author is with TALP Research Center (UPC), Barcelona, 08034 Spain. E-mail: javier{at}gps.tsc.upc.es

Selecting the most representative utterances from a speaker is essential for automatic enrollment in speaker verification to perform well. Model quality measures and threshold estimation methods must cope mainly with the scarcity of data and the difficulty of obtaining impostor data in real applications. Conventional methods estimate the quality of the training utterances only after the model has been created. In that case, the user cannot be asked for more utterances during the training session if they are needed, and a new training session must be started. This is especially impractical in applications where only one or two enrollment sessions are allowed. In this paper, a new on-line quality method based on a male and a female Universal Background Model (UBM) is introduced. The two models act as a reference for new utterances: they show whether the utterances belong to the same speaker and, at the same time, provide a measure of their quality. The estimation of the verification threshold is also strongly influenced by the previous selection of the speaker's utterances. In this context, potential outliers, i.e., client scores that lie far from the mean, can lead to wrong estimates of the client mean and variance. To alleviate this problem, some efficient threshold estimation methods based on removing or weighting scores are proposed here. Before the threshold is estimated, the client scores classified as outliers are removed, pruned or weighted, which improves subsequent estimates. Text-dependent experiments have been carried out on a multi-session telephone database in Spanish. The database was recorded by the authors and contains 184 speakers.
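
The idea of discarding outlying client scores before estimating a speaker-dependent threshold can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the outlier criterion or the threshold formula, so the sketch assumes that a score is an outlier when it lies more than k standard deviations from the mean, and that the threshold is the client mean minus alpha times the client standard deviation. The names prune_outliers, estimate_threshold, k and alpha are hypothetical.

```python
from statistics import mean, pstdev


def prune_outliers(scores, k=2.0):
    """Drop scores farther than k standard deviations from the mean.

    Assumption: the k-sigma rule is an illustrative outlier criterion,
    not the one used in the paper.
    """
    if len(scores) < 3:
        return list(scores)  # too few scores to judge outliers
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        return list(scores)  # all scores identical, nothing to prune
    kept = [s for s in scores if abs(s - mu) <= k * sigma]
    return kept or list(scores)  # never return an empty list


def estimate_threshold(scores, alpha=1.0, k=2.0):
    """Estimate a speaker-dependent threshold from client scores only.

    Assumption: threshold = mean - alpha * std of the pruned scores.
    """
    kept = prune_outliers(scores, k)
    return mean(kept) - alpha * pstdev(kept)


if __name__ == "__main__":
    # Hypothetical client scores; the last one plays the role of an outlier.
    client_scores = [1.0, 1.2, 0.9, 1.1, 1.0, -3.0]
    print("kept scores:", prune_outliers(client_scores))
    print("threshold with pruning:", estimate_threshold(client_scores))
```

In this toy example the single distant score inflates the variance and drags the mean down, so an estimate made without pruning would place the threshold far lower than one made from the remaining scores.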

Key Words: speaker verification, threshold, quality, model estimation, pruning


Manuscript received January 17, 2005. Manuscript revised June 3, 2005.

