Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Papers -- Speech and Hearing
Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification
1 The author is with Biometric Technologies, S.L., Barcelona, 08007 Spain.
2 The author is with TALP Research Center (UPC), Barcelona, 08034 Spain. E-mail: javier{at}gps.tsc.upc.es
The selection of the most representative utterances from a speaker is essential for the correct performance of automatic enrollment in speaker verification. Model quality measures and threshold estimation methods must mainly cope with the scarcity of data and the difficulty of obtaining impostor data in real applications. Conventional methods estimate the quality of the training utterances only after the model has been created. In that case, it is not possible to ask the user for more utterances during the training session if necessary, and a new training session must be started. This is especially impractical in applications where only one or two enrollment sessions are allowed. In this paper, a new on-line quality method based on a male and a female Universal Background Model (UBM) is introduced. The two models act as a reference for new utterances: they indicate whether the utterances belong to the same speaker and, at the same time, provide a measure of their quality. On the other hand, the estimation of the verification threshold is also strongly influenced by the previous selection of the speaker's utterances. In this context, potential outliers, i.e., client scores that lie far from the mean, can lead to wrong estimates of the client score mean and variance. To alleviate this problem, several efficient threshold estimation methods based on removing or weighting scores are proposed. Before the threshold is estimated, the client scores catalogued as outliers are removed, pruned or weighted, which improves the subsequent estimations. Text-dependent experiments have been carried out on a multi-session telephone database in Spanish, recorded by the authors, with 184 speakers.
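The idea of cleaning client scores before estimating a speaker-dependent threshold can be illustrated with a minimal Python sketch. It assumes a threshold of the form θ = μ − α·σ computed from the client scores, a pruning rule that repeatedly discards the score farthest from the mean while it lies more than k standard deviations away, and a Cauchy-style weighting that down-weights distant scores. The function names, the parameters `alpha`, `k`, `c` and `min_keep`, and the weight function are hypothetical illustrations, not the exact formulas used in the paper.

```python
import statistics
from typing import List, Sequence, Tuple


def prune_outliers(scores: Sequence[float], k: float = 2.0, min_keep: int = 3) -> List[float]:
    """Hypothetical pruning rule: repeatedly drop the client score farthest from
    the mean while it lies more than k (population) standard deviations away."""
    kept = list(scores)
    while len(kept) > min_keep:
        mu = statistics.mean(kept)
        sigma = statistics.pstdev(kept)
        if sigma == 0.0:
            break
        farthest = max(kept, key=lambda s: abs(s - mu))
        if abs(farthest - mu) <= k * sigma:
            break  # no remaining score is far enough to count as an outlier
        kept.remove(farthest)
    return kept


def weighted_mean_std(scores: Sequence[float], c: float = 1.0) -> Tuple[float, float]:
    """Hypothetical weighting: each score gets weight 1 / (1 + z^2), where z is its
    distance to the plain mean in units of c * sigma, so outliers count less."""
    mu0 = statistics.mean(scores)
    sigma0 = statistics.pstdev(scores) or 1.0  # avoid division by zero
    weights = [1.0 / (1.0 + ((s - mu0) / (c * sigma0)) ** 2) for s in scores]
    total = sum(weights)
    mu = sum(w * s for w, s in zip(weights, scores)) / total
    var = sum(w * (s - mu) ** 2 for w, s in zip(weights, scores)) / total
    return mu, var ** 0.5


def speaker_threshold(mu: float, sigma: float, alpha: float = 1.0) -> float:
    """Assumed speaker-dependent threshold: client mean minus alpha client deviations."""
    return mu - alpha * sigma


if __name__ == "__main__":
    client_scores = [2.1, 2.3, 1.9, 2.2, 2.0, -1.5]  # toy data; -1.5 plays the outlier
    kept = prune_outliers(client_scores)
    theta_pruned = speaker_threshold(statistics.mean(kept), statistics.pstdev(kept))
    theta_weighted = speaker_threshold(*weighted_mean_std(client_scores))
    print(kept, theta_pruned, theta_weighted)
```

On the toy scores, pruning removes the single low score and the threshold is then computed from the five consistent ones; the weighted variant keeps every score but lets the outlier pull the mean down far less than an unweighted average would. Pruning is a hard decision, whereas weighting degrades smoothly, which may matter when only a handful of enrollment utterances are available.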
Key Words: speaker verification, threshold, quality, model estimation, pruning
Manuscript received January 17, 2005. Manuscript revised June 3, 2005.