How to Deal with Multiple-Targets in Speaker Identification Systems?


Yaniv Zigel and Moshe Wasserblat
NICE Systems Ltd., Audio Analysis Group, P.O.B. 690, Ra'anana 43107, Israel
yanivz@nice.com

Abstract

In open-set speaker identification systems, a known phenomenon is that the false alarm (accept) error rate increases dramatically when increasing the number of registered speakers (models). In this paper, we demonstrate this phenomenon and suggest a solution using a new model-dependent score-normalization technique, called Top-norm. The Top-norm method was specifically developed to improve the results of open-set speaker identification systems. We also suggest a score-normalization parameter adaptation technique. Experiments performed using speaker recognition corpora are described and demonstrate that the new method outperforms other normalization methods.

1. Introduction

Open-set speaker identification, or multi-target detection [1], is the most challenging type of speaker recognition. It aims to determine whether an input utterance (test) is from a specified speaker (target) which is modeled in a target-set (models, stack, or watch-list). The task is open-set since the test can come from non-target (unknown) speakers. Open-set speaker identification has applications in audio database search for recorded telephone interactions, recorded meetings, or other historical audio documents, where a set of speakers (targets) is searched.

Open-set identification systems suffer from the false alarm (FA) error rate problem: the FA rate increases dramatically with increasing stack size. This problem does not exist in verification systems. In order to deal with this FA problem, a new speaker-dependent (model-dependent) score-normalization method called Top-norm is presented. In this normalization method, only the top impostor scores, which produce the FA errors in a low-FA-tuned verification system, are considered in the normalization process.

In speaker recognition systems based on stochastic models (such as GMMs), likelihood scores are very sensitive to variations in text, speaking behavior, and recording conditions [2]. Score normalization has been introduced explicitly to cope with score variability and to make speaker-independent decision threshold tuning easier [3]. It has been shown that score normalization is very effective in improving open-set speaker identification performance [4] as well as in speaker verification systems [2], [5].

In this paper, the open-set speaker identification problem is discussed and illustrated using simulated results. We show that significantly better identification results can be achieved, theoretically, by decreasing the standard deviation of the non-target score distribution in the prototype verification system, even when the equal error rate (EER) is kept fixed. Open-set speaker identification experiments using speaker recognition databases are presented, and the proposed Top-norm score normalization method is compared with other methods such as world model normalization (WMN), Z-norm, unconstrained cohort normalization (UCN), and T-norm.
2. The False Accept Error Problem of Identification Systems

Given a set of known speakers (target-set) and a sample utterance of an unknown speaker (test), open-set speaker identification (OSI) is defined as a twofold problem [4]: 1) identifying the speaker model in the set which best matches the test utterance, and 2) deciding whether the test utterance has actually been produced by the speaker associated with the best-matched model (target) or by an unknown speaker outside the target-set (non-target). This decision is made using a threshold operated on the maximum score s* produced by the target-set models. Open-set identification can also be seen as multi-target detection [1], where in the test phase each target model operates as a single-target detector.

Suppose that M speakers (targets) are enrolled in the system and their statistical models are λ_1, λ_2, ..., λ_M. If O denotes the feature-vector sequence extracted from the test utterance, then open-set identification can be stated as follows:

    O → λ_{m*}, where m* = argmax_{1≤m≤M} s(O|λ_m), if s* = max_{1≤m≤M} s(O|λ_m) ≥ τ;
    O → unknown speaker, if s* < τ,

where τ is a pre-determined threshold and s(O|λ_m) is a probabilistic score of O given a target model λ_m. Three types of errors exist in an open-set identification system: 1) false accept (FA) error, which occurs when the maximum score s* is above the threshold for a non-target test; 2) false reject error (FR, or miss), which occurs for a target test when the maximum score s* is below the threshold; and 3) confusion error, which occurs for a target test when the maximum score s* is above the threshold but the model that yields the maximum score does not belong to the tested speaker. We may also consider the case of a group detector (or top-s stack detector [1]), where we only want to know whether the tested input speaker belongs to the target-set or not; the exact identity of the speaker is not needed, so the confusion error does not exist.

The overlap between the non-target and target score distributions in open-set identification is greater than the overlap of impostor and target scores in speaker verification. This is because of the maximum-score selection (between the models' scores) in the identification process; the bigger the target-set size (M), the greater this overlap.
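Read as pseudocode, the decision rule above is just an argmax followed by a threshold test. Below is a minimal Python sketch; the scoring callables, the model container, and the threshold value are illustrative, not from the paper:

```python
import numpy as np

def open_set_identify(O, models, tau):
    """Open-set identification: pick the best-matching target model,
    then accept it only if its score clears the threshold tau.

    O      -- feature-vector sequence of the test utterance
    models -- list of scoring callables, one per enrolled target;
              models[m](O) returns the probabilistic score s(O | lambda_m)
    tau    -- pre-determined decision threshold
    """
    scores = np.array([score(O) for score in models])
    m_star = int(np.argmax(scores))   # best-matched model index
    s_star = scores[m_star]           # maximum score s*
    if s_star >= tau:
        return m_star                 # identified target speaker
    return None                       # unknown (non-target) speaker
```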

In order to emphasize the problem of open-set identification versus verification, a simulated test was performed using equations (1) and (2) for the predicted miss and false alarm probabilities. These equations give the group-detector (size M) probabilities of miss, P̂_miss(τ), and false alarm (false accept), P̂_fa(τ), in terms of the miss P_miss(τ) and false alarm P_fa(τ) probabilities of the prototype verification system (single detector), where τ is the threshold. For these equations, we assume that the M detectors operate independently of each other, that they all have the same miss and false alarm probabilities, and that the prior probabilities of the target classes are equal [1]:

    P̂_fa(τ) = 1 − (1 − P_fa(τ))^M    (1)
    P̂_miss(τ) = P_miss(τ) · (1 − P_fa(τ))^(M−1)    (2)

In this test, the target and impostor scores of the prototype single-target detector (verification system) were simulated using Gaussian distributions. Fig. 1 shows the simulated score histograms of the first simulated experiment: the mean of the target scores is μ_T = 1, the mean of the non-target (impostor) scores is μ_nT = −3.9, and the standard deviation of both is σ_T = σ_nT = 1.5. Fig. 2 shows the detection error trade-off (DET) curves of the predicted identification system, using equations (1) and (2) calculated from the distributions of Fig. 1 for different stack (target-set) sizes (M = 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000). From Fig. 2 one can see that the stack size has a major influence on group-detector performance: for a prototype equal error rate (EER) of 5% we get an EER of 12.8% for a stack size of 10 and an EER of 23.9% for a stack size of 100. In many applications these error rates are too large to work with, especially for very large stack sizes (over 100).

Fig. 3 shows the group-detector miss and false alarm probabilities for stack sizes M = 1, 2, ..., 1000, for a multi-target detector composed of prototype single detectors operating at a fixed low-FA working point (P_miss(τ), P_fa(τ)). From this figure one can see that the false alarm rate increases dramatically with increasing stack size. These (P_miss(τ), P_fa(τ)) points are also marked by circles in Fig. 2. As has been observed by others [1], [6], we need single detectors (verification systems) that produce very low FA rates in order to obtain a reasonable FA rate when combining these single detectors into multi-target (identification) systems.

Fig. 4 shows another example of simulated score histograms. This time, the standard deviation of the non-target score distribution (σ_nT = 0.8) is less than the standard deviation of the target score distribution (σ_T = 1.5). The mean of the non-target score distribution was chosen to be μ_nT = −2.76 in order to obtain the same verification EER as in the previous simulated example. Fig. 5 shows the DET curves of the predicted identification system, using equations (1) and (2) calculated from the distributions of Fig. 4 for the same stack sizes. From this figure one can see that significantly better identification results are achieved compared to the previous example (Fig. 2), especially in the low-FA areas. For example, for M = 100 the EER of this system is 14.8%, compared to 23.9% in the previous one, although both prototype verification systems have the same EER value. This happens because decreasing the standard deviation of the non-target score distribution tilts the DET curve counterclockwise.
The width of the right tail of the non-target score distribution becomes narrower, which yields lower FA rates for a given FR rate (in the FR > EER area). Hence identification systems, whose DET-curve points can be seen as transformed from the low-FA points of the prototype verification system (see the circles in Fig. 2), perform better. Score normalization techniques can cause this kind of tilt in the prototype verification DET curve [7]. In order to achieve this effect, we choose to use a score normalization method, which is presented next.

Figure 1: Histograms of the first simulated test (target and impostor scores).

Figure 2: DET curves of the first simulated test.

Figure 3: The group-detector miss and false alarm probabilities vs. stack size, for a group detector composed of prototypes operating at a fixed (P_miss(τ), P_fa(τ)) working point.
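Equations (1) and (2) are easy to evaluate numerically. The following sketch reproduces the spirit of the simulated test under the paper's assumptions (independent detectors, equal priors), with Gaussian prototype score distributions using the parameter values of the first simulated test as given above:

```python
import numpy as np
from scipy.stats import norm

def group_detector_det(mu_t, sigma_t, mu_nt, sigma_nt, M, taus):
    """Predicted miss/false-alarm of an M-target group detector,
    eqs. (1) and (2), from a Gaussian prototype verification system."""
    p_fa = norm.sf(taus, loc=mu_nt, scale=sigma_nt)    # P_fa(tau)
    p_miss = norm.cdf(taus, loc=mu_t, scale=sigma_t)   # P_miss(tau)
    p_fa_group = 1.0 - (1.0 - p_fa) ** M               # eq. (1)
    p_miss_group = p_miss * (1.0 - p_fa) ** (M - 1)    # eq. (2)
    return p_miss_group, p_fa_group

# First simulated test: sweep the threshold for several stack sizes.
taus = np.linspace(-10.0, 8.0, 2000)
for M in (1, 10, 100, 1000):
    pm, pf = group_detector_det(mu_t=1.0, sigma_t=1.5,
                                mu_nt=-3.9, sigma_nt=1.5, M=M, taus=taus)
    eer_idx = np.argmin(np.abs(pm - pf))               # rough EER point
    print(f"M={M:5d}  predicted EER ~ {100 * pm[eer_idx]:.1f}%")
```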

Figure 4: Histograms of the second simulated test.

Figure 5: DET curves of the second simulated test.

3. The Top-Norm Method

Even after world model normalization (WMN), the score distribution is different for each speaker model (target). This causes relatively high FA errors for some targets and relatively high FR (miss) errors for others. Fig. 7 (left column) demonstrates this phenomenon; it shows an example of the impostor score distributions of 10 different speakers (models), where the scores have WMN only. To compensate for this phenomenon, the Z-norm score normalization method [8] has been proposed and used. In this method, speaker-dependent (model-dependent) mean and variance (normalization parameters) are estimated from the model-dependent impostor (non-target) score distribution (see the impostor Z-norm score distributions in Fig. 7, middle column). However, this method assumes that this distribution is Gaussian for each model, which in practice is not accurate: most often the right tail may be considered Gaussian, but not the overall distribution. This is mainly (but not only) because of the existence of different channels and/or handset devices and, in some cases, both male and female trials.

To deal with this phenomenon while still performing speaker-dependent score normalization, a new score-normalization technique called Top-norm is presented. In the Top-norm method, only the top impostor scores, which produce the FA errors in a low-FA-tuned verification system, are considered. Top-norm is similar to Z-norm in that two parameters, a mean and a standard deviation, are calculated for each target speaker; however, these parameters are calculated not from all the impostor scores, but only from the top scores (see Fig. 6). The normalized score is

    s_n = (s − μ_top) / σ_top    (3)

where μ_top and σ_top are the top-score mean and standard deviation, respectively, calculated from a pre-defined percentage (p, toppercent) of the top scores. Fig. 7 (right column) shows the impostor score distributions of the same speakers as in the left column, but with the scores normalized using the Top-norm method. From this figure one can see that all the distributions are right-aligned, so the top scores are equally dispersed for all the target speakers (models). Fig. 8 shows DET curves of these three normalization methods (WMN, Z-norm, and Top-norm) combined in a verification system (the databases are described in the next section). Although all three methods have almost the same EER, Top-norm is superior to the others in the low-FA area. As investigated in Section 2, this implies better identification results with large stack sizes.

Figure 6: Illustration of the parameters of the Top-norm method (non-target score distribution, Z-norm parameters, threshold τ, top scores, and Top-norm parameters).

Figure 7: Speaker-dependent non-target score distributions for three normalization methods: 1) no speaker-dependent normalization (WMN only), 2) Z-norm, and 3) Top-norm.

Figure 8: The DET curves of the verification system using: 1) WMN, 2) Z-norm, and 3) Top-norm.
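A minimal sketch of the Top-norm estimation and the normalization of eq. (3), assuming a pool of per-model non-target (impostor) scores is available; the function names and the guard on the minimum top-set size are illustrative:

```python
import numpy as np

def topnorm_params(impostor_scores, toppercent):
    """Estimate Top-norm parameters from a model's impostor scores:
    mean and std of only the top `toppercent` percent of the scores."""
    scores = np.sort(np.asarray(impostor_scores))[::-1]   # descending
    n_top = max(2, int(round(len(scores) * toppercent / 100.0)))
    top = scores[:n_top]
    return top.mean(), top.std()

def topnorm(score, mu_top, sigma_top):
    """Eq. (3): normalize a raw score by the top-score statistics."""
    return (score - mu_top) / sigma_top
```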

4. Adaptation of Normalization Parameters

In some speaker identification applications, such as fraudster detection, most of the tested speakers are non-targets (non-fraudsters, who do not belong to the model list) and hence produce non-target scores. We can exploit these non-target scores, collected during testing, to better estimate the normalization parameters that are initially estimated in the training stage. This can be done using adaptation techniques. The idea behind normalization-parameter adaptation is that we can estimate the mean and variance (μ_N and σ²_N) of any score PDF from their previous values (μ_{N−1} and σ²_{N−1}) and the current (Nth) score value s_N, using the recursions:

    μ_N = [(N−1)·μ_{N−1} + s_N] / N    (4)
    σ²_N = [(N−1)·σ²_{N−1} + (s_N − μ_N)²] / N    (5)

Next we propose normalization-parameter adaptation techniques for the Z-norm and Top-norm methods.

4.1. Z-norm Parameter Adaptation Algorithm

For this method we store, for each model, the parameters N, μ, and σ. The basic algorithm for adapting the Z-norm parameters:

1. Initialization step (using N_i initial scores): N = N_i; μ = (1/N) Σ_n s_n; σ² = E[s²] − (E[s])², where E[s²] = (1/N) Σ_n s_n².
2. Adaptation iteration (using a new score s): N ← N + 1; update μ and σ² using (4) and (5).

4.2. Top-norm Parameter Adaptation Algorithm

Adapting the Top-norm parameters is a more difficult task than adapting the Z-norm parameters. In the suggested Top-norm adaptation algorithm we store, for each model, two sets of distribution parameters: 1) the general score distribution parameters (N, μ, and σ), as for Z-norm, and 2) the top-p (toppercent) score distribution parameters (N_top, μ_top, and σ_top). The basic algorithm for adapting the Top-norm parameters:

1. Initialization step (using N_i initial scores): estimate N, μ, and σ as in the Z-norm initialization; set

       N_top = (p/100) · N_i    (6)

   estimate μ_top and σ_top from the top p percent of the scores {s_n}, and initialize the threshold τ as the lowest score in this top set.
2. Adaptation iteration (using a new score s): N ← N + 1; update μ and σ² using (4) and (5). Estimate the Top-norm score threshold τ and apply the threshold correction:

       if (N_top/N)·100 > p:  τ ← τ + α
       if (N_top/N)·100 < p:  τ ← τ − α

   If s ≥ τ: N_top ← N_top + 1, and update μ_top and σ²_top using (4) and (5).

Here the Top-norm threshold τ indicates the top-p-percent score threshold (see Figure 6). Since we do not store the individual test scores, we cannot compute this threshold exactly; we can only estimate it. We used the threshold estimation equation

    τ = (N·μ + N_top·μ_top) / (N + N_top)    (7)

and in each iteration step a threshold-correction procedure adjusts this threshold toward the required ratio (6). The parameter α is an adaptation coefficient used for this correction (e.g., α = 0.0001).
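The recursions (4)-(5) and the threshold correction translate directly into code. The sketch below is a reconstruction of the Top-norm adaptation of Section 4, not the authors' implementation; for simplicity it initializes the threshold empirically from the sorted initialization scores rather than through eq. (7), and the default alpha is only an example value:

```python
import numpy as np

class TopNormAdapter:
    """Adapt Top-norm parameters from non-target scores observed at
    test time (a reconstruction of the Section 4 algorithm)."""

    def __init__(self, init_scores, toppercent, alpha=1e-4):
        s = np.sort(np.asarray(init_scores))[::-1]        # descending
        self.p, self.alpha = toppercent, alpha
        # General score distribution parameters (N, mu, sigma^2).
        self.n, self.mu, self.var = len(s), s.mean(), s.var()
        # Top-score distribution parameters (N_top, mu_top, sigma_top^2).
        n_top = max(2, int(round(len(s) * toppercent / 100.0)))
        top = s[:n_top]
        self.n_top, self.mu_top, self.var_top = n_top, top.mean(), top.var()
        self.tau = s[n_top - 1]        # empirical initial threshold

    def update(self, s):
        # General distribution: recursive mean/variance, eqs. (4)-(5).
        self.n += 1
        self.mu = ((self.n - 1) * self.mu + s) / self.n
        self.var = ((self.n - 1) * self.var + (s - self.mu) ** 2) / self.n
        # Threshold correction toward the ratio N_top/N = p/100, eq. (6).
        ratio = 100.0 * self.n_top / self.n
        if ratio > self.p:
            self.tau += self.alpha
        elif ratio < self.p:
            self.tau -= self.alpha
        # Top-score distribution: update only when s clears the threshold.
        if s >= self.tau:
            self.n_top += 1
            self.mu_top = ((self.n_top - 1) * self.mu_top + s) / self.n_top
            self.var_top = ((self.n_top - 1) * self.var_top
                            + (s - self.mu_top) ** 2) / self.n_top
```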

5. Experiments and Results

Experiments for open-set speaker identification were performed using a GMM-based system. 24 features were extracted from each 20-msec frame (with frame overlap): Mel-frequency cepstral coefficients (MFCC) and ΔMFCC. Cepstral mean subtraction (CMS) was applied. The speech data used was part of the NIST99 speaker evaluation (SPK) and SwitchBoard (SB) databases, which consist of one-sided telephone conversations. In order to create a speaker identification test, 183 speakers from NIST99 were chosen as the target-set; these speakers (males and females) have more than six files each. 183 (512-mixture) GMMs were trained, using two minutes of speech from two files per speaker. The identification test included 984 target files and 449 non-target files (trials) from over 700 speakers (females and males). For estimating the Z-norm and Top-norm parameters, additional SPK files from the SB database were used. The baseline system was normalized using gender-dependent WMN; these world models (two 256-mixture GMMs) were trained using the NIST98 database.

In many speaker identification applications, such as fraud detection, a very low FA error rate is required and the EER point is less important; therefore, in the reported performance results we also relate to the FA error value, P_fa, at the working point where the false reject error is fixed. Fig. 9 shows curves of the EER and the false accept rate (P_fa) versus the toppercent parameter (p) of the Top-norm method, in an OSI system (M = 183). From this figure one can see that an intermediate value of p gives a good compromise between optimal EER and optimal P_fa; this value was used in the following experiments. Figure 10 shows the DET curves of this open-set speaker identification system, using the mentioned databases and different target-set sizes: 1, 10, 20, 50, 100, and 183.

Other normalization methods were implemented and tested for comparison with the Top-norm method, among them: baseline (WMN only), Z-norm, T-norm [5], and unconstrained cohort normalization (UCN) [3]. Fig. 11 shows DET curves of the open-set speaker identification system (group detector; M = 183) using different normalization methods: 1) baseline (WMN), 2) Z-norm, 3) UCN, and 4) Top-norm. The cohort size (C) for UCN was chosen as the optimal value. From this figure one can see that Top-norm is superior to the other methods at all working points. We may also see the counterclockwise tilt of the UCN DET curve relative to the other DET curves; this may suggest a useful combination of the Top-norm and UCN methods. Note that Z-norm performs worse than the baseline (WMN) system. This is because the Z-norm parameters were estimated using gender-independent utterances; when we used gender-dependent utterances, Z-norm performed better than the baseline system, but still not better than the Top-norm system. T-norm (not shown here) was tested as well and yielded poorer results than UCN, probably because of its sensitivity to the mixed-gender model population.

Figure 9: P_fa and EER versus the toppercent parameter of the Top-norm method.

Figure 10: The DET curves of the open-set speaker identification system using the Top-norm method, with 1/10/20/50/100/183 models (target-set sizes).
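For reference, a front-end along the lines described above can be sketched with librosa; the split into 12 MFCC plus 12 ΔMFCC, the 50% frame overlap, and the sampling rate below are assumptions, not the paper's exact configuration:

```python
import librosa
import numpy as np

def mfcc_cms_features(wav_path, sr=8000):
    """Extract MFCC + delta-MFCC features with cepstral mean subtraction.
    Settings are illustrative: 12 + 12 = 24 features from 20 ms frames
    with 50% overlap, telephone-band sampling rate."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(0.020 * sr)                      # 20 ms analysis window
    hop = n_fft // 2                             # 50% overlap
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12,
                                n_fft=n_fft, hop_length=hop, n_mels=24)
    delta = librosa.feature.delta(mfcc)          # first-order derivatives
    feats = np.vstack([mfcc, delta])             # 24 x n_frames
    feats -= feats.mean(axis=1, keepdims=True)   # cepstral mean subtraction
    return feats.T                               # n_frames x 24
```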
Figure 11: The DET curves of the OSI system (M = 183) using different normalization methods: 1) WMN, 2) Z-norm, 3) UCN, and 4) Top-norm.

Table 1 shows the EER and P_fa for the different normalization methods and for different target-set sizes. The identification test for M = 183 included 984 target trials and 449 non-target trials; for smaller M, cross-validation tests were performed, so the numbers of target and non-target trials were larger. From this table we can see that UCN is useful only above a few tens of models (stack size) and only at the low-FA working points (in the high-FR region). Moreover, we can see from this table that the Top-norm method is superior to the other methods for each of the tested target-set sizes.

Table 1: P_fa and EER of the different normalization methods (WMN baseline, UCN, and Top-norm) for target-set sizes M = 1, 10, 20, 50, 100, and 183.

Fig. 12 shows DET curves of the open-set speaker identification system (M = 183) using the Top-norm method: 1) without parameter adaptation (dashed line), and 2) with parameter adaptation (solid line). The parameter adaptation was performed as described in Section 4. From this figure one can see that Top-norm parameter adaptation yields better performance at the desired working points (low FA rates).

Figure 12: The DET curves of the OSI system (M = 183) using the Top-norm method: 1) without parameter adaptation, and 2) with parameter adaptation.

6. Conclusions

In this paper we presented and demonstrated the false accept error problem of identification systems: the phenomenon in which the false alarm rate increases dramatically when increasing the number of registered speakers (target-set size). We showed that significantly better identification results can be achieved by decreasing the standard deviation of the non-target score distribution in the prototype verification system, even when the equal error rate (EER) is kept fixed. We suggested a new model-dependent score-normalization method, called Top-norm, developed especially for open-set speaker identification systems, and showed that it is superior to other normalization methods. It is also faster than test-dependent normalization methods (such as UCN and T-norm) and, unlike Z-norm and T-norm, it is not sensitive to a mixed-gender population. One drawback is the need for relatively many non-target files to estimate the Top-norm parameters, since we deal with the right tail of the non-target score distribution (the top scores). However, this is not a severe problem, since we can adapt the Top-norm parameters during the test phase, and we suggested an algorithm for Z-norm and Top-norm parameter adaptation. Although the Top-norm method was developed especially for speaker identification, it can also be used in speaker verification systems. Combining Top-norm with a test-dependent score-normalization method (such as T-norm or UCN) may yield further improvements.

7. References

[1] E. Singer and D. Reynolds, "Analysis of Multitarget Detection for Speaker and Language Recognition," Proc. Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, 2004.

[2] Y. Zigel and A. Cohen, "On Cohort Selection for Speaker Verification," Proc. Eurospeech 2003, pp. 977-980, Geneva, 2003.
[3] F. Bimbot, J.-F. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. Petrovska-Delacretaz, and D. A. Reynolds, "A Tutorial on Text-Independent Speaker Verification," EURASIP Journal on Applied Signal Processing, vol. 2004, pp. 430-451, 2004.

[4] J. Fortuna, P. Sivakumaran, A. M. Ariyaeeinia, and A. Malegaonkar, "Relative Effectiveness of Score Normalization Methods in Open-Set Speaker Identification," Proc. Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, 2004.

[5] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score Normalization for Text-Independent Speaker Verification Systems," Digital Signal Processing, vol. 10, pp. 42-54, 2000.

[6] J. Daugman, "Biometric Decision Landscapes," Technical Report TR482, University of Cambridge Computer Laboratory. http://www.cl.cam.ac.uk/users/jgd1000.

[7] J. Navratil and G. N. Ramaswamy, "The Awe and Mystery of T-Norm," Proc. EUROSPEECH 2003, Geneva, 2003.

[8] D. A. Reynolds, "Comparison of Background Normalization Methods for Text-Independent Speaker Verification," Proc. Eurospeech 1997, pp. 963-966, 1997.

[9] Y. Zigel and A. Cohen, "Text-Dependent Speaker Verification Using Feature Selection with Recognition Related Criterion," Proc. Odyssey 2004: The Speaker and Language Recognition Workshop, pp. 329-336, Toledo, 2004.

[10] J. M. Colombi, J. S. Reider, and J. P. Campbell, "Allowing Good Impostors to Test," Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, 1997, pp. 296-300, 1998.

[11] G. N. Ramaswamy, J. Navratil, U. V. Chaudhari, and R. D. Zilca, "The IBM System for the NIST 2002 Cellular Speaker Verification Evaluation," Proc. ICASSP 2003, Hong Kong, April 2003.