Speaker Recognition Using Artificial Neural Networks: RBFNNs vs. EBFNNs
BALASKA Nawel
Member of the Systems & Control Research Group within the LRES Lab., University 20 Août 1955 of Skikda, BP: 26, Skikda, 21000, Algeria
nabalaska@yahoo.fr

AHMIDA Zahir
Director of the Systems & Control Research Group within the LRES Lab., University 20 Août 1955 of Skikda, BP: 26, Skikda, 21000, Algeria
zahirahmida@yahoo.fr

GOUTAS Ahcène
Director of the Signal & Image Processing Research Group within the LRES Lab., University 20 Août 1955 of Skikda, BP: 26, Skikda, 21000, Algeria
a.goutas@yahoo.fr

Abstract- This paper deals with the application of Radial Basis Function Neural Networks (RBFNNs) and Elliptical Basis Function Neural Networks (EBFNNs) to text-independent speaker recognition experiments. These include both closed-set and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 60 speakers from different dialect regions. LP-derived Cepstral Coefficients (LPCC) are used as the speaker-specific features. Simulation results show that EBFNNs outperform RBFNNs in several speaker recognition experiments.

Keywords- Speaker recognition, speaker identification, speaker verification, Radial Basis Function Neural Networks (RBFNNs), Elliptical Basis Function Neural Networks (EBFNNs).

I. INTRODUCTION

In general, speaker recognition systems fall into two main categories: speaker identification systems and speaker verification systems. In speaker identification, the goal is to identify an unknown voice from a set of known voices, whereas the objective of speaker verification is to verify whether an unknown voice matches the voice of a speaker whose identity is being claimed [1], [2], [3]. Speaker identification systems are mainly used in criminal investigation, while speaker verification systems are used in security access control. A generic speaker recognition system is shown in Fig. 1.

Fig. 1. Speaker recognition system (Speech Signal → Feature Extraction → Classification → Speaker Identification or Speaker Verification)

In Fig. 1, the desired features are first extracted from the speech signal [4], [5]. The extracted features are then used as inputs to a classifier, which makes the final decision regarding identification or verification.

Speaker identification systems can be closed-set or open-set. Closed-set speaker identification refers to the case where the speaker is known a priori to be a member of a set of speakers. Open-set speaker identification includes the additional possibility that the speaker may not be a member of the set of speakers. Thresholding is often used to determine whether a speaker belongs to the open set in speaker identification and/or speaker verification.

Another distinguishing feature of speaker recognition systems is whether they are text-dependent or text-independent. Text-dependent speaker recognition systems require that the speaker utter a specific phrase or a given password. Text-independent speaker recognition systems identify the speaker regardless of his utterance [1]. This paper focuses on the text-independent speaker identification and speaker verification tasks.

The organization of this paper is as follows. Section II reviews the RBFNNs and EBFNNs for text-independent speaker recognition. Section III describes the database used in this paper and the speech analysis. Section IV reviews the training procedure, Section V describes the recognition procedure, Section VI discusses the conducted experiments and, finally, Section VII gives the conclusions.

II. RBF AND EBF NEURAL NETWORKS

Radial Basis Function Neural Networks and Elliptical Basis Function Neural Networks can be viewed as feed-forward neural networks with a single hidden layer. An RBF or EBF network with D inputs, M hidden units and K outputs is shown in Fig. 2. The output layer forms a linear combiner which calculates the weighted sum of the outputs of the hidden units [6].
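As an illustration of the network computation described here (made precise in Eqs. (1)-(3) below), the forward pass can be sketched in Python with NumPy. This is our own sketch, not the authors' code (the paper's experiments were run in Matlab), and all function and variable names are illustrative.

```python
import numpy as np

def rbf_activations(Y, centres, widths):
    # Phi_j(Y) = exp(-||Y - c_j||^2 / (2 sigma_j^2))   -- Eq. (2)
    # Y: (D,) input frame; centres: (M, D); widths: (M,)
    d2 = ((Y - centres) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * widths ** 2))

def ebf_activations(Y, centres, covariances, gammas):
    # Phi_j(Y) = exp(-(Y - c_j)' Sigma_j^{-1} (Y - c_j) / (2 gamma_j)) -- Eq. (3)
    # covariances: list of (D, D) matrices Sigma_j; gammas: (M,) smoothing terms
    phi = np.empty(len(centres))
    for j, (c, S, g) in enumerate(zip(centres, covariances, gammas)):
        d = Y - c
        phi[j] = np.exp(-d @ np.linalg.solve(S, d) / (2.0 * g))
    return phi

def network_output(phi, weights, bias):
    # f_k(Y) = w_k0 + sum_j w_kj Phi_j(Y)              -- Eq. (1)
    # weights: (K, M); bias: (K,)
    return bias + weights @ phi
```

Note that an RBF unit is the special case of an EBF unit in which Σ_j is implicitly σ_j²·I, which is why Eq. (2) needs only a scalar width per hidden unit.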
Fig. 2. RBF or EBF Neural Network structure (input layer with D inputs, hidden layer with basis functions Φ_1, ..., Φ_M, output layer with outputs f_1, ..., f_K)

The k-th output of an RBF or EBF neural network has the form:

f_k(Y) = w_k0 + Σ_{j=1}^{M} w_kj Φ_j(Y),  k = 1, ..., K  (1)

where the w_kj are the network weights. For an RBF network the activation function is:

Φ_j(Y) = exp( −‖Y − c_j‖² / (2σ_j²) ),  j = 1, ..., M  (2)

where ‖·‖ denotes the Euclidean distance. For an EBF network, on the other hand:

Φ_j(Y) = exp( −(Y − c_j)' Σ_j⁻¹ (Y − c_j) / (2γ_j) ),  j = 1, ..., M  (3)

In (2) and (3), the Φ_j are the activation functions, Y = {y_t, t = 1, ..., T} is the input vector of length T and dimension D, the c_j are the function centres, the σ_j are the function widths, the Σ_j are the covariance matrices and the γ_j are smoothing parameters controlling the spread of the basis functions [6].

III. DATABASE AND SPEECH ANALYSIS

The database for the experiments reported in this paper is a subset of the DARPA TIMIT database [7]. This set represents 60 speakers from the different dialect regions and includes 44 males and 16 females. These speakers were divided into three equal subsets: speaker set, anti-speaker set and impostor set.

The pre-processing of the TIMIT speech data consists of several steps. First, the speech data is processed by the application of a pre-emphasis filter H(z) = 1 − a·z⁻¹. A 30 ms Hamming window is applied to the speech every 10 ms. A 12th-order linear predictive (LP) analysis is performed for each speech frame. The features consist of the 12 cepstral coefficients (LPCC) derived from the LP coefficients [4].

There are ten utterances for each speaker in the selected set. Five of the utterances (SX) are concatenated and used for training. The remaining five sentences (SA, SI) are used individually for testing. The mean duration of the training data is … s per speaker, and the mean duration of each test utterance is 2.79 s.

IV. TRAINING PROCEDURE

Each speaker in the speaker set was assigned a personalized RBFNN or EBFNN modeling the characteristics of his or her own voice. Each network was trained to recognize the data derived from two classes, the speaker class and the anti-speaker class [6]. Therefore, each network was composed of 12 inputs, a varied number of hidden nodes (M), and two outputs, with each output representing one class. Only the first of these outputs was used in closed-set speaker identification.

For each RBF neural network, the K-means algorithm [8] was applied to the corresponding speaker and all anti-speakers separately to obtain the function centres [6]. Next, the P-nearest-neighbour algorithm with P set to 2 was applied to the resulting function centres to determine the function widths [6]. For the EBF neural networks, the function centres and covariance matrices were determined by the EM algorithm with diagonal covariance matrices or full covariance matrices, or by the K-means algorithm and the sample covariance [6]. Finally, the weights of the output layer of the RBFNNs and EBFNNs can be obtained using the Least Mean Squares (LMS) algorithm [8], [9]. Target values during training were [+1, 0] for a speaker frame and [0, +1] for an anti-speaker frame.

V. RECOGNITION PROCEDURE

A. Closed-set speaker identification

The identification test was done by comparing the outputs of all 20 speakers' RBF and EBF networks for a particular utterance, giving 100 speaker identification tests in total. The network with the highest first output was considered to belong to the true speaker [10], [11]. The structure of the speaker identification system with S speakers is shown in Fig. 3.
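The training steps described in Section IV can be sketched as follows. This is a hedged illustration rather than the authors' implementation: a naive Lloyd's iteration stands in for the K-means step, and the output-layer weights are solved in closed form by least squares instead of the iterative LMS rule cited in the text.

```python
import numpy as np

def kmeans(X, M, iters=20, seed=0):
    # Naive Lloyd's K-means: pick M data points as initial centres,
    # then alternate assignment and mean-update steps.
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), M, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(1)
        for j in range(M):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(0)
    return centres

def pnn_widths(centres, P=2):
    # P-nearest-neighbour heuristic: sigma_j is the mean distance from
    # c_j to its P nearest other centres (the paper uses P = 2).
    D = np.linalg.norm(centres[:, None] - centres[None], axis=-1)
    D.sort(axis=1)                 # column 0 is the zero self-distance
    return D[:, 1:P + 1].mean(1)

def fit_output_weights(Phi, targets):
    # Solve [1 | Phi] W ~= targets in the least-squares sense;
    # Phi: (N, M) hidden activations, targets: (N, K) e.g. [+1, 0] rows.
    A = np.hstack([np.ones((len(Phi), 1)), Phi])
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return W                       # row 0 holds the biases w_k0
```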
Fig. 3. Structure of the speaker identification system with S speakers (the input Y is fed to all S RBF/EBF networks; the speaker whose first output f_1^s(Y) is maximum is selected)

B. Speaker verification

As the ratio of training vectors between the speaker class and the anti-speaker class is about 1 to 20 (for each network, training vectors were derived from the corresponding speaker and 20 anti-speakers), the network will favor the anti-speaker during verification by always giving outputs close to one for the anti-speaker class and close to zero for the speaker class. [6] proposed to solve this problem by scaling the outputs during verification so that the new average outputs are approximately equal to 0.5 for both classes. In other words:

f̃_k(Y) = f_k(Y) / (2 P(C_k)),  k = 1, 2.

A way to estimate the prior probability P(C_k) is to divide the number of patterns in class C_k by the total number of patterns in the training set.

During verification, a vector Y = {y_t, t = 1, ..., T} corresponding to an utterance spoken by an unknown speaker was fed into the network. Then the scaled average outputs corresponding to the speaker and the anti-speaker were computed [6]:

z_k = (1/T) Σ_{t=1}^{T} exp(f̃_k(y_t)) / [ exp(f̃_1(y_t)) + exp(f̃_2(y_t)) ],  k = 1, 2  (4)

where T is the number of patterns in the test sequence Y. Verification decisions were based on the criterion:

z = z_1 − z_2:  z > ζ: accept the claimant;  z ≤ ζ: reject the claimant  (5)

where ζ ∈ [−1, +1] is a threshold controlling the false rejection rate and the false acceptance rate. The false rejection rate is the rate at which true speakers are falsely rejected, while the false acceptance rate measures the rate at which impostors are incorrectly accepted.

VI. EXPERIMENTAL RESULTS

Several speaker recognition experiments were performed to evaluate the RBFNN and EBFNN classifiers. These experiments include closed-set and open-set speaker identification and speaker verification.
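The verification rule of Section V.B (Eqs. (4)-(5)) can be sketched as below; an illustrative sketch with invented names, assuming the per-frame network outputs for an utterance are available as a T×2 array with the speaker output in column 0 and the anti-speaker output in column 1.

```python
import numpy as np

def verify(frame_outputs, priors, zeta):
    # frame_outputs: (T, 2) raw outputs [speaker, anti-speaker] per frame
    # priors: (P(C_1), P(C_2)) estimated from training-set class counts
    scaled = np.asarray(frame_outputs) / (2.0 * np.asarray(priors))  # f_k/(2 P(C_k))
    e = np.exp(scaled)
    z = (e / e.sum(axis=1, keepdims=True)).mean(axis=0)  # Eq. (4): softmax averaged over T frames
    score = z[0] - z[1]                                  # Eq. (5): z = z_1 - z_2
    return score > zeta, score
```

Because each frame's softmax terms sum to one, z_1 + z_2 = 1 and the score z_1 − z_2 always lies in [−1, +1], matching the range of the threshold ζ.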
The Identification Rate (IR) is used to evaluate the performance of the closed-set speaker identification systems. The Equal Error Rate (EER) is used to evaluate the performance of the open-set speaker identification and speaker verification systems. The EER is defined as the point at which the two errors, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), are equal.

Note that all operating curves (FRR versus FAR; ROC: Receiver Operating Characteristics) presented in this section for speaker verification and open-set speaker identification represent the posterior performance of the classifiers, given the speaker and impostor scores. The EER, on the other hand, is obtained by adjusting the threshold during verification to equalize the FAR and FRR. Though this adjustment is impractical in real systems, the EER indicates the potential of the networks. The threshold used is a posterior global threshold varying over [-1:0.01:1].

Each network is composed of M centres contributed by the corresponding speaker and anti-speakers (M = speaker centres + anti-speaker centres). EBF-Diag and EBF-Full denote the experiments in which the parameters were obtained by the EM algorithm with diagonal covariance matrices and full covariance matrices, respectively. EBF-Sample denotes the case where the K-means algorithm and the sample covariance were used to estimate the function centres and covariance matrices of the EBF networks.

Table 1 shows the number of speaker and impostor tests performed in each task.

TABLE 1
NUMBER OF SPEAKER AND IMPOSTOR TESTS PER TASK

Task                                # speaker tests   # impostor tests
Closed-set speaker identification
Closed-set speaker verification
Open-set speaker identification
Open-set speaker verification
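A posterior global-threshold EER search of the kind described above (threshold swept over [-1:0.01:1], as in the paper) might look like the sketch below; names are ours, and the EER is reported as the mean of FRR and FAR at the closest crossing.

```python
import numpy as np

def equal_error_rate(speaker_scores, impostor_scores):
    # Sweep the threshold zeta over [-1, 1] in steps of 0.01 and report
    # FRR/FAR at the point where false rejection and false acceptance
    # rates are closest; this approximates the EER on the given scores.
    speaker_scores = np.asarray(speaker_scores)
    impostor_scores = np.asarray(impostor_scores)
    best = None
    for zeta in np.arange(-1.0, 1.01, 0.01):
        frr = (speaker_scores <= zeta).mean()   # true speakers rejected
        far = (impostor_scores > zeta).mean()   # impostors accepted
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, zeta)
    return best[1], best[2]   # EER estimate and the threshold achieving it
```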
A. Closed-set speaker identification

TABLE 2
IR FOR CLOSED-SET SPEAKER IDENTIFICATION

M            RBF     EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)      83 %    98 %         91 %       97 %
16 (8+8)     96 %    99 %         98 %       100 %
32 (16+16)   100 %   100 %        100 %      100 %

Fig. 4. IR versus number of centres for closed-set speaker identification

Table 2 and Fig. 4 show the experimental results of closed-set speaker identification for the different network types (RBF or EBF), learning algorithms (EBF-Sample, EBF-Diag, EBF-Full) and network sizes (M = 8, 16, 32). These results reveal that the EBF network trained by the EM algorithm with full covariance matrices (EBF-Full) outperformed the other networks (IR = 100 % with 16 centres). We also note that the identification rate (IR) increases with the number of centres for all networks.

B. Closed-set speaker verification

TABLE 3
EER FOR CLOSED-SET SPEAKER VERIFICATION

M            RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)      7.70 %   1.35 %       3.11 %     1.39 %
16 (8+8)     2.50 %   0.68 %       1.91 %     0.81 %
32 (16+16)   1.95 %   0.33 %       1.12 %     0.21 %

Fig. 5. FRR versus FAR for closed-set speaker verification, M = 32 (16+16)

C. Open-set speaker identification

TABLE 4
EER FOR OPEN-SET SPEAKER IDENTIFICATION

M            RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)      … %      … %          24.5 %     … %
16 (8+8)     … %      … %          … %        7.00 %
32 (16+16)   … %      7.00 %       … %        5.75 %

Fig. 6. FRR versus FAR for open-set speaker identification, M = 32 (16+16)
D. Open-set speaker verification

TABLE 5
EER FOR OPEN-SET SPEAKER VERIFICATION

M            RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)      7.00 %   1.79 %       3.70 %     1.82 %
16 (8+8)     2.52 %   1.16 %       2.91 %     1.00 %
32 (16+16)   2.06 %   0.58 %       1.43 %     0.52 %

Fig. 7. FRR versus FAR for open-set speaker verification, M = 32 (16+16)

Tables 3, 4 and 5 and Figs. 5, 6 and 7 summarize the equal error rate (EER) for the different network types (RBF or EBF), learning algorithms (EBF-Sample, EBF-Diag, EBF-Full) and network sizes (M = 8, 16, 32). We can see from these results that, for all network sizes, the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full) attain a lower EER (5.75 % in open-set speaker identification, 0.21 % in closed-set speaker verification, and 0.52 % in open-set speaker verification, with 32 centres) than the EBF networks trained by the EM algorithm with diagonal covariance matrices (EBF-Diag), the EBF networks with sample covariance (EBF-Sample) and the RBF networks. A comparison of the error rates of the EBF-Diag and EBF-Full networks reveals that diagonal covariance matrices are less capable of modeling speaker characteristics than full covariance matrices. These results demonstrate the capability of the EM algorithm and the advantage of using full covariance matrices in the basis functions. We also note that the equal error rate (EER) decreases as the number of centres increases for all networks.

We also compared the time spent by the different networks in the training phase (learning of 20 networks) and the recognition phase, using 32 centres per network (see Table 6). The pre-processing time is not taken into account in Table 6; the pre-processing of a 3.48 s signal takes 1 s.

TABLE 6
TRAINING AND RECOGNITION TIME WITH M = 32 (16+16) PER NETWORK

                          RBF     EBF-Sample   EBF-Diag   EBF-Full
Training time (min)       8.06    …            …          …
Identification time (s)   0.33    1.15         …          1.15
Verification time (s)     0.06    0.11         0.06       0.11

The training time of the networks varies between 8.06 min and … min; the smallest training duration is that of the RBF networks, and the longest is that of the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full). The identification time ranges from 0.33 s for the RBF networks to 1.15 s for the EBF networks with sample covariance matrices (EBF-Sample) and the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full). With regard to the verification time, the RBF networks and the EBF networks trained by the EM algorithm with diagonal covariance matrices (EBF-Diag) take 0.06 s, while the EBF-Full and EBF-Sample networks take 0.11 s. The Matlab commands tic and toc were used to measure the execution times, on a 2 GHz microprocessor.

VII. CONCLUSION

This paper has evaluated the use of RBFNNs and EBFNNs for text-independent speaker recognition. The performance of the EBFNNs with full covariance matrices is better than that of the RBFNNs, the EBFNNs with diagonal covariance matrices and the EBFNNs with sample covariance matrices. The results confirm the claim by [6] that using the EM algorithm to estimate the parameters of elliptical basis function networks achieves the best performance, and illustrate that the full covariance matrices of the EBF networks are capable of providing a better representation of the feature vectors.

ACKNOWLEDGMENT

This work was conducted in the Electronics Research Laboratory of Skikda (LRES) and was supported in part by the Algerian Ministry of Higher Education and Scientific Research under the CNEPRU project code J…

REFERENCES

[1] D. O'Shaughnessy, "Speaker Recognition," IEEE ASSP Magazine, Vol. 3, No. 4, Part 1, pp. 4-17, October.
[2] J. P. Campbell, Jr., "Speaker Recognition: A Tutorial," Proceedings of the IEEE, Vol. 85, No. 9, September 1997.
[3] N. Balaska, "Reconnaissance du locuteur par les méthodes statistiques et connexionnistes: étude comparative" (Speaker recognition by statistical and connectionist methods: a comparative study), Mémoire de Magistère, Université 20 Août 1955 of Skikda, Feb.
[4] S. Young, G. Evermann, T. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (for HTK Version 3.2.1), Cambridge University Engineering Department.
[5] D. A. Reynolds, "Experimental Evaluation of Features for Robust Speaker Identification," IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, October.
[6] M. W. Mak, S. Y. Kung, "Estimation of Elliptical Basis Function Parameters by the EM Algorithm with Application to Speaker Verification," IEEE Transactions on Neural Networks, Vol. 11, No. 4, July.
[7] The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
[8] D. R. Hush, B. G. Horne, "Progress in Supervised Neural Networks," IEEE Signal Processing Magazine, Vol. 10, No. 1, pp. 8-39, January.
[9] B. M. Wilamowski, "Neural Network Architectures and Learning," International Conference on Industrial Technology, Vol. 1, pp. TU1-TU12, IEEE, December.
[10] S. E. Fredrickson, L. Tarassenko, "Text-Independent Speaker Recognition Using Neural Network Techniques," Fourth International Conference on Artificial Neural Networks, Conference Publication No. 409, June.
[11] M. W. Mak, W. G. Allen, G. G. Sexton, "Speaker Identification Using Radial Basis Functions," Third International Conference on Artificial Neural Networks, May 1993.
More informationKernel Based Text-Independnent Speaker Verification
12 Kernel Based Text-Independnent Speaker Verification Johnny Mariéthoz 1, Yves Grandvalet 1 and Samy Bengio 2 1 IDIAP Research Institute, Martigny, Switzerland 2 Google Inc., Mountain View, CA, USA The
More informationINTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY
[Gaurav, 2(1): Jan., 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Face Identification & Detection Using Eigenfaces Sachin.S.Gurav *1, K.R.Desai 2 *1
More informationIBM Research Report. Training Universal Background Models for Speaker Recognition
RC24953 (W1003-002) March 1, 2010 Other IBM Research Report Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar, Jason Pelecanos IBM Research Division Thomas J. Watson Research
More informationApplication of Fully Recurrent (FRNN) and Radial Basis Function (RBFNN) Neural Networks for Simulating Solar Radiation
Bulletin of Environment, Pharmacology and Life Sciences Bull. Env. Pharmacol. Life Sci., Vol 3 () January 04: 3-39 04 Academy for Environment and Life Sciences, India Online ISSN 77-808 Journal s URL:http://www.bepls.com
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationQUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS
Journal of the Operations Research Societ of Japan 008, Vol. 51, No., 191-01 QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS Tomonari Kitahara Shinji Mizuno Kazuhide Nakata Toko Institute of Technolog
More informationFace Recognition Using Eigenfaces
Face Recognition Using Eigenfaces Prof. V.P. Kshirsagar, M.R.Baviskar, M.E.Gaikwad, Dept. of CSE, Govt. Engineering College, Aurangabad (MS), India. vkshirsagar@gmail.com, madhumita_baviskar@yahoo.co.in,
More informationHidden Markov Model and Speech Recognition
1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationRole of Assembling Invariant Moments and SVM in Fingerprint Recognition
56 Role of Assembling Invariant Moments SVM in Fingerprint Recognition 1 Supriya Wable, 2 Chaitali Laulkar 1, 2 Department of Computer Engineering, University of Pune Sinhgad College of Engineering, Pune-411
More informationModeling the creaky excitation for parametric speech synthesis.
Modeling the creaky excitation for parametric speech synthesis. 1 Thomas Drugman, 2 John Kane, 2 Christer Gobl September 11th, 2012 Interspeech Portland, Oregon, USA 1 University of Mons, Belgium 2 Trinity
More informationFantope Regularization in Metric Learning
Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationAn artificial neural networks (ANNs) model is a functional abstraction of the
CHAPER 3 3. Introduction An artificial neural networs (ANNs) model is a functional abstraction of the biological neural structures of the central nervous system. hey are composed of many simple and highly
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More informationClassification of handwritten digits using supervised locally linear embedding algorithm and support vector machine
Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Olga Kouropteva, Oleg Okun, Matti Pietikäinen Machine Vision Group, Infotech Oulu and
More informationUpper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition
Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Jorge Silva and Shrikanth Narayanan Speech Analysis and Interpretation
More informationMulti-Layer Boosting for Pattern Recognition
Multi-Layer Boosting for Pattern Recognition François Fleuret IDIAP Research Institute, Centre du Parc, P.O. Box 592 1920 Martigny, Switzerland fleuret@idiap.ch Abstract We extend the standard boosting
More informationClassification and Pattern Recognition
Classification and Pattern Recognition Léon Bottou NEC Labs America COS 424 2/23/2010 The machine learning mix and match Goals Representation Capacity Control Operational Considerations Computational Considerations
More informationSpeaker recognition by means of Deep Belief Networks
Speaker recognition by means of Deep Belief Networks Vasileios Vasilakakis, Sandro Cumani, Pietro Laface, Politecnico di Torino, Italy {first.lastname}@polito.it 1. Abstract Most state of the art speaker
More informationChapter 9. Linear Predictive Analysis of Speech Signals 语音信号的线性预测分析
Chapter 9 Linear Predictive Analysis of Speech Signals 语音信号的线性预测分析 1 LPC Methods LPC methods are the most widely used in speech coding, speech synthesis, speech recognition, speaker recognition and verification
More informationApplication of hopfield network in improvement of fingerprint recognition process Mahmoud Alborzi 1, Abbas Toloie- Eshlaghy 1 and Dena Bazazian 2
5797 Available online at www.elixirjournal.org Computer Science and Engineering Elixir Comp. Sci. & Engg. 41 (211) 5797-582 Application hopfield network in improvement recognition process Mahmoud Alborzi
More informationShankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms
Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar
More informationA Small Footprint i-vector Extractor
A Small Footprint i-vector Extractor Patrick Kenny Odyssey Speaker and Language Recognition Workshop June 25, 2012 1 / 25 Patrick Kenny A Small Footprint i-vector Extractor Outline Introduction Review
More informationComparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition
Comparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition J. Uglov, V. Schetinin, C. Maple Computing and Information System Department, University of Bedfordshire, Luton,
More informationarxiv: v1 [cs.sd] 25 Oct 2014
Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra
More informationGeneralized Cyclic Transformations in Speaker-Independent Speech Recognition
Generalized Cyclic Transformations in Speaker-Independent Speech Recognition Florian Müller 1, Eugene Belilovsky, and Alfred Mertins Institute for Signal Processing, University of Lübeck Ratzeburger Allee
More informationEUSIPCO
EUSIPCO 3 569736677 FULLY ISTRIBUTE SIGNAL ETECTION: APPLICATION TO COGNITIVE RAIO Franc Iutzeler Philippe Ciblat Telecom ParisTech, 46 rue Barrault 753 Paris, France email: firstnamelastname@telecom-paristechfr
More informationHierarchical Multi-Stream Posterior Based Speech Recognition System
Hierarchical Multi-Stream Posterior Based Speech Recognition System Hamed Ketabdar 1,2, Hervé Bourlard 1,2 and Samy Bengio 1 1 IDIAP Research Institute, Martigny, Switzerland 2 Ecole Polytechnique Fédérale
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short
More informationSPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS
SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS by Jinjin Ye, B.S. A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for
More informationImproving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer
Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationScore calibration for optimal biometric identification
Score calibration for optimal biometric identification (see also NIST IBPC 2010 online proceedings: http://biometrics.nist.gov/ibpc2010) AI/GI/CRV 2010, Ottawa Dmitry O. Gorodnichy Head of Video Surveillance
More information1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.
Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification
More informationModifying Voice Activity Detection in Low SNR by correction factors
Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationECE 661: Homework 10 Fall 2014
ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;
More informationGMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System
GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System Snani Cherifa 1, Ramdani Messaoud 1, Zermi Narima 1, Bourouba Houcine 2 1 Laboratoire d Automatique et Signaux
More informationApplication of a GA/Bayesian Filter-Wrapper Feature Selection Method to Classification of Clinical Depression from Speech Data
Application of a GA/Bayesian Filter-Wrapper Feature Selection Method to Classification of Clinical Depression from Speech Data Juan Torres 1, Ashraf Saad 2, Elliot Moore 1 1 School of Electrical and Computer
More informationMultimodal Biometric Fusion Joint Typist (Keystroke) and Speaker Verification
Multimodal Biometric Fusion Joint Typist (Keystroke) and Speaker Verification Jugurta R. Montalvão Filho and Eduardo O. Freire Abstract Identity verification through fusion of features from keystroke dynamics
More informationISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM
ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM Mayukh Bhaowal and Kunal Chawla (Students)Indian Institute of Information Technology, Allahabad, India Abstract: Key words: Speech recognition
More informationEngineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics
Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Phil Woodland: pcw@eng.cam.ac.uk Lent 2013 Engineering Part IIB: Module 4F11 What is Speech Recognition?
More information