Speaker Recognition Using Artificial Neural Networks: RBFNNs vs. EBFNNs


BALASKA Nawel
Member of the Systems & Control Research Group within the LRES Lab., University 20 Août 55 of Skikda, BP: 26, Skikda, 21000, Algeria
nabalaska@yahoo.fr

AHMIDA Zahir
Director of the Systems & Control Research Group within the LRES Lab., University 20 Août 55 of Skikda, BP: 26, Skikda, 21000, Algeria
zahirahmida@yahoo.fr

GOUTAS Ahcène
Director of the Signal & Image Processing Research Group within the LRES Lab., University 20 Août 55 of Skikda, BP: 26, Skikda, 21000, Algeria
a.goutas@yahoo.fr

Abstract- This paper deals with the application of Radial Basis Function Neural Networks (RBFNNs) and Elliptical Basis Function Neural Networks (EBFNNs) to text-independent speaker recognition experiments. These include both closed-set and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 60 speakers from different dialect regions. LP-derived Cepstral Coefficients (LPCC) are used as the speaker-specific features. Simulation results show that EBFNNs outperform RBFNNs in several speaker recognition experiments.

Keywords- Speaker recognition, speaker identification, speaker verification, Radial Basis Function Neural Networks (RBFNNs), Elliptical Basis Function Neural Networks (EBFNNs).

I. INTRODUCTION

In general, speaker recognition systems fall into two main categories: speaker identification systems and speaker verification systems. In speaker identification, the goal is to identify an unknown voice from a set of known voices, whereas the objective of speaker verification is to verify whether an unknown voice matches the voice of the speaker whose identity is being claimed [1], [2], [3]. Speaker identification systems are mainly used in criminal investigation, while speaker verification systems are used in security access control. A generic speaker recognition system is shown in Fig. 1.

Fig. 1. Speaker recognition system: speech signal, feature extraction, classification, then speaker identification or speaker verification.

In Fig. 1, the desired features are first extracted from the speech signal [4], [5]. The extracted features are then used as inputs to a classifier, which makes the final decision regarding identification or verification.

Speaker identification systems can be closed-set or open-set. Closed-set speaker identification refers to the case where the speaker is known a priori to be a member of a set of speakers. Open-set speaker identification includes the additional possibility that the speaker may not be a member of the set of speakers. Thresholding is often used in open-set speaker identification and in speaker verification to decide whether a speaker belongs to the set of known speakers.

Another distinguishing feature of speaker recognition systems is whether they are text-dependent or text-independent. Text-dependent speaker recognition systems require that the speaker utter a specific phrase or a given password. Text-independent speaker recognition systems identify the speaker regardless of the utterance [1]. This paper focuses on the text-independent speaker identification and speaker verification tasks.

The organization of this paper is as follows. Section II reviews the RBFNNs and EBFNNs for text-independent speaker recognition. Section III describes the database used in this paper and the speech analysis. Section IV reviews the training procedure, Section V describes the recognition procedure, Section VI discusses the conducted experiments and, finally, Section VII gives the conclusions.

II. RBF AND EBF NEURAL NETWORKS

Radial Basis Function Neural Networks and Elliptical Basis Function Neural Networks can be viewed as feed-forward neural networks with a single hidden layer. An RBF or EBF network with D inputs, M hidden units and K outputs is shown in Fig. 2. The output layer forms a linear combiner which calculates the weighted sum of the outputs of the hidden units [6].
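To make this structure concrete, the following is a minimal NumPy sketch of the forward pass of such a network, not the authors' implementation. It assumes the usual Gaussian forms for the hidden units: a Euclidean distance with one width per unit in the radial case, and a Mahalanobis distance with one covariance matrix and one smoothing parameter per unit in the elliptical case. All function and argument names are hypothetical.

```python
import numpy as np

def radial_hidden_layer(Y, centres, widths):
    """Gaussian radial units (assumed form).
    Y: (T, D) input frames, centres: (M, D), widths: (M,). Returns (T, M)."""
    sq_dist = ((Y[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

def elliptical_hidden_layer(Y, centres, covariances, gammas):
    """Gaussian elliptical units (assumed form) using a Mahalanobis distance.
    covariances: (M, D, D), gammas: (M,) smoothing parameters. Returns (T, M)."""
    T, M = Y.shape[0], centres.shape[0]
    Phi = np.empty((T, M))
    for j in range(M):
        diff = Y - centres[j]                                  # (T, D)
        solved = np.linalg.solve(covariances[j], diff.T).T     # inverse covariance times diff
        mahalanobis = (diff * solved).sum(axis=1)              # (T,)
        Phi[:, j] = np.exp(-mahalanobis / (2.0 * gammas[j]))
    return Phi

def network_outputs(Phi, W, bias):
    """Linear combiner of the output layer.
    Phi: (T, M) hidden activations, W: (M, K), bias: (K,). Returns (T, K)."""
    return bias + Phi @ W

# Toy dimensions: D = 12 inputs, M = 8 hidden units, K = 2 outputs.
rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 12))
centres = rng.standard_normal((8, 12))
Phi = radial_hidden_layer(Y, centres, widths=np.full(8, 1.5))
outputs = network_outputs(Phi, rng.standard_normal((8, 2)), np.zeros(2))
print(outputs.shape)
```

With identity-shaped covariance matrices scaled by the squared width and a smoothing parameter of 1, the elliptical layer reduces to the radial one, which is a convenient sanity check.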

Fig. 2. RBF or EBF neural network structure (input layer with D inputs, hidden layer with basis functions $\Phi_1, \dots, \Phi_M$, output layer with outputs $f_1, \dots, f_K$).

The k-th output of an RBF or EBF neural network has the form:

$$f_k(Y) = w_{k0} + \sum_{j=1}^{M} w_{kj}\,\Phi_j(Y), \quad j = 1,\dots,M \ \text{and} \ k = 1,\dots,K \qquad (1)$$

where $w_{kj}$ are the network weights. For an RBF network the activation function is:

$$\Phi_j(Y) = \exp\left\{-\frac{1}{2\sigma_j^2}\,\|Y - c_j\|^2\right\}, \quad j = 1,\dots,M \qquad (2)$$

where $\|\cdot\|$ denotes the Euclidean norm. For an EBF network, on the other hand:

$$\Phi_j(Y) = \exp\left\{-\frac{1}{2\gamma_j}\,(Y - c_j)'\,\Sigma_j^{-1}\,(Y - c_j)\right\}, \quad \gamma_j = \frac{3}{5}\sum_{k=1}^{P}\|c_k - c_j\|, \quad j = 1,\dots,M \qquad (3)$$

In (2) and (3), $\Phi_j$ are the activation functions, $Y = \{y_t,\ t = 1,\dots,T\}$ is the input vector sequence of length T and dimension D, $c_j$ are the function centres, $\sigma_j$ are the function widths, $\Sigma_j$ are the covariance matrices and $\gamma_j$ are smoothing parameters controlling the spread of the basis functions [6].

III. DATABASE AND SPEECH ANALYSIS

The database for the experiments reported in this paper is a subset of the DARPA TIMIT database [7]. This set contains 60 speakers from the different dialect regions, 44 males and 16 females. These speakers were divided into three equal subsets: the speaker set, the anti-speaker set and the impostor set.

The pre-processing of the TIMIT speech data consists of several steps. First, the speech data is processed by a pre-emphasis filter $H(z) = 1 - [\ldots]\,z^{-1}$. A 30 ms Hamming window is applied to the speech every 10 ms. A 12th-order linear predictive (LP) analysis is performed for each speech frame. The features consist of the 12 cepstral coefficients (LPCC) derived from the LP coefficients [4].

There are ten utterances for each speaker in the selected set. Five of the utterances (SX) are concatenated and used for training. The remaining five sentences (SA, SI) are used individually for testing. The mean duration of the training data is [...] s per speaker, and the mean duration of each test utterance is 2.79 s.

IV. TRAINING PROCEDURE

Each speaker in the speaker set was assigned a personalized RBFNN or EBFNN modeling the characteristics of his or her own voice. Each network was trained to recognize the data derived from two classes, the speaker class and the anti-speaker class [6]. Therefore, each network was composed of 12 inputs, a varying number of hidden nodes (M), and two outputs, with each output representing one class. Only the first of these outputs was used in closed-set speaker identification.

For each RBF neural network, the K-means algorithm [8] was applied to the corresponding speaker and to all anti-speakers separately to obtain the function centres [6]. Next, the P-nearest-neighbor algorithm with P set to 2 was applied to the resulting function centres to determine the function widths [6].

For the EBF neural networks, the function centres and covariance matrices were determined either by the EM algorithm, with diagonal or full covariance matrices, or by the K-means algorithm with sample covariance matrices [6].

Finally, the weights of the output layer of the RBFNNs and EBFNNs were obtained using the Least Mean Squares (LMS) algorithm [8], [9]. Target values during training were [+1, 0] for a speaker frame and [0, +1] for an anti-speaker frame.

V. RECOGNITION PROCEDURE

A. Closed-set speaker identification

The identification test was carried out by comparing, for a given utterance, the outputs of the RBF or EBF networks of all 20 speakers; 100 speaker identification tests were performed in total. The network with the highest first output was taken to be that of the true speaker [10], [11]. The structure of a speaker identification system with S speakers is shown in Fig. 3.
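The RBF training steps of Section IV and the decision rule above can be sketched as follows. This is an illustrative sketch under several assumptions, not the authors' code: the k-means routine is a plain Lloyd iteration, each width is taken as the mean distance from a centre to its P = 2 nearest other centres (one common reading of the P-nearest-neighbor heuristic), the output weights come from a batch least-squares solve rather than the iterative LMS algorithm, and the first output is summed over the frames of the utterance before taking the maximum. The toy data and all names are hypothetical; in particular, the other toy speakers stand in for the separate anti-speaker set used in the paper.

```python
import numpy as np

def kmeans(X, n_centres, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns an (n_centres, D) array of centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_centres, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_centres):
            members = X[labels == j]
            if len(members) > 0:              # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def nearest_neighbour_widths(centres, P=2):
    """Assumed heuristic: mean distance to the P nearest other centres."""
    dist = np.sqrt(((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2))
    dist.sort(axis=1)                         # column 0 is the zero self-distance
    return np.maximum(dist[:, 1:P + 1].mean(axis=1), 1e-6)

def design_matrix(Y, centres, widths):
    """Bias column followed by the Gaussian hidden-unit activations."""
    sq = ((Y[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(Y), 1)), np.exp(-sq / (2.0 * widths ** 2))])

def train_rbf(speaker_frames, anti_frames, centres_per_class):
    """Centres per class by k-means, widths by nearest neighbours, weights by least squares."""
    centres = np.vstack([kmeans(speaker_frames, centres_per_class),
                         kmeans(anti_frames, centres_per_class)])
    widths = nearest_neighbour_widths(centres, P=2)
    X = np.vstack([speaker_frames, anti_frames])
    targets = np.vstack([np.tile([1.0, 0.0], (len(speaker_frames), 1)),   # speaker frames
                         np.tile([0.0, 1.0], (len(anti_frames), 1))])     # anti-speaker frames
    W, *_ = np.linalg.lstsq(design_matrix(X, centres, widths), targets, rcond=None)
    return centres, widths, W

def identify(models, Y):
    """Index of the network whose first output, summed over frames, is largest."""
    scores = [(design_matrix(Y, c, s) @ W)[:, 0].sum() for c, s, W in models]
    return int(np.argmax(scores))

# Toy example: three synthetic "speakers" as Gaussian clouds of 12-D frames.
rng = np.random.default_rng(1)
means = [0.0, 4.0, -4.0]
train = [m + rng.standard_normal((200, 12)) for m in means]
models = []
for s in range(3):
    anti = np.vstack([train[a] for a in range(3) if a != s])
    models.append(train_rbf(train[s], anti, centres_per_class=4))
test_utterance = means[1] + rng.standard_normal((50, 12))
print(identify(models, test_utterance))
```

The least-squares solve is used here only because it is compact and deterministic; it minimizes the same squared-error criterion that the LMS algorithm approaches iteratively.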

Fig. 3. Structure of speaker identification system with S speakers (the input Y is fed to the RBF/EBF network of each speaker 1, ..., S; the first outputs $f_1^1(Y), f_1^2(Y), \dots, f_1^S(Y)$ are summed and the maximum is selected to identify the speaker).

B. Speaker verification

As the ratio of training vectors between the speaker class and the anti-speaker class is about 1 to 20 (for each network, training vectors were derived from the corresponding speaker and 20 anti-speakers), the network will favor the anti-speaker class during verification by always giving outputs which are close to one for the anti-speaker class and close to zero for the speaker class. Mak and Kung [6] proposed to solve this problem by scaling the outputs during verification so that the new average outputs are approximately equal to 0.5 for both classes. In other words,

$$\tilde{f}_k(Y) = \frac{f_k(Y)}{2\,P(C_k)}, \quad k = 1, 2.$$

A way to estimate the prior probability $P(C_k)$ is to divide the number of patterns in class $C_k$ by the total number of patterns in the training set.

During verification, a vector sequence $Y = \{y_t,\ t = 1,\dots,T\}$ corresponding to an utterance spoken by an unknown speaker was fed into the network. Then the scaled average outputs corresponding to the speaker and anti-speaker classes were computed [6]:

$$z_k = \frac{1}{T}\sum_{t=1}^{T}\frac{\exp\{\tilde{f}_k(y_t)\}}{\exp\{\tilde{f}_1(y_t)\} + \exp\{\tilde{f}_2(y_t)\}}, \quad k = 1, 2 \qquad (4)$$

where T is the number of patterns in the test sequence Y. Verification decisions were based on the criterion:

$$z = z_1 - z_2 \quad \begin{cases} > \zeta: & \text{accept the claimant} \\ \le \zeta: & \text{reject the claimant} \end{cases} \qquad (5)$$

where $\zeta \in [-1, +1]$ is a threshold controlling the false rejection rate and the false acceptance rate. The false rejection rate is the rate of falsely rejecting a true speaker, while the false acceptance rate is the rate of incorrectly accepting impostors.

VI. EXPERIMENTAL RESULTS

Several speaker recognition experiments were performed to evaluate the RBFNN and EBFNN classifiers. These experiments include closed-set speaker identification and speaker verification, and open-set speaker identification and speaker verification.

The Identification Rate (IR) is used for evaluating the performance of closed-set speaker identification systems. The Equal Error Rate (EER) is used for evaluating the performance of the open-set speaker identification and speaker verification systems. The EER is defined as the point at which the two errors, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), are equal. Note that all operating curves (FRR versus FAR, i.e. ROC: Receiver Operating Characteristics) presented in this section for speaker verification and open-set speaker identification represent the posterior performance of the classifiers, given the speaker and impostor scores. The EER, on the other hand, is obtained by adjusting the threshold during verification to equalize the FAR and FRR. Though this adjustment is impractical in real systems, the EER indicates the potential of the networks. The threshold used is a posterior global threshold swept over the range -1:0.01:1, that is, from -1 to 1 in steps of 0.01.

Each network is composed of M centres contributed by the corresponding speaker and the anti-speakers (M = speaker centres + anti-speaker centres). EBF-Diag and EBF-Full denote the experiments in which the parameters were obtained by the EM algorithm with diagonal covariance matrices and full covariance matrices, respectively. EBF-Sample denotes the case where the K-means algorithm and sample covariance were used to estimate the function centres and covariance matrices of the EBF networks. Table 1 shows the number of speaker and impostor tests performed in each task.

TABLE 1. NUMBER OF SPEAKER AND IMPOSTOR TESTS PER TASK

Task                                  # speaker tests   # impostor tests
Closed-set speaker identification     [...]             [...]
Closed-set speaker verification       [...]             [...]
Open-set speaker identification       [...]             [...]
Open-set speaker verification         [...]             [...]
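The output scaling, the averaged score of (4), the decision of (5) and the posterior threshold sweep used to locate the EER can be sketched as follows. This is a hedged illustration rather than the authors' MATLAB code: the function names are hypothetical, the per-frame network outputs are assumed to be available as a T x 2 array, and the EER is approximated by the grid threshold at which FAR and FRR are closest.

```python
import numpy as np

def verification_score(F, priors):
    """F: (T, 2) raw outputs per frame (speaker, anti-speaker); priors: P(C_1), P(C_2).
    Returns z = z_1 - z_2, which lies in [-1, +1]."""
    F_scaled = F / (2.0 * np.asarray(priors, dtype=float))       # prior scaling
    E = np.exp(F_scaled - F_scaled.max(axis=1, keepdims=True))   # numerically stable exponentials
    Z = (E / E.sum(axis=1, keepdims=True)).mean(axis=0)          # z_1, z_2 averaged over frames
    return Z[0] - Z[1]

def far_frr(speaker_scores, impostor_scores, threshold):
    """A claimant is accepted when the score is strictly above the threshold."""
    frr = np.mean(np.asarray(speaker_scores) <= threshold)   # true speakers rejected
    far = np.mean(np.asarray(impostor_scores) > threshold)   # impostors accepted
    return far, frr

def equal_error_rate(speaker_scores, impostor_scores):
    """Sweep a global threshold from -1 to 1 in steps of 0.01 and return
    (threshold, error rate) where FAR and FRR are closest to each other."""
    thresholds = np.arange(-1.0, 1.0 + 1e-9, 0.01)
    rates = np.array([far_frr(speaker_scores, impostor_scores, t) for t in thresholds])
    i = int(np.argmin(np.abs(rates[:, 0] - rates[:, 1])))
    return thresholds[i], rates[i].mean()

# Priors estimated from a 1-to-20 ratio of speaker to anti-speaker training vectors.
priors = [1 / 21, 20 / 21]
frames = np.array([[0.05, 0.95], [0.10, 0.90]])   # hypothetical raw outputs for two frames
print(verification_score(frames, priors))
```

Because the two averaged outputs sum to one, the score z always falls inside the interval swept by the threshold.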

A. Closed-set speaker identification

TABLE 2. IR FOR CLOSED-SET SPEAKER IDENTIFICATION

M (speaker + anti-speaker)   RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)                      83 %     98 %         91 %       97 %
16 (8+8)                     96 %     99 %         98 %       100 %
32 (16+16)                   100 %    100 %        100 %      100 %

Fig. 4. IR versus number of centres for closed-set speaker identification.

Table 2 and Fig. 4 show the experimental results of closed-set speaker identification for different network types (RBF or EBF), learning algorithms (EBF-Sample, EBF-Diag, EBF-Full) and network sizes (M = 8, 16, 32). These results reveal that the EBF network trained by the EM algorithm with full covariance matrices (EBF-Full) outperformed the other networks (IR = 100 % with 16 centres). We also note that the identification rate (IR) increases with the number of centres for all networks.

B. Closed-set speaker verification

TABLE 3. EER FOR CLOSED-SET SPEAKER VERIFICATION

M (speaker + anti-speaker)   RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)                      7.70 %   1.35 %       3.11 %     1.39 %
16 (8+8)                     2.50 %   0.68 %       1.91 %     0.81 %
32 (16+16)                   1.95 %   0.33 %       1.12 %     0.21 %

Fig. 5. FRR versus FAR for closed-set speaker verification, M = 32 (16+16).

C. Open-set speaker identification

TABLE 4. EER FOR OPEN-SET SPEAKER IDENTIFICATION

M (speaker + anti-speaker)   RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)                      [...] %  [...] %      24.5 %     [...] %
16 (8+8)                     [...] %  [...] %      [...] %    7.00 %
32 (16+16)                   [...] %  7.00 %       [...] %    5.75 %

Fig. 6. FRR versus FAR for open-set speaker identification, M = 32 (16+16).

D. Open-set speaker verification

TABLE 5. EER FOR OPEN-SET SPEAKER VERIFICATION

M (speaker + anti-speaker)   RBF      EBF-Sample   EBF-Diag   EBF-Full
8 (4+4)                      7.00 %   1.79 %       3.70 %     1.82 %
16 (8+8)                     2.52 %   1.16 %       2.91 %     1.00 %
32 (16+16)                   2.06 %   0.58 %       1.43 %     0.52 %

Fig. 7. FRR versus FAR for open-set speaker verification, M = 32 (16+16).

Tables 3, 4 and 5 and Figs. 5, 6 and 7 summarize the equal error rate (EER) for different network types (RBF or EBF), learning algorithms (EBF-Sample, EBF-Diag, EBF-Full) and network sizes (M = 8, 16, 32). We can see from these results that, for all network sizes, the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full) attain a lower EER (5.75 % in open-set speaker identification, 0.21 % in closed-set speaker verification, and 0.52 % in open-set speaker verification, with 32 centres) than the EBF networks trained by the EM algorithm with diagonal covariance matrices (EBF-Diag), the EBF networks with sample covariance (EBF-Sample) and the RBF networks. A comparison of the error rates of the EBF-Diag and EBF-Full networks reveals that diagonal covariance matrices are less capable of modeling speaker characteristics than full covariance matrices. These results demonstrate the capability of the EM algorithm and the advantage of using full covariance matrices in the basis functions. We can also see that, for all network sizes, the EBF networks trained with the EM algorithm (EBF-Full) attain a lower equal error rate than the EBF networks trained with sample covariance (EBF-Sample). Finally, the equal error rate (EER) decreases as the number of centres increases for all networks.

We also compare the time spent by the different networks in the training phase (learning of 20 networks) and in the recognition phase, using 32 centres per network (see Table 6). Pre-processing time is not included in Table 6; pre-processing a signal of 3.48 s takes 1 s.

TABLE 6. TRAINING AND RECOGNITION TIME WITH M = 32 (16+16) PER NETWORK

                          RBF      EBF-Sample   EBF-Diag   EBF-Full
Training time (min)       [...]    [...]        [...]      [...]
Identification time (s)   [...]    [...]        [...]      [...]
Verification time (s)     [...]    [...]        [...]      [...]

The training time of the networks varies between 8.06 min and [...] min; the shortest training time is that of the RBF networks, and the longest is that of the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full). The identification time ranges from 0.33 s for the RBF networks to 1.15 s for the EBF networks with sample covariance matrices (EBF-Sample) and the EBF networks trained by the EM algorithm with full covariance matrices (EBF-Full). With regard to the verification time, the RBF networks and the EBF networks trained by the EM algorithm with diagonal covariance matrices (EBF-Diag) take 0.06 s, while the EBF-Full and EBF-Sample networks take 0.11 s. Execution times were measured with the MATLAB commands tic and toc on a 2 GHz processor.

VII. CONCLUSION

This paper has evaluated the use of RBFNNs and EBFNNs for text-independent speaker recognition. The performance of the EBFNNs with full covariance matrices is better than that of the RBFNNs, the EBFNNs with diagonal covariance matrices and the EBFNNs with sample covariance matrices. The results confirm the claim of [6] that using the EM algorithm to estimate the parameters of elliptical basis function networks achieves the best performance, and illustrate that the full covariance matrices of the EBF networks are capable of providing a better representation of the feature vectors.

ACKNOWLEDGMENT

This work was conducted in the Electronics Research Laboratory of Skikda (LRES) and is supported in part by the Algerian Ministry of Higher Education and Scientific Research under the CNEPRU project code: J [...]

REFERENCES

[1] D. O'Shaughnessy, Speaker Recognition, IEEE ASSP Magazine, Vol. 3, No. 4, Part 1, pp. 4-17, October.
[2] J. P. Campbell, Jr., Speaker Recognition: A Tutorial, Proceedings of the IEEE, Vol. 85, No. 9, September 1997.

[3] N. Balaska, Reconnaissance du locuteur par les méthodes statistiques et connexionnistes: étude comparative, Mémoire de Magistère, Université 20 août 55 of Skikda, fév.
[4] S. Young, G. Evermann, T. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (for HTK Version 3.2.1), Copyright Cambridge University Engineering Department.
[5] D. A. Reynolds, Experimental Evaluation of Features for Robust Speaker Identification, IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, October.
[6] M. W. Mak, S. Y. Kung, Estimation of Elliptical Basis Function Parameters by the EM Algorithm with Application to Speaker Verification, IEEE Transactions on Neural Networks, Vol. 11, No. 4, July.
[7] The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
[8] D. R. Hush, B. G. Horne, Progress in Supervised Neural Networks, IEEE Signal Processing Magazine, Vol. 10, No. 1, pp. 8-39, January.
[9] B. M. Wilamowski, Neural Network Architectures and Learning, International Conference on Industrial Technology, Vol. 1, pp. TU1-TU12, IEEE, December.
[10] S. E. Fredrickson, L. Tarassenko, Text-Independent Speaker Recognition Using Neural Network Techniques, Fourth International Conference on Artificial Neural Networks, Conference Publication No. 409, June.
[11] M. W. Mak, W. G. Allen, G. G. Sexton, Speaker Identification Using Radial Basis Functions, Third International Conference on Artificial Neural Networks, May 1993.


More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

Wavelet Transform in Speech Segmentation

Wavelet Transform in Speech Segmentation Wavelet Transform in Speech Segmentation M. Ziółko, 1 J. Gałka 1 and T. Drwięga 2 1 Department of Electronics, AGH University of Science and Technology, Kraków, Poland, ziolko@agh.edu.pl, jgalka@agh.edu.pl

More information

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle

More information

Allpass Modeling of LP Residual for Speaker Recognition

Allpass Modeling of LP Residual for Speaker Recognition Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information
