Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition


1 Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam and Marcel Kockmann Odyssey Speaker and Language Recognition Workshop Bilbao, Spain June, / 19 P. Kenny, T. Stafylakis, J. Alam et al. Uncertainty Modeling without Subspace Methods

2 Uncertainty Modeling in Text-Dependent Speaker Recognition
- Large numbers of mixture components are surprisingly effective in text-dependent speaker recognition, where utterances are typically of 1 or 2 seconds duration
- The number of times a mixture component is observed is typically << 1 and it could be 0 (particularly at test time), so observations ought to be treated as being noisy in the statistical sense
- Some progress has been made in uncertainty modeling in text-independent speaker recognition with subspace methods (i-vectors, speaker factors), but these are of limited use in text-dependent speaker recognition
- We tackle the problem of uncertainty modeling without resorting to subspace methods

3 RSR2015 Part III (Random Digits)
- Background set (97 speakers) used for JFA and backend training
- Results reported on development set
- Enrollment consists of 3 utterances of the 10 digits in random order
- Each test utterance consists of a random string of 5 digits
- Error rates are much higher than on Part I
- Counterintuitively, it is hard to beat a naive GMM/UBM benchmark using HMMs
- We focus on backend modeling with a standard 60-dimensional PLP front end

4 JFA for Speaker Recognition with Digits
- Given a speaker and a collection of enrollment recordings, the recordings are modeled by supervectors of the form
  $$m + U x_r + D z \qquad (1)$$
- Speakers are characterized by z-vectors (supervector sized); the x-vectors (low-dimensional) model channel effects
- To perform speaker recognition, for each digit d in a test utterance compare the supervectors $z_e$ and $z_t$, where
  - $z_e$ is extracted from the enrollment utterances
  - $z_t$ is extracted from the test utterance
- z-vectors may be digit-independent (global) or digit-dependent (local)
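As a rough illustration of model (1), the sketch below builds the supervectors for several recordings of one speaker. The toy dimensions, the random parameters, and the choice of a diagonal D (the usual JFA convention, not stated on the slide) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): mixture components, feature dimension, channel rank.
C, F, R = 4, 3, 2
CF = C * F  # supervector dimension

m = rng.normal(size=CF)                       # UBM mean supervector
U = rng.normal(size=(CF, R))                  # low-rank channel matrix
D = np.diag(rng.uniform(0.5, 1.5, size=CF))   # assumed diagonal, as in standard JFA

z = rng.normal(size=CF)        # speaker vector, shared by every recording
x = rng.normal(size=(5, R))    # one low-dimensional channel vector x_r per recording


def recording_supervector(x_r):
    """Supervector for one recording: m + U x_r + D z."""
    return m + U @ x_r + D @ z


supervectors = np.stack([recording_supervector(x_r) for x_r in x])
```

Two recordings of the same speaker differ only through the channel term, so their difference lies in the column space of U.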

5
- The Joint Density Backend uses point estimates of $z_e$ and $z_t$
- The Hidden Supervector Backend treats $z_e$ and $z_t$ as latent variables. Inference requires
  - Baum-Welch statistics
  - A joint prior distribution (under the same-speaker hypothesis) $P(w)$ where $w = (z_e, z_t)$
  - Calculating the posterior of $w$ given Baum-Welch statistics

6 Joint Density Backend
- The joint distribution for target trials, $P_T(z_e, z_t)$, is modeled by a Gaussian for each mixture component
- Insufficient data to train full covariance Gaussians and diagonal Gaussians obviously incorrect
- Semi-diagonal constraints (see paper)
- Gaussians estimated by arranging the background set into a collection of target trials
- For non-target trials, assume statistical independence, i.e.
  $$P_N(z_e, z_t) = P_T(z_e)\, P_T(z_t)$$
- Likelihood ratio for speaker verification:
  $$\prod \frac{P_T(z_e, z_t)}{P_N(z_e, z_t)}$$
  where the product ranges over the digits in the test utterance and mixture components in the UBM
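The likelihood ratio described on this slide can be sketched for a single mixture component as follows. This is a minimal illustration with an unconstrained Gaussian; the function names are hypothetical and the semi-diagonal constraints from the paper are not reproduced.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def jdb_llr(z_e, z_t, mean, cov):
    """Log likelihood ratio for one mixture component.

    mean and cov describe the joint target-trial Gaussian over (z_e, z_t).
    The non-target density is the product of its two marginals.
    """
    F = z_e.shape[0]
    w = np.concatenate([z_e, z_t])
    log_target = gauss_logpdf(w, mean, cov)
    log_nontarget = (gauss_logpdf(z_e, mean[:F], cov[:F, :F])
                     + gauss_logpdf(z_t, mean[F:], cov[F:, F:]))
    return log_target - log_nontarget


def trial_llr(pairs, means, covs):
    """Sum the per-component log ratios (the product in the likelihood ratio)."""
    return sum(jdb_llr(z_e, z_t, mu, cov)
               for (z_e, z_t), mu, cov in zip(pairs, means, covs))
```

When the joint covariance has no cross-correlation between the enrollment and test blocks, the two densities coincide and the log ratio is zero, so all of the discriminative information sits in the cross-covariance.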

7 Hidden Supervector Backend
- For each mixture component treat $z_e$, $z_t$ as a pair of hidden mean vectors which are correlated in the case of a target trial
- Use an i-vector extractor to do probability calculations (not to extract factors)
- The i-vector $w$ is the pair $z_e$, $z_t$ so its dimension is twice that of the acoustic feature vectors
- The i-vector model has full rank so we can take the total variability matrix to be the identity and shift the burden of modeling the correlation between $z_e$ and $z_t$ to the prior
- The prior cannot be standard normal so it needs to be estimated

8 Posterior Calculations
For an i-vector extractor with a non-standard prior,
$$\mathrm{Cov}(w, w) = \Big( P + \sum_c N_c\, T_c^\top T_c \Big)^{-1}$$
$$\langle w \rangle = \mathrm{Cov}(w, w) \Big( P\mu + \sum_c T_c^\top F_c \Big)$$
where $\mu$ is the prior expectation and $P$ the precision. (In the standard case, $\mu = 0$ and $P = I$.)
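The two posterior formulas translate directly into a few lines of NumPy. The sketch below assumes the first-order statistics are already centered and whitened with the UBM, so that no residual covariance appears; the array shapes and the function name are illustrative choices.

```python
import numpy as np


def posterior(N, F_stats, T, mu, P):
    """Posterior mean and covariance of w under a Gaussian prior N(mu, P^{-1}).

    N:       (C,)       zero-order statistics per mixture component
    F_stats: (C, D)     first-order statistics per mixture component
    T:       (C, D, R)  one block T_c of the total variability matrix per component
    mu:      (R,)       prior mean
    P:       (R, R)     prior precision
    """
    precision = P + np.einsum('c,cfi,cfj->ij', N, T, T)   # P + sum_c N_c T_c' T_c
    rhs = P @ mu + np.einsum('cfi,cf->i', T, F_stats)      # P mu + sum_c T_c' F_c
    cov = np.linalg.inv(precision)
    return cov @ rhs, cov
```

With no observations (all statistics zero) the posterior collapses to the prior, which is the behavior wanted for mixture components that are never observed in a short test utterance.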

9 Minimum Divergence Estimation of the Prior
- We need to supply the mean $\mu$ and precision matrix $P$ that specify the prior distribution of i-vectors for same-speaker trials.
- Arrange the background set into a collection of target trials indexed by $s = 1, \ldots, S$ and let $w(s)$ be the i-vector for trial $s$.
  $$\mu = \frac{1}{S} \sum_s w(s)$$
  $$P^{-1} = \frac{1}{S} \sum_s w(s)\, w(s)^\top - \mu \mu^\top$$
- Minor modifications to make $\mu$ and $P$ digit dependent or impose semi-diagonal constraints.
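The update for the prior amounts to a sample mean and a (biased) sample covariance over the target trials. A minimal sketch, assuming the trial i-vectors are stacked row-wise in an array and that there are enough trials for the covariance to be invertible:

```python
import numpy as np


def estimate_prior(W):
    """Estimate the prior mean and precision from target-trial i-vectors.

    W: (S, dim) array with one i-vector w(s) per row.
    """
    S = W.shape[0]
    mu = W.mean(axis=0)
    cov = W.T @ W / S - np.outer(mu, mu)   # P^{-1}
    P = np.linalg.inv(cov)
    return mu, P
```

The digit-dependent and semi-diagonal variants mentioned on the slide are not covered here; they would change how trials are grouped and which covariance entries are kept.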

10
- For the different-speaker hypothesis, treat $z_e$ and $z_t$ as being statistically independent.
- In other words, suppress the cross correlations in the covariance matrix $P^{-1}$ that defines the prior under the same-speaker hypothesis.
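Suppressing the cross correlations means zeroing the off-diagonal blocks of the prior covariance and inverting again. A small sketch, where the function name and the `dim` argument (the dimension of each of the two vectors) are illustrative:

```python
import numpy as np


def different_speaker_precision(P_same, dim):
    """Precision of the different-speaker prior derived from the same-speaker prior.

    P_same: (2*dim, 2*dim) precision over w = (z_e, z_t).
    The enrollment/test cross-covariance blocks are set to zero.
    """
    cov = np.linalg.inv(P_same)
    cov_indep = cov.copy()
    cov_indep[:dim, dim:] = 0.0
    cov_indep[dim:, :dim] = 0.0
    return np.linalg.inv(cov_indep)
```

The marginal distributions of the enrollment and test vectors are unchanged by this operation; only their dependence is removed.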

11 Likelihood Ratio
- Given data and a probability model with hidden variables, the evidence is the likelihood of the data calculated by integrating out the hidden variables
- For an i-vector model the integral can be evaluated in closed form (it is a Gaussian integral) and expressed in terms of the Baum-Welch statistics (see paper)
- To evaluate the likelihood ratio for a speaker verification trial, evaluate the evidence twice
  - Using the prior for the same-speaker hypothesis
  - Using the prior for the different-speaker hypothesis
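The exact expression is in the paper; the sketch below only illustrates the idea for one mixture component under simplifying assumptions: the total variability matrix is the identity, the statistics are centered and whitened, and the enrollment and test statistics are stacked to match w = (z_e, z_t). Only the part of the log evidence that depends on the prior is computed, since the remaining terms cancel in the ratio. All function names are hypothetical.

```python
import numpy as np


def log_evidence(N_diag, F, mu, P):
    """Prior-dependent part of the log evidence.

    Evaluates log of the integral of N(w; mu, P^{-1}) * exp(w'F - 0.5 w' diag(N_diag) w)
    over w, which is a Gaussian integral with a closed form.
    """
    L = P + np.diag(N_diag)          # posterior precision
    b = P @ mu + F
    _, logdet_P = np.linalg.slogdet(P)
    _, logdet_L = np.linalg.slogdet(L)
    return 0.5 * (logdet_P - logdet_L + b @ np.linalg.solve(L, b) - mu @ P @ mu)


def llr(N_e, F_e, N_t, F_t, mu, P_same, P_diff):
    """Log likelihood ratio for one component: the evidence is evaluated twice."""
    dim = F_e.shape[0]
    N_diag = np.concatenate([np.full(dim, N_e), np.full(dim, N_t)])
    F = np.concatenate([F_e, F_t])
    return log_evidence(N_diag, F, mu, P_same) - log_evidence(N_diag, F, mu, P_diff)
```

With zero statistics both evidences are equal and the log ratio is exactly zero, so an unobserved mixture component contributes nothing to the score.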

12 Preparing the Baum-Welch Statistics
- For each speaker, we have a collection of (enrollment or test) recordings indexed by $r$
- For each mixture component $c$, zero and first order statistics denoted by $N_c^r$ and $F_c^r$
- Remove the channel effects from each recording and pool over recordings
  $$N_c = \sum_r N_c^r$$
  $$F_c = \sum_r \big( F_c^r - N_c^r U_c \hat{x}_r \big)$$
- $\hat{x}_r$ is a point-estimate of the hidden variable $x_r$ in (1)
- One set of synthetic statistics per speaker (regardless of the number of recordings)
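The channel compensation and pooling step can be written as a pair of array reductions. In this sketch the array layout and the function name are assumptions, and the channel point estimates are taken as given rather than computed.

```python
import numpy as np


def pool_statistics(N_rc, F_rc, U, x_hat):
    """Remove channel effects per recording, then pool over recordings.

    N_rc:  (R, C)      zero-order statistics for recording r, component c
    F_rc:  (R, C, D)   first-order statistics
    U:     (C, D, K)   block U_c of the channel matrix for each component
    x_hat: (R, K)      point estimate of the channel vector for each recording
    """
    channel = np.einsum('cfk,rk->rcf', U, x_hat)              # U_c x_r
    N_c = N_rc.sum(axis=0)
    F_c = (F_rc - N_rc[:, :, None] * channel).sum(axis=0)     # sum_r (F_c^r - N_c^r U_c x_r)
    return N_c, F_c
```

The result is a single set of synthetic statistics per speaker, whatever the number of recordings.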

13 Length Normalization of the Synthetic Statistics
- In the JFA model (1), $z_c$ is a hidden variable
- The posterior covariance and expectation, $C_c$ and $\langle z_c \rangle$, are given by
  $$C_c = \big( I + N_c D_c^\top D_c \big)^{-1}$$
  $$\langle z_c \rangle = C_c D_c^\top F_c$$
  so that
  $$\langle \| z_c \|^2 \rangle = \| \langle z_c \rangle \|^2 + \mathrm{trace}(C_c) \qquad (2)$$
- For each speaker, we scale the synthetic first order statistics so that $\sum_c \langle \| z_c \|^2 \rangle$ is the same for all speakers
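One way to realize this scaling is sketched below. The slide does not say how the scale factor is found, so the closed-form solution here is an assumption: it uses the fact that scaling the first-order statistics by a factor scales the posterior expectation by the same factor and leaves the trace term unchanged. D_c is assumed diagonal and is passed as a vector per component; `target` is a hypothetical common value for the summed expected squared norm.

```python
import numpy as np


def expected_sq_norm_terms(N, F, d):
    """Return (sum_c ||<z_c>||^2, sum_c trace(C_c)) for diagonal D_c.

    N: (C,)    zero-order statistics
    F: (C, D)  synthetic first-order statistics
    d: (C, D)  diagonal entries of D_c
    """
    C_diag = 1.0 / (1.0 + N[:, None] * d ** 2)   # diagonal of C_c
    z_mean = C_diag * d * F                       # <z_c> = C_c D_c' F_c
    return np.sum(z_mean ** 2), np.sum(C_diag)


def length_normalize(N, F, d, target):
    """Scale F so that the summed expected squared norm equals target."""
    mean_term, trace_term = expected_sq_norm_terms(N, F, d)
    if target <= trace_term:
        raise ValueError("target must exceed the total trace term")
    alpha = np.sqrt((target - trace_term) / mean_term)
    return alpha * F
```

Because the trace term does not depend on the first-order statistics, the target has to be larger than the total trace for a real scale factor to exist.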

14
- The dominant term in (2) is $\mathrm{trace}(C_c)$
- An experiment in Appendix A demonstrates its usefulness
- The posterior covariance matrix $C_c$ depends critically on the relevance factor

15 128 Mixture Components, Global z-vectors

System   norm.?   EER (M/F)    DCF (M/F)
GMM      -        4.8%/8.0%    0.217/
JDB      -        4.8%/7.6%    0.219/
HSB               4.5%/6.8%    0.201/
HSB               3.9%/6.1%    0.177/0.307

Table 1: Results on the development set obtained with 128 Gaussians. The systems are a GMM/UBM system, the Joint Density Backend (JDB) and the Hidden Supervector Backend (HSB) both with global z-vectors. Baum-Welch statistics normalization is indicated by norm.

16 512 Components, Global z-vectors

System   r   EER (M/F)    DCF (M/F)
GMM      2   4.7%/8.2%    0.195/
JDB      2   4.3%/6.1%    0.196/
HSB      1   3.3%/4.6%    0.148/0.234

Table 2: Results on the development set obtained with 512 Gaussians and global z-vectors.

17 512 Components, Local z-vectors

System                   EER (M/F)    DCF (M/F)
JDB (component fusion)   3.9%/5.2%    0.184/0.259
HSB (component fusion)   3.6%/3.9%    0.152/0.197
HSB (forced alignment)   3.5%/4.0%    0.152/0.197

Table 3: Results on the development set obtained with 512 Gaussians, local z-vectors and digit-dependent backends

18 Fusion of Local and Global

Set    System   EER (M/F)    DCF (M/F)
dev    local    3.7%/3.8%    0.149/0.193
dev    global   3.2%/4.5%    0.148/0.232
dev    fusion   2.9%/3.6%    0.131/0.186
eval   local    2.6%/4.5%    0.134/0.211
eval   global   2.7%/4.7%    0.140/0.236
eval   fusion   2.3%/4.0%    0.122/0.192

Table 4: Results on the development and evaluation sets obtained with local and global Hidden Supervector systems, 512 components.

19 Conclusion
- Modeling uncertainty yields error rate reductions of up to 25% compared with the Joint Density Backend, consistently across all experiments on the RSR Part III task
- This can be achieved without resorting to subspace methods, although the approach can be seen as applying the same idea as the I-Vector Backend (Interspeech 2015) at the level of individual mixture components
- Unlike the I-Vector Backend, the Hidden Supervector Backend can be configured in a way which makes very modest computational demands
- With semi-diagonal constraints on the prior, the run-time linear algebra involves only diagonal matrices

A Small Footprint i-vector Extractor

A Small Footprint i-vector Extractor A Small Footprint i-vector Extractor Patrick Kenny Odyssey Speaker and Language Recognition Workshop June 25, 2012 1 / 25 Patrick Kenny A Small Footprint i-vector Extractor Outline Introduction Review

More information

Joint Factor Analysis for Speaker Verification

Joint Factor Analysis for Speaker Verification Joint Factor Analysis for Speaker Verification Mengke HU ASPITRG Group, ECE Department Drexel University mengke.hu@gmail.com October 12, 2012 1/37 Outline 1 Speaker Verification Baseline System Session

More information

i-vector and GMM-UBM Bie Fanhu CSLT, RIIT, THU

i-vector and GMM-UBM Bie Fanhu CSLT, RIIT, THU i-vector and GMM-UBM Bie Fanhu CSLT, RIIT, THU 2013-11-18 Framework 1. GMM-UBM Feature is extracted by frame. Number of features are unfixed. Gaussian Mixtures are used to fit all the features. The mixtures

More information

Session Variability Compensation in Automatic Speaker Recognition

Session Variability Compensation in Automatic Speaker Recognition Session Variability Compensation in Automatic Speaker Recognition Javier González Domínguez VII Jornadas MAVIR Universidad Autónoma de Madrid November 2012 Outline 1. The Inter-session Variability Problem

More information

Front-End Factor Analysis For Speaker Verification

Front-End Factor Analysis For Speaker Verification IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING Front-End Factor Analysis For Speaker Verification Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Abstract This

More information

Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System

Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System INERSPEECH 2015 Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System QingYang Hong 1, Lin Li 1, Ming Li 2, Ling Huang 1, Lihong Wan 1, Jun Zhang 1

More information

Speaker recognition by means of Deep Belief Networks

Speaker recognition by means of Deep Belief Networks Speaker recognition by means of Deep Belief Networks Vasileios Vasilakakis, Sandro Cumani, Pietro Laface, Politecnico di Torino, Italy {first.lastname}@polito.it 1. Abstract Most state of the art speaker

More information

Speaker Verification Using Accumulative Vectors with Support Vector Machines

Speaker Verification Using Accumulative Vectors with Support Vector Machines Speaker Verification Using Accumulative Vectors with Support Vector Machines Manuel Aguado Martínez, Gabriel Hernández-Sierra, and José Ramón Calvo de Lara Advanced Technologies Application Center, Havana,

More information

An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification

An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification Odyssey 2014: The Speaker and Language Recognition Workshop 16-19 June 2014, Joensuu, Finland An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification Jinghua Zhong 1, Weiwu

More information

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However,

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However, Odyssey 2012 The Speaker and Language Recognition Workshop 25-28 June 2012, Singapore First attempt of Boltzmann Machines for Speaker Verification Mohammed Senoussaoui 1,2, Najim Dehak 3, Patrick Kenny

More information

Gain Compensation for Fast I-Vector Extraction over Short Duration

Gain Compensation for Fast I-Vector Extraction over Short Duration INTERSPEECH 27 August 2 24, 27, Stockholm, Sweden Gain Compensation for Fast I-Vector Extraction over Short Duration Kong Aik Lee and Haizhou Li 2 Institute for Infocomm Research I 2 R), A STAR, Singapore

More information

Support Vector Machines using GMM Supervectors for Speaker Verification

Support Vector Machines using GMM Supervectors for Speaker Verification 1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:

More information

Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-vectors

Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-vectors Published in Ninth International Conference on Advances in Pattern Recognition (ICAPR-2017), Bangalore, India Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-vectors

More information

An I-Vector Backend for Speaker Verification

An I-Vector Backend for Speaker Verification An I-Vetor Bakend for Speaker Verifiation Patrik Kenny, 1 Themos Stafylakis, 1 Jahangir Alam, 1 and Marel Kokmann 2 1 CRIM, Canada, {patrik.kenny, themos.stafylakis, jahangir.alam}@rim.a 2 VoieTrust, Canada,

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Multiclass Discriminative Training of i-vector Language Recognition

Multiclass Discriminative Training of i-vector Language Recognition Odyssey 214: The Speaker and Language Recognition Workshop 16-19 June 214, Joensuu, Finland Multiclass Discriminative Training of i-vector Language Recognition Alan McCree Human Language Technology Center

More information

University of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I

University of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()

More information

Minimax i-vector extractor for short duration speaker verification

Minimax i-vector extractor for short duration speaker verification Minimax i-vector extractor for short duration speaker verification Ville Hautamäki 1,2, You-Chi Cheng 2, Padmanabhan Rajan 1, Chin-Hui Lee 2 1 School of Computing, University of Eastern Finl, Finl 2 ECE,

More information

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Distribution A: Public Release Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Bengt J. Borgström Elliot Singer Douglas Reynolds and Omid Sadjadi 2

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication

Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication Aleksandr Sizov 1, Kong Aik Lee, Tomi Kinnunen 1 1 School of Computing, University of Eastern Finland, Finland Institute

More information

A Generative Model for Score Normalization in Speaker Recognition

A Generative Model for Score Normalization in Speaker Recognition INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden A Generative Model for Score Normalization in Speaker Recognition Albert Swart and Niko Brümmer Nuance Communications, Inc. (South Africa) albert.swart@nuance.com,

More information

Bayesian Analysis of Speaker Diarization with Eigenvoice Priors

Bayesian Analysis of Speaker Diarization with Eigenvoice Priors Bayesian Analysis of Speaker Diarization with Eigenvoice Priors Patrick Kenny Centre de recherche informatique de Montréal Patrick.Kenny@crim.ca A year in the lab can save you a day in the library. Panu

More information

Towards Duration Invariance of i-vector-based Adaptive Score Normalization

Towards Duration Invariance of i-vector-based Adaptive Score Normalization Odyssey 204: The Speaker and Language Recognition Workshop 6-9 June 204, Joensuu, Finland Towards Duration Invariance of i-vector-based Adaptive Score Normalization Andreas Nautsch*, Christian Rathgeb,

More information

SPEECH recognition systems based on hidden Markov

SPEECH recognition systems based on hidden Markov IEEE SIGNAL PROCESSING LETTERS, VOL. X, NO. X, 2014 1 Probabilistic Linear Discriminant Analysis for Acoustic Modelling Liang Lu, Member, IEEE and Steve Renals, Fellow, IEEE Abstract In this letter, we

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Hidden Markov Modelling

Hidden Markov Modelling Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

] Automatic Speech Recognition (CS753)

] Automatic Speech Recognition (CS753) ] Automatic Speech Recognition (CS753) Lecture 17: Discriminative Training for HMMs Instructor: Preethi Jyothi Sep 28, 2017 Discriminative Training Recall: MLE for HMMs Maximum likelihood estimation (MLE)

More information

ISCA Archive

ISCA Archive ISCA Archive http://www.isca-speech.org/archive ODYSSEY04 - The Speaker and Language Recognition Workshop Toledo, Spain May 3 - June 3, 2004 Analysis of Multitarget Detection for Speaker and Language Recognition*

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Monaural speech separation using source-adapted models

Monaural speech separation using source-adapted models Monaural speech separation using source-adapted models Ron Weiss, Dan Ellis {ronw,dpwe}@ee.columbia.edu LabROSA Department of Electrical Enginering Columbia University 007 IEEE Workshop on Applications

More information

Speech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)

Speech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II) Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation

More information

SCORE CALIBRATING FOR SPEAKER RECOGNITION BASED ON SUPPORT VECTOR MACHINES AND GAUSSIAN MIXTURE MODELS

SCORE CALIBRATING FOR SPEAKER RECOGNITION BASED ON SUPPORT VECTOR MACHINES AND GAUSSIAN MIXTURE MODELS SCORE CALIBRATING FOR SPEAKER RECOGNITION BASED ON SUPPORT VECTOR MACHINES AND GAUSSIAN MIXTURE MODELS Marcel Katz, Martin Schafföner, Sven E. Krüger, Andreas Wendemuth IESK-Cognitive Systems University

More information

speaker recognition using gmm-ubm semester project presentation

speaker recognition using gmm-ubm semester project presentation speaker recognition using gmm-ubm semester project presentation OBJECTIVES OF THE PROJECT study the GMM-UBM speaker recognition system implement this system with matlab document the code and how it interfaces

More information

Statistical NLP Spring Digitizing Speech

Statistical NLP Spring Digitizing Speech Statistical NLP Spring 2008 Lecture 10: Acoustic Models Dan Klein UC Berkeley Digitizing Speech 1 Frame Extraction A frame (25 ms wide) extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon

More information

Digitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...

Digitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ... Statistical NLP Spring 2008 Digitizing Speech Lecture 10: Acoustic Models Dan Klein UC Berkeley Frame Extraction A frame (25 ms wide extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon Arnfield

More information

Lecture 3: ASR: HMMs, Forward, Viterbi

Lecture 3: ASR: HMMs, Forward, Viterbi Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions

More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

Linear Dynamical Systems (Kalman filter)

Linear Dynamical Systems (Kalman filter) Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete

More information

Using Deep Belief Networks for Vector-Based Speaker Recognition

Using Deep Belief Networks for Vector-Based Speaker Recognition INTERSPEECH 2014 Using Deep Belief Networks for Vector-Based Speaker Recognition W. M. Campbell MIT Lincoln Laboratory, Lexington, MA, USA wcampbell@ll.mit.edu Abstract Deep belief networks (DBNs) have

More information

Unsupervised Methods for Speaker Diarization. Stephen Shum

Unsupervised Methods for Speaker Diarization. Stephen Shum Unsupervised Methods for Speaker Diarization by Stephen Shum B.S., University of California, Berkeley (2009) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

INTERSPEECH 2016 Tutorial: Machine Learning for Speaker Recognition

INTERSPEECH 2016 Tutorial: Machine Learning for Speaker Recognition INTERSPEECH 2016 Tutorial: Machine Learning for Speaker Recognition Man-Wai Mak and Jen-Tzung Chien The Hong Kong Polytechnic University, Hong Kong National Chiao Tung University, Taiwan September 8, 2016

More information

Low-dimensional speech representation based on Factor Analysis and its applications!

Low-dimensional speech representation based on Factor Analysis and its applications! Low-dimensional speech representation based on Factor Analysis and its applications! Najim Dehak and Stephen Shum! Spoken Language System Group! MIT Computer Science and Artificial Intelligence Laboratory!

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre

Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre The 2 Parts HDM based diarization System The homogeneity measure 2 Outline

More information

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,

More information

Weighted Finite-State Transducers in Computational Biology

Weighted Finite-State Transducers in Computational Biology Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial

More information

Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification

Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification Odyssey 2018 The Speaker and Language Recognition Workshop 26-29 June 2018, Les Sables d Olonne, France Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification Md Hafizur Rahman 1, Ivan

More information

Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion

Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion INTERSPEECH 203 Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion Ville Hautamäki, Kong Aik Lee 2, David van Leeuwen 3, Rahim Saeidi 3, Anthony Larcher 2, Tomi Kinnunen, Taufiq

More information
