Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition
Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam and Marcel Kockmann
Odyssey Speaker and Language Recognition Workshop, Bilbao, Spain, June
P. Kenny, T. Stafylakis, J. Alam et al. Uncertainty Modeling without Subspace Methods
Uncertainty Modeling in Text-Dependent Speaker Recognition
- Large numbers of mixture components are surprisingly effective in text-dependent speaker recognition, where utterances are typically 1 or 2 seconds long.
- The number of times a mixture component is observed (its zero-order Baum-Welch statistic) is typically << 1 and may be 0, particularly at test time, so the observations ought to be treated as noisy in the statistical sense.
- Some progress has been made on uncertainty modeling in text-independent speaker recognition with subspace methods (i-vectors, speaker factors), but these are of limited use in text-dependent speaker recognition.
- We tackle the problem of uncertainty modeling without resorting to subspace methods.
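To make the "<< 1" point concrete, here is a minimal NumPy sketch that computes zero- and first-order Baum-Welch statistics for a diagonal-covariance UBM. The random UBM, the 150-frame utterance (about 1.5 s at an assumed 100 frames per second) and the function name `baum_welch_stats` are illustrative assumptions, not the authors' setup. Because each frame's component posteriors sum to 1, the average count per component is exactly T/C.

```python
import numpy as np

def baum_welch_stats(frames, weights, means, variances):
    """Zero- and first-order Baum-Welch statistics for a diagonal-covariance GMM.

    frames: (T, F) feature vectors; weights: (C,); means, variances: (C, F).
    Returns N, shape (C,), the soft occupation counts, and F, shape (C, F),
    the posterior-weighted sums of the frames.
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, C, F)
    log_dens = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_dens                              # (T, C)
    # Normalize each frame's component posteriors in the log domain.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    gamma = np.exp(log_post)
    return gamma.sum(axis=0), gamma.T @ frames

# Hypothetical setup: random 512-component UBM, 60-dim features, 150 frames.
rng = np.random.default_rng(0)
C, F_dim, T = 512, 60, 150
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, F_dim))
variances = np.ones((C, F_dim))
frames = rng.normal(size=(T, F_dim))

N, F = baum_welch_stats(frames, weights, means, variances)
print(N.mean())           # equals T / C = 150 / 512 up to rounding
print(np.mean(N < 0.01))  # fraction of components that are essentially unobserved
```

With 512 components and a 1.5 s utterance, even the average component receives well under one frame's worth of occupancy, which is why point estimates built from these statistics are unreliable.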
RSR2015 Part III (Random Digits)
- Background set (97 speakers) used for JFA and backend training.
- Results reported on the development set.
- Enrollment consists of 3 utterances of the 10 digits in random order.
- Each test utterance consists of a random string of 5 digits.
- Error rates are much higher than on Part I.
- Counterintuitively, it is hard to beat a naive GMM/UBM benchmark using HMMs.
- We focus on backend modeling with a standard 60-dimensional PLP front end.
JFA for Speaker Recognition with Digits
- Given a speaker and a collection of enrollment recordings indexed by r, the recordings are modeled by supervectors of the form
  $$ m + U x_r + D z \qquad (1) $$
- Speakers are characterized by z-vectors (supervector sized); the x-vectors (low-dimensional) model channel effects.
- To perform speaker recognition, for each digit d in a test utterance compare the supervectors z_e and z_t, where
  - z_e is extracted from the enrollment utterances;
  - z_t is extracted from the test utterance.
- z-vectors may be digit-independent (global) or digit-dependent (local).
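The generative model (1) can be sketched in a few lines. The sizes, the random parameters and the helper `recording_supervector` are hypothetical; D is stored as its diagonal. The point of the example is that two recordings of the same speaker share z and differ only through the low-rank channel term U x_r.

```python
import numpy as np

rng = np.random.default_rng(1)
C, F_dim, R_x = 8, 4, 3              # hypothetical sizes: components, feature dim, channel rank
dim = C * F_dim                      # supervector dimension

m = rng.normal(size=dim)                       # UBM mean supervector
U = rng.normal(scale=0.1, size=(dim, R_x))     # low-rank channel loading matrix
d = np.full(dim, 0.5)                          # diagonal of D, stored as a vector
z = rng.normal(size=dim)                       # speaker z-vector (supervector sized)

def recording_supervector(x_r):
    """Supervector m + U x_r + D z for one recording with channel factors x_r."""
    return m + U @ x_r + d * z

s1 = recording_supervector(rng.normal(size=R_x))
s2 = recording_supervector(rng.normal(size=R_x))
# Same speaker, different recordings: s1 - s2 = U (x_1 - x_2) lies in the column space of U.
coef, *_ = np.linalg.lstsq(U, s1 - s2, rcond=None)
print(np.allclose(U @ coef, s1 - s2))   # True
```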
- The Joint Density Backend uses point estimates of z_e and z_t.
- The Hidden Supervector Backend treats z_e and z_t as latent variables. Inference requires:
  - Baum-Welch statistics;
  - a joint prior distribution P(w) under the same-speaker hypothesis, where w = (z_e, z_t);
  - calculating the posterior of w given the Baum-Welch statistics.
Joint Density Backend
- The joint distribution for target trials, P_T(z_e, z_t), is modeled by a Gaussian for each mixture component.
  - There is insufficient data to train full-covariance Gaussians, and diagonal Gaussians are obviously incorrect, since they cannot represent the correlation between z_e and z_t.
  - Semi-diagonal constraints are imposed instead (see paper).
  - The Gaussians are estimated by arranging the background set into a collection of target trials.
- For non-target trials, assume statistical independence, i.e. $P_N(z_e, z_t) = P_T(z_e)\,P_T(z_t)$.
- Likelihood ratio for speaker verification:
  $$ \prod \frac{P_T(z_e, z_t)}{P_N(z_e, z_t)} $$
  where the product ranges over the digits in the test utterance and the mixture components in the UBM.
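A minimal sketch of the Joint Density Backend for a single mixture component follows, in log domain so that the product becomes a sum. The exact semi-diagonal constraint is in the paper; purely for illustration we assume it amounts to one 2x2 Gaussian per feature dimension (correlation kept between z_e and z_t, none across dimensions). The function names and the synthetic background data are hypothetical.

```python
import numpy as np

def fit_pair_gaussians(ze, zt):
    """Fit one 2x2 Gaussian per feature dimension to background target trials.

    ze, zt: (S, F) point estimates for S target trials built from the background set.
    Returns means (F, 2) and covariances (F, 2, 2).
    """
    pairs = np.stack([ze, zt], axis=2)                       # (S, F, 2)
    mu = pairs.mean(axis=0)
    centred = pairs - mu
    cov = np.einsum('sfi,sfj->fij', centred, centred) / len(ze)
    return mu, cov

def jdb_llr(ze, zt, mu, cov):
    """Sum over dimensions of log P_T(ze, zt) - log P_T(ze) - log P_T(zt)."""
    v = np.stack([ze, zt], axis=1) - mu                      # (F, 2)
    s_ee, s_et, s_tt = cov[:, 0, 0], cov[:, 0, 1], cov[:, 1, 1]
    det = s_ee * s_tt - s_et**2
    # v^T cov^{-1} v with the closed-form inverse of a 2x2 matrix.
    quad = (s_tt * v[:, 0]**2 - 2 * s_et * v[:, 0] * v[:, 1] + s_ee * v[:, 1]**2) / det
    log_joint = -0.5 * (2 * np.log(2 * np.pi) + np.log(det) + quad)
    # Product of the two marginals of the same Gaussian (non-target hypothesis).
    log_marg = -0.5 * (2 * np.log(2 * np.pi) + np.log(s_ee * s_tt)
                       + v[:, 0]**2 / s_ee + v[:, 1]**2 / s_tt)
    return float(np.sum(log_joint - log_marg))

# Hypothetical background data: pairs share a speaker component plus independent noise.
rng = np.random.default_rng(2)
S, F_dim = 2000, 10
shared = rng.normal(size=(S, F_dim))
mu, cov = fit_pair_gaussians(shared + 0.3 * rng.normal(size=(S, F_dim)),
                             shared + 0.3 * rng.normal(size=(S, F_dim)))
x = np.ones(F_dim)
print(jdb_llr(x, x, mu, cov), jdb_llr(x, -x, mu, cov))   # the matching pair scores higher
```

If the fitted correlation were zero, the joint density would factor into its marginals and every trial would score exactly 0, which is why a purely diagonal model is useless here.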
Hidden Supervector Backend
- For each mixture component, treat z_e and z_t as a pair of hidden mean vectors that are correlated in the case of a target trial.
- Use an i-vector extractor to do the probability calculations (not to extract factors).
- The i-vector w is the pair (z_e, z_t), so its dimension is twice that of the acoustic feature vectors.
- The i-vector model has full rank, so we can take the total variability matrix to be the identity and shift the burden of modeling the correlation between z_e and z_t to the prior.
- The prior cannot be standard normal (that would make z_e and z_t independent), so it needs to be estimated.
Posterior Calculations
For an i-vector extractor with a non-standard prior,
$$ \mathrm{Cov}(w, w) = \Big( P + \sum_c N_c T_c^{\top} T_c \Big)^{-1} $$
$$ \langle w \rangle = \mathrm{Cov}(w, w) \Big( P\mu + \sum_c T_c^{\top} F_c \Big) $$
where µ is the prior expectation and P the prior precision. (In the standard case, µ = 0 and P = I.)
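A minimal NumPy sketch of these two formulas follows. Like the formulas, it assumes the first-order statistics F_c are centred and whitened by the UBM covariances, so no covariance matrix appears. For the Hidden Supervector Backend it uses one reading of the previous slide: the enrollment statistics load on the first half of w = (z_e, z_t) and the test statistics on the second half, so each loading block is an identity embedded in zeros. The function name, the 0.8 prior correlation and all statistics are hypothetical.

```python
import numpy as np

def posterior(P, mu, N, T, F):
    """Posterior covariance and mean of w under a N(mu, P^{-1}) prior.

    P: (D, D) prior precision; mu: (D,) prior mean;
    N: (C,) zero-order stats; T: (C, F_dim, D) loading blocks T_c;
    F: (C, F_dim) centred, whitened first-order stats.
    """
    precision = P + np.einsum('c,cfi,cfj->ij', N, T, T)     # P + sum_c N_c T_c^T T_c
    cov = np.linalg.inv(precision)
    mean = cov @ (P @ mu + np.einsum('cfi,cf->i', T, F))     # Cov (P mu + sum_c T_c^T F_c)
    return cov, mean

# Hidden Supervector Backend for one UBM component, w = (z_e, z_t):
# enrollment statistics load on z_e, test statistics on z_t (assumed reading).
F_dim = 3
I, Z = np.eye(F_dim), np.zeros((F_dim, F_dim))
T_hsb = np.stack([np.hstack([I, Z]), np.hstack([Z, I])])    # (2, F_dim, 2 * F_dim)

# Hypothetical same-speaker prior: unit variances, correlation 0.8 between z_e and z_t.
prior_cov = np.block([[I, 0.8 * I], [0.8 * I, I]])
P_T = np.linalg.inv(prior_cov)
mu = np.zeros(2 * F_dim)

N = np.array([2.5, 0.0])          # component observed at enrollment but not at test time
F = np.array([[1.0, -0.5, 0.2], [0.0, 0.0, 0.0]])
cov, mean = posterior(P_T, mu, N, T_hsb, F)
print(mean[F_dim:] / mean[:F_dim])   # 0.8 in every dimension (up to rounding)
```

When a component is never observed at test time, the posterior of z_t is driven entirely by the prior correlation, and when no statistics are seen at all the posterior simply equals the prior. This is the graceful behaviour with zero counts that point estimates lack.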
Minimum Divergence Estimation of the Prior
- We need to supply the mean µ and precision matrix P that specify the prior distribution of i-vectors for same-speaker trials.
- Arrange the background set into a collection of target trials indexed by s = 1, ..., S and let w(s) be the i-vector for trial s. Then
  $$ \mu = \frac{1}{S} \sum_s \langle w(s) \rangle $$
  $$ P^{-1} = \frac{1}{S} \sum_s \langle w(s)\, w^{\top}(s) \rangle - \mu \mu^{\top} $$
- Minor modifications make µ and P digit-dependent or impose semi-diagonal constraints.
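The two update formulas can be sketched as follows, using the identity ⟨w wᵀ⟩ = Cov(w, w) + ⟨w⟩⟨w⟩ᵀ. The posterior means and covariances would come from the posterior calculation on the previous slide; here they are synthetic. The function name `estimate_prior` is hypothetical, and only a single update is shown (in practice one would presumably alternate it with re-computing the posteriors under the new prior).

```python
import numpy as np

def estimate_prior(post_means, post_covs):
    """One minimum divergence update of the prior mean and precision.

    post_means: (S, D) posterior expectations <w(s)>;
    post_covs: (S, D, D) posterior covariances Cov(w(s), w(s)).
    """
    S = len(post_means)
    mu = post_means.mean(axis=0)
    # (1/S) sum_s <w(s) w(s)^T>, using <w w^T> = Cov(w, w) + <w><w>^T.
    second_moment = (post_covs.sum(axis=0) + post_means.T @ post_means) / S
    prior_cov = second_moment - np.outer(mu, mu)
    return mu, np.linalg.inv(prior_cov)

# Hypothetical posteriors for S = 500 background target trials with D = 4.
rng = np.random.default_rng(3)
post_means = rng.normal(size=(500, 4))
post_covs = np.tile(0.1 * np.eye(4), (500, 1, 1))
mu_hat, P_hat = estimate_prior(post_means, post_covs)
```

The estimated prior covariance is the sample covariance of the posterior means plus the average posterior covariance, so uncertain posteriors widen the prior rather than being ignored.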
For the different-speaker hypothesis, treat z_e and z_t as statistically independent. In other words, suppress the cross-correlations in the covariance matrix P^{-1} that defines the prior under the same-speaker hypothesis.
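The construction of the different-speaker prior can be sketched like this, assuming w is ordered as (z_e, z_t). The function name and the toy 0.8-correlation prior are hypothetical; the prior mean is left unchanged.

```python
import numpy as np

def different_speaker_prior(P_same, F_dim):
    """Different-speaker prior precision obtained from the same-speaker one.

    w is assumed ordered as (z_e, z_t), each block of size F_dim. The covariance
    P_same^{-1} has its z_e/z_t cross-covariance blocks zeroed, then is inverted back.
    """
    cov = np.linalg.inv(P_same)          # a new array, so P_same is left untouched
    cov[:F_dim, F_dim:] = 0.0
    cov[F_dim:, :F_dim] = 0.0
    return np.linalg.inv(cov)

F_dim = 2
I = np.eye(F_dim)
P_same = np.linalg.inv(np.block([[I, 0.8 * I], [0.8 * I, I]]))
P_diff = different_speaker_prior(P_same, F_dim)
print(np.allclose(P_diff, np.eye(2 * F_dim)))   # True: unit variances, no cross-correlation
```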
Likelihood Ratio
- Given data and a probability model with hidden variables, the evidence is the likelihood of the data calculated by integrating out the hidden variables.
- For an i-vector model, the integral can be evaluated in closed form (it is a Gaussian integral) and expressed in terms of the Baum-Welch statistics (see paper).
- To evaluate the likelihood ratio for a speaker verification trial, evaluate the evidence twice:
  - using the prior for the same-speaker hypothesis;
  - using the prior for the different-speaker hypothesis.
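Under the same whitened-statistics assumption as the posterior formulas, completing the square in the Gaussian integral gives (our derivation, not copied from the paper), up to terms that do not depend on the prior and therefore cancel in the ratio, log evidence = 1/2 log|P| − 1/2 log|P + A| + 1/2 hᵀ⟨w⟩ − 1/2 µᵀPµ, where A = Σ_c N_c T_cᵀT_c, h = Σ_c T_cᵀF_c + Pµ and ⟨w⟩ = (P + A)⁻¹h. A sketch with hypothetical function names and a toy one-dimensional feature space:

```python
import numpy as np

def log_evidence(P, mu, A, b):
    """Log evidence under the prior N(mu, P^{-1}), omitting prior-independent terms.

    A = sum_c N_c T_c^T T_c and b = sum_c T_c^T F_c summarise the whitened statistics.
    """
    precision = P + A
    h = b + P @ mu
    post_mean = np.linalg.solve(precision, h)      # <w>
    _, logdet_P = np.linalg.slogdet(P)
    _, logdet_post = np.linalg.slogdet(precision)
    return 0.5 * (logdet_P - logdet_post + h @ post_mean - mu @ P @ mu)

def hsb_llr(P_same, P_diff, mu, A, b):
    """Log likelihood ratio: same-speaker evidence minus different-speaker evidence."""
    return log_evidence(P_same, mu, A, b) - log_evidence(P_diff, mu, A, b)

# Toy trial with one-dimensional features, so w = (z_e, z_t) has two entries
# and T = identity gives A = diag(N_e, N_t) and b = (F_e, F_t).
P_same = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
P_diff = np.eye(2)                  # cross-correlation suppressed, unit variances kept
mu = np.zeros(2)
A = np.diag([10.0, 10.0])
print(hsb_llr(P_same, P_diff, mu, A, np.array([10.0, 10.0])))    # positive: statistics agree
print(hsb_llr(P_same, P_diff, mu, A, np.array([10.0, -10.0])))   # negative: statistics disagree
```

With no statistics at all (A = 0, b = 0) both evidences are 0 and the trial scores exactly 0, so components that are never observed contribute nothing rather than a noisy score.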
Preparing the Baum-Welch Statistics
- For each speaker, we have a collection of (enrollment or test) recordings indexed by r.
- For each mixture component c, the zero- and first-order statistics are denoted by N_c^r and F_c^r.
- Remove the channel effects from each recording and pool over recordings:
  $$ N_c = \sum_r N_c^r, \qquad F_c = \sum_r \big( F_c^r - N_c^r U_c \hat{x}_r \big) $$
  where $\hat{x}_r$ is a point estimate of the hidden variable x_r in (1).
- This yields one set of synthetic statistics per speaker, regardless of the number of recordings.
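The pooling step can be sketched as below. The array layout (U stored as per-component blocks U_c), the function name `synthetic_stats` and the random inputs are assumptions made for illustration; how the channel-factor estimates are obtained is not shown.

```python
import numpy as np

def synthetic_stats(N_r, F_r, U, x_hat):
    """Channel-compensated statistics pooled over one speaker's recordings.

    N_r: (R, C) zero-order stats; F_r: (R, C, F_dim) first-order stats;
    U: (C, F_dim, R_x) per-component blocks U_c; x_hat: (R, R_x) channel-factor estimates.
    Returns N_c = sum_r N_c^r and F_c = sum_r (F_c^r - N_c^r U_c x_r).
    """
    channel = np.einsum('cfk,rk->rcf', U, x_hat)            # U_c x_r, shape (R, C, F_dim)
    N = N_r.sum(axis=0)
    F = (F_r - N_r[:, :, None] * channel).sum(axis=0)
    return N, F

# Hypothetical example: three recordings, C = 4 components, F_dim = 2, R_x = 1.
rng = np.random.default_rng(4)
N_r = rng.uniform(0.0, 1.0, size=(3, 4))
F_r = rng.normal(size=(3, 4, 2))
U = rng.normal(size=(4, 2, 1))
x_hat = rng.normal(size=(3, 1))
N, F = synthetic_stats(N_r, F_r, U, x_hat)   # one set of statistics for the speaker
```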
Length Normalization of the Synthetic Statistics
- In the JFA model (1), z_c is a hidden variable.
- Its posterior covariance C_c and expectation ⟨z_c⟩ are given by
  $$ C_c = \big( I + N_c D_c^{\top} D_c \big)^{-1}, \qquad \langle z_c \rangle = C_c D_c^{\top} F_c $$
  so that
  $$ \langle \| z_c \|^2 \rangle = \| \langle z_c \rangle \|^2 + \operatorname{trace}(C_c) \qquad (2) $$
- For each speaker, we scale the synthetic first-order statistics so that $\sum_c \langle \| z_c \|^2 \rangle$ is the same for all speakers.
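Since D_c is diagonal, the scaling can be computed in closed form: only the ‖⟨z_c⟩‖² term depends on F, and it grows with the square of the scale factor. A sketch, assuming whitened statistics as in the formulas above; the common target value of 100, the function names and the random inputs are hypothetical.

```python
import numpy as np

def z_posterior_diag(N, F, d):
    """Posterior of z_c when D_c is diagonal with diagonal d[c].

    N: (C,), F: (C, F_dim), d: (C, F_dim). Because D_c is diagonal,
    C_c = (I + N_c D_c^2)^{-1} and <z_c> = C_c D_c F_c reduce to elementwise operations.
    """
    C_diag = 1.0 / (1.0 + N[:, None] * d**2)
    return C_diag, C_diag * d * F

def length_normalise(N, F, d, target):
    """Rescale F so that sum_c <||z_c||^2> equals a common (hypothetical) target."""
    C_diag, z_mean = z_posterior_diag(N, F, d)
    trace_term = C_diag.sum()        # sum_c trace(C_c): unaffected by scaling F
    mean_term = np.sum(z_mean**2)    # sum_c ||<z_c>||^2: grows as the square of the scale
    if target <= trace_term or mean_term == 0.0:
        raise ValueError("target must exceed sum_c trace(C_c) and F must be nonzero")
    return np.sqrt((target - trace_term) / mean_term) * F

rng = np.random.default_rng(5)
N = rng.uniform(0.0, 1.0, size=16)      # small counts, as in 1 to 2 s utterances
F = rng.normal(size=(16, 5))
d = np.full((16, 5), 0.5)
F_norm = length_normalise(N, F, d, target=100.0)
```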
- The dominant term in (2) is trace(C_c); an experiment in Appendix A of the paper demonstrates its usefulness.
- The posterior covariance matrix C_c depends critically on the relevance factor.
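To see why trace(C_c) dominates and how the relevance factor r enters, assume the usual correspondence between JFA and relevance MAP, D_c² = I/r for whitened statistics. Then C_c = (1 + N_c/r)⁻¹ I and trace(C_c) = F_dim · r/(r + N_c). This assumption and the example values below are ours, not the paper's configuration.

```python
import numpy as np

def trace_posterior_cov(N_c, r, F_dim):
    """trace(C_c) assuming D_c^2 = I / r, i.e. C_c = (1 + N_c / r)^{-1} I."""
    return F_dim * r / (r + N_c)

F_dim = 60
for r in (1.0, 2.0, 16.0):
    # With a typical short-utterance count N_c = 0.3, the trace stays near its maximum F_dim.
    print(r, trace_posterior_cov(0.3, r, F_dim))
```

Because N_c is usually far below r for short utterances, C_c stays close to the identity and its trace outweighs ‖⟨z_c⟩‖²; changing r shifts that balance directly.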
128 Mixture Components, Global z-vectors

| System | norm.? | EER (M/F) | DCF (M/F) |
|--------|--------|-----------|-----------|
| GMM | - | 4.8%/8.0% | 0.217/ |
| JDB | - | 4.8%/7.6% | 0.219/ |
| HSB | | 4.5%/6.8% | 0.201/ |
| HSB | | 3.9%/6.1% | 0.177/0.307 |

Table 1: Results on the development set obtained with 128 Gaussians. The systems are a GMM/UBM system, the Joint Density Backend (JDB) and the Hidden Supervector Backend (HSB), both with global z-vectors. Baum-Welch statistics normalization is indicated by norm.
512 Components, Global z-vectors

| System | r | EER (M/F) | DCF (M/F) |
|--------|---|-----------|-----------|
| GMM | 2 | 4.7%/8.2% | 0.195/ |
| JDB | 2 | 4.3%/6.1% | 0.196/ |
| HSB | 1 | 3.3%/4.6% | 0.148/0.234 |

Table 2: Results on the development set obtained with 512 Gaussians and global z-vectors.
512 Components, Local z-vectors

| System | EER (M/F) | DCF (M/F) |
|--------|-----------|-----------|
| JDB (component fusion) | 3.9%/5.2% | 0.184/0.259 |
| HSB (component fusion) | 3.6%/3.9% | 0.152/0.197 |
| HSB (forced alignment) | 3.5%/4.0% | 0.152/0.197 |

Table 3: Results on the development set obtained with 512 Gaussians, local z-vectors and digit-dependent backends.
Fusion of Local and Global

| Set | System | EER (M/F) | DCF (M/F) |
|-----|--------|-----------|-----------|
| dev | local | 3.7%/3.8% | 0.149/0.193 |
| dev | global | 3.2%/4.5% | 0.148/0.232 |
| dev | fusion | 2.9%/3.6% | 0.131/0.186 |
| eval | local | 2.6%/4.5% | 0.134/0.211 |
| eval | global | 2.7%/4.7% | 0.140/0.236 |
| eval | fusion | 2.3%/4.0% | 0.122/0.192 |

Table 4: Results on the development and evaluation sets obtained with local and global Hidden Supervector systems, 512 components.
Conclusion
- Modeling uncertainty yields error rate reductions of up to 25% compared with the Joint Density Backend, consistently across all experiments on the RSR2015 Part III task.
- This can be achieved without resorting to subspace methods, although the approach can be seen as applying the idea behind the I-Vector Backend (Interspeech 2015) at the level of individual mixture components.
- Unlike the I-Vector Backend, the Hidden Supervector Backend can be configured so that it makes very modest computational demands.
- With semi-diagonal constraints on the prior, the run-time linear algebra involves only diagonal matrices.
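The last point can be illustrated under one assumed reading of "semi-diagonal": each of the four F_dim x F_dim blocks of the prior precision (z_e/z_e, z_e/z_t, z_t/z_t) is diagonal. With T equal to the identity, the posterior precision then has the same structure, and the posterior mean splits into F_dim independent 2x2 solves that need only elementwise arithmetic. The function name and all values are hypothetical; the sketch checks itself against a full dense solve.

```python
import numpy as np

def hsb_posterior_semidiag(p_ee, p_et, p_tt, mu_e, mu_t, N_e, N_t, F_e, F_t):
    """Posterior mean of (z_e, z_t) when every F_dim x F_dim block of the prior precision is diagonal.

    p_ee, p_et, p_tt: diagonals of the precision blocks, each (F_dim,). The system splits into
    F_dim independent 2x2 problems, solved with elementwise arithmetic only.
    """
    a = p_ee + N_e                   # posterior precision entries per dimension
    b = p_et
    c = p_tt + N_t
    h_e = p_ee * mu_e + p_et * mu_t + F_e
    h_t = p_et * mu_e + p_tt * mu_t + F_t
    det = a * c - b * b
    return (c * h_e - b * h_t) / det, (a * h_t - b * h_e) / det

# Hypothetical values; the check below compares against the full 2F_dim x 2F_dim solve.
rng = np.random.default_rng(6)
F_dim = 5
p_ee = np.full(F_dim, 1 / 0.36)      # precision of a prior with unit variances, correlation 0.8
p_tt = np.full(F_dim, 1 / 0.36)
p_et = np.full(F_dim, -0.8 / 0.36)
mu_e, mu_t = rng.normal(size=F_dim), rng.normal(size=F_dim)
N_e, N_t = 2.5, 0.4
F_e, F_t = rng.normal(size=F_dim), rng.normal(size=F_dim)
z_e, z_t = hsb_posterior_semidiag(p_ee, p_et, p_tt, mu_e, mu_t, N_e, N_t, F_e, F_t)

P = np.block([[np.diag(p_ee), np.diag(p_et)], [np.diag(p_et), np.diag(p_tt)]])
precision = P + np.diag(np.r_[np.full(F_dim, N_e), np.full(F_dim, N_t)])
full = np.linalg.solve(precision, P @ np.r_[mu_e, mu_t] + np.r_[F_e, F_t])
print(np.allclose(np.r_[z_e, z_t], full))   # True
```

The cost per mixture component is linear in F_dim instead of cubic in 2F_dim, which is consistent with the modest run-time demands claimed above.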
Modeling acoustic correlations by factor analysis Lawrence Saul and Mazin Rahim {lsaul.mazin}~research.att.com AT&T Labs - Research 180 Park Ave, D-130 Florham Park, NJ 07932 Abstract Hidden Markov models
More informationMixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes
Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter
More informationTNO SRE-2008: Calibration over all trials and side-information
Image from Dr Seuss TNO SRE-2008: Calibration over all trials and side-information David van Leeuwen (TNO, ICSI) Howard Lei (ICSI), Nir Krause (PRS), Albert Strasheim (SUN) Niko Brümmer (SDV) Knowledge
More informationExperiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition
Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden
More informationUniversity of Birmingham Research Archive
University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 9, SEPTEMBER
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 9, SEPTEMBER 2013 1791 Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models Liang Lu, Student Member, IEEE,
More informationCS 136 Lecture 5 Acoustic modeling Phoneme modeling
+ September 9, 2016 Professor Meteer CS 136 Lecture 5 Acoustic modeling Phoneme modeling Thanks to Dan Jurafsky for these slides + Directly Modeling Continuous Observations n Gaussians n Univariate Gaussians
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationMixtures of Gaussians with Sparse Structure
Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or
More informationUnsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification
INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification Qiongqiong Wang, Takafumi Koshinaka Data Science Research
More informationIBM Research Report. Training Universal Background Models for Speaker Recognition
RC24953 (W1003-002) March 1, 2010 Other IBM Research Report Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar, Jason Pelecanos IBM Research Division Thomas J. Watson Research
More informationSpoken Language Understanding in a Latent Topic-based Subspace
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Spoken Language Understanding in a Latent Topic-based Subspace Mohamed Morchid 1, Mohamed Bouaziz 1,3, Waad Ben Kheder 1, Killian Janod 1,2, Pierre-Michel
More informationKent Academic Repository
Kent Academic Repository Full text document (pdf) Citation for published version Song, Yan and Cui, Ruilian and McLoughlin, Ian Vince and Dai, Li-Rong (2016) Improvements on Deep Bottleneck Network based
More information