Joint Factor Analysis for Speaker Verification

Joint Factor Analysis for Speaker Verification
Mengke Hu
ASPITRG Group, ECE Department, Drexel University
mengke.hu@gmail.com
October 12, 2012
1/37

Outline
1 Speaker Verification: Baseline System; Session Variation
2 Joint Factor Analysis: Hidden Markov Model; Factor Analysis Model; Principal Components Analysis (PCA); Probabilistic PCA
3 JFA for Speaker Verification: General Steps; Hyperparameter Estimation
2/37

Baseline System
(Block diagram: input speech goes through pre-processing and feature extraction, and the features are scored against H0, the target speaker model, and H1, the background model, to produce a decision.)
3/37

Baseline System
1 Given a speech segment X, we test two hypotheses:
  H_0: X is from the claimed target speaker S (modeled by a GMM)
  H_1: X is not from speaker S; it is from the background (modeled by the UBM)
2 Decision rule:
  Score = log [ p(X | Target Model) / p(X | UBM) ]; accept H_0 if Score > Threshold, otherwise accept H_1.
  Note: Score = log p(X | Target Model) − log p(X | UBM)
4/37

Baseline Experiment
1 Feature extraction (MFCC)
2 Train the UBM
3 Obtain the adapted GMM for the target speaker
4 Test trials against the two hypotheses
5 Scoring
6 DET (Detection Error Tradeoff) curve: false accepts vs. false rejects
Problem: how can the channel effect be compensated for?
5/37
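A minimal, illustrative Python sketch of this pipeline follows. It is not the presentation's actual system: it uses scikit-learn's GaussianMixture, substitutes a warm-started EM re-fit for proper relevance-MAP adaptation, and assumes hypothetical precomputed MFCC arrays (bg_mfcc, enroll_mfcc, test_mfcc) of shape [n_frames, 39].

```python
# Minimal GMM-UBM verification sketch (illustrative only).
# Assumes MFCC features are precomputed; re-fitting from the UBM's parameters
# is used here as a crude stand-in for full MAP adaptation.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64, seed=0):
    """Fit the universal background model on pooled background MFCC frames."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=seed)
    ubm.fit(background_feats)
    return ubm

def adapt_target(ubm, target_feats, n_iter=5):
    """Crude adaptation: warm-start EM from the UBM parameters on the target data."""
    gmm = GaussianMixture(n_components=ubm.n_components, covariance_type="diag",
                          max_iter=n_iter,
                          weights_init=ubm.weights_,
                          means_init=ubm.means_,
                          precisions_init=ubm.precisions_)
    gmm.fit(target_feats)
    return gmm

def llr_score(target_gmm, ubm, test_feats):
    """Average per-frame log-likelihood ratio: log p(X|target) - log p(X|UBM)."""
    return target_gmm.score(test_feats) - ubm.score(test_feats)

# Usage (with hypothetical MFCC arrays of shape [n_frames, 39]):
# ubm = train_ubm(bg_mfcc); spk = adapt_target(ubm, enroll_mfcc)
# accept = llr_score(spk, ubm, test_mfcc) > threshold
```

The score is the average per-frame log-likelihood ratio from the previous slide; comparing it against a threshold gives the accept/reject decision, and sweeping the threshold traces out the DET curve.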

Session Variation
Inter-Speaker Variation: the two utterances are from different speakers.
7/37

Session Variation
Inter-Speaker Variation: the two utterances are from different speakers.
Inter-Session Variation: the two utterances are from the same speaker.
  Channel effects: the utterances are recorded over different channels.
  Intra-Speaker Variation: utterances vary with the speaker's health, emotional state, etc.
8/37

Gaussian Mixture Model Review
Recall the GMM with latent indicator z_s:
p(s | z_s) = \prod_{i=1}^{M_s} N(s; \mu_i^s, \Sigma_i^s)^{z_{s,i}}
p(z_s) = \prod_{i=1}^{M_s} (\pi_i^s)^{z_{s,i}}
p(s) = \sum_{z_s} p(z_s) p(s | z_s) = \sum_{i=1}^{M_s} \pi_i^s N(s; \mu_i^s, \Sigma_i^s)
z_s is a hidden variable indicating which Gaussian mixture component is active.
Remark: the indicators \{z_{s,i}\}_{i=1,\dots,M_s} are independent.
10/37
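As a quick illustration of the mixture density p(s) = \sum_i \pi_i^s N(s; \mu_i^s, \Sigma_i^s), here is a small NumPy/SciPy sketch with toy, randomly drawn parameters (all names and dimensions are illustrative, not from the slides):

```python
# Sketch: evaluating the GMM density p(s) = sum_i pi_i * N(s; mu_i, diag(var_i))
# with toy, randomly chosen parameters (purely illustrative).
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(s, weights, means, variances):
    """Log of sum_i pi_i N(s; mu_i, diag(var_i)), computed with log-sum-exp."""
    log_terms = [np.log(w) + multivariate_normal.logpdf(s, mean=m, cov=np.diag(v))
                 for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(log_terms)

rng = np.random.default_rng(0)
M, F = 4, 3                                      # mixture components, feature dimension
weights = rng.dirichlet(np.ones(M))              # pi_i, summing to 1
means = rng.normal(size=(M, F))                  # mu_i
variances = rng.uniform(0.5, 1.5, size=(M, F))   # diagonal covariances
print(gmm_logpdf(rng.normal(size=F), weights, means, variances))
```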

Hidden Markov Model
(Graphical model: a chain of hidden states z_1, z_2, ..., z_{i-1}, z_i, z_{i+1}, each emitting an observation s_1, s_2, ..., s_{i-1}, s_i, s_{i+1}.)
Markov property: P(z_n | z_{n-1}, ..., z_1) = P(z_n | z_{n-1})
HMMs are often used in speaker recognition.
11/37

Hidden Markov Model
We have the following joint probability:
p(X, Z | θ) = p(z_1 | π) [ \prod_{n=2}^{N} p(z_n | z_{n-1}, A) ] \prod_{m=1}^{N} p(x_m | z_m, φ)
where A is the transition probability matrix and
p(z_n | z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{z_{n-1,j} z_{n,k}}
p(z_1 | π) = \prod_{k=1}^{K} π_k^{z_{1k}},   with \sum_k π_k = 1
p(x_n | z_n, φ) = \prod_{k=1}^{K} p(x_n | φ_k)^{z_{nk}}
12/37
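The factorization above is straightforward to evaluate for a given state sequence. The sketch below computes the complete-data log-probability log p(X, Z | θ) with toy parameters; Gaussian emissions are assumed here (a modeling choice, not stated on the slide) to stay close to the speech setting:

```python
# Sketch: complete-data log-probability log p(X, Z | theta) of an HMM,
# matching the factorization on this slide (toy parameters, Gaussian emissions).
import numpy as np
from scipy.stats import multivariate_normal

def hmm_complete_data_loglik(x, z, pi, A, means, covs):
    """x: [N, F] observations, z: [N] state indices, pi: [K], A: [K, K]."""
    logp = np.log(pi[z[0]])                                  # initial-state term
    for n in range(1, len(z)):
        logp += np.log(A[z[n - 1], z[n]])                    # transition terms
    for n in range(len(z)):
        logp += multivariate_normal.logpdf(x[n], means[z[n]], covs[z[n]])  # emission terms
    return logp

rng = np.random.default_rng(1)
K, F, N = 3, 2, 5
pi = np.full(K, 1.0 / K)
A = np.full((K, K), 1.0 / K)
means, covs = rng.normal(size=(K, F)), np.array([np.eye(F)] * K)
x, z = rng.normal(size=(N, F)), rng.integers(0, K, size=N)
print(hmm_complete_data_loglik(x, z, pi, A, means, covs))
```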

Supervector Definition
Given the GMM mean vectors m_c ∈ R^{F×1}, c ∈ {1, ..., C}, where C is the total number of mixture components and F is the dimension of the feature vector, the supervector is
m_{CF×1} = (m_1^T, ..., m_C^T)^T
14/37
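Constructing a supervector is just a concatenation of the component means. A toy NumPy sketch (dimensions chosen small here; real systems use values such as C = 2048 and F = 39, as on a later slide):

```python
# Sketch: building a GMM mean supervector by stacking component means.
import numpy as np

C, F = 8, 39                                    # mixture components, feature dimension
rng = np.random.default_rng(2)
component_means = rng.normal(size=(C, F))       # rows are m_1, ..., m_C

supervector = component_means.reshape(C * F)    # (m_1^T, ..., m_C^T)^T, length C*F
assert supervector.shape == (C * F,)
```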

Speaker- and Channel-Dependent Supervector
(Diagram: the supervector M_h lies off the speaker space, displaced along the channel space from the speaker's point in the speaker space.)
M_h is the speaker- and channel-dependent supervector.
15/37

Notation
s: speaker ID
Speaker factors: the components of y(s)
Channel factors: the components of x_h(s)
Speaker space: the range of v, translated by m (an affine space)
Channel space: the range of u
v, u: loading matrices for the speaker factors and the channel factors
h = 1, ..., H(s): index over the recordings of speaker s
C: total number of mixture components for a fixed GMM structure
F: dimension of the acoustic feature vectors
R_C: channel rank
R_S: speaker rank
Σ(s): given speaker s and recording h, the covariance of the observations from the GMM
d: given speaker s, the covariance of the observations from the GMM
16/37

Joint Factor Analysis Model
JFA model:
M(s) = m + v y(s) + d z(s)
M_h(s) = M(s) + u x_h(s)
m ∈ R^{CF}: given an HMM/GMM structure with C mixture components, we concatenate the mean vectors m_1, ..., m_C to obtain m
M(s): the speaker-dependent supervector
M_h(s): the speaker- and channel-dependent supervector
u and v are speaker independent
d is a block diagonal matrix
z(s) is normally distributed
17/37
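To make the two equations concrete, the following sketch draws random hyperparameters and factors with toy dimensions and synthesizes M(s) and M_h(s); it is purely illustrative and is not an estimation procedure:

```python
# Sketch: generating speaker- and channel-dependent supervectors from the JFA model
#   M(s)   = m + v y(s) + d z(s)
#   M_h(s) = M(s) + u x_h(s)
# with toy dimensions and random hyperparameters (purely illustrative).
import numpy as np

rng = np.random.default_rng(3)
C, F = 16, 13                     # toy values; real systems use e.g. 2048 x 39
CF = C * F
R_s, R_c = 10, 5                  # speaker rank and channel rank

m = rng.normal(size=CF)           # UBM mean supervector
v = rng.normal(size=(CF, R_s))    # speaker loading matrix (eigenvoices)
u = rng.normal(size=(CF, R_c))    # channel loading matrix (eigenchannels)
d = np.diag(rng.uniform(0.1, 0.5, size=CF))   # diagonal residual loading

y = rng.normal(size=R_s)          # speaker factors, prior N(0, I)
z = rng.normal(size=CF)           # residual speaker factors, prior N(0, I)

M_s = m + v @ y + d @ z           # speaker-dependent supervector M(s)
x_h = rng.normal(size=R_c)        # channel factors for recording h
M_h = M_s + u @ x_h               # speaker- and channel-dependent supervector M_h(s)
print(M_h.shape)                  # (CF,)
```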

JFA Model
(Diagram: M_h(s) is obtained by displacing M(s), which lies in the speaker space, by u x_h(s) along the channel space.)
18/37

Problem
Purpose: estimate the hyperparameters Λ = (m, u, v, d, Σ).
The number of GMM components is large: C = 2048.
The dimension of the feature vector is F = 39.
So C · F = 79872, i.e. m ∈ R^{79872} and Σ ∈ R^{79872×79872}.
Problem: Σ is very large and not full rank; how can it be estimated?
19/37

Principal Component Analysis
(Figure: data points x_n in the (x_1, x_2) plane and their projections onto the direction u_1.)
PCA finds a principal subspace (the magenta line) such that the variance of the projected points \tilde{x}_n is maximized.
21/37

Maximum Variance Formulation
Find the principal components spanning the principal subspace.
Given feature vectors as observations {x_n}, n = 1, ..., N, we want to find a principal subspace spanned by M basis vectors, with M smaller than the data dimension.
22/37

Maximum Variance Formulation
Find the principal components spanning the principal subspace.
Given feature vectors as observations {x_n}, n = 1, ..., N, we want to find a principal subspace spanned by M basis vectors, with M smaller than the data dimension.
Sample mean \bar{x} and sample covariance S:
\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n
S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T
Let the M basis vectors of the principal subspace be u_1, ..., u_M, with u_i^T u_i = 1 for i ∈ [M], collected as P = [u_1, ..., u_M].
23/37

Maximum Variance Formulation
Optimization problem: find the 1st principal component.
Maximize u_1^T S u_1 subject to u_1^T u_1 = 1.
By the Lagrange method, maximize
u_1^T S u_1 + λ_1 (1 − u_1^T u_1)
Taking the derivative with respect to u_1 and setting it to zero:
∂/∂u_1 [ u_1^T S u_1 + λ_1 (1 − u_1^T u_1) ] = (S + S^T) u_1 − 2 λ_1 u_1 = 2 S u_1 − 2 λ_1 u_1 = 0
⇒ S u_1 = λ_1 u_1
Solution: λ_1 is the largest eigenvalue of S, and the corresponding eigenvector u_1 is the first principal component.
24/37

Maximum Variance Formulation
To find M principal components: find the M largest eigenvalues of S and their corresponding eigenvectors u_i, i ∈ [M], such that
u_i^T u_i = 1 for i ∈ [M], and u_i ⊥ u_j for i ≠ j.
Eigendecompose S, take the M largest eigenvalues sorted in decreasing order, and their eigenvectors u_1, u_2, ..., u_M.
Remark: [u_1, u_2, ..., u_M]^T S [u_1, u_2, ..., u_M] = P^T S P
25/37
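Slides 23-25 amount to a short eigendecomposition routine. A minimal NumPy sketch (toy data; the variable names are mine, not the slides'):

```python
# Sketch: PCA by eigendecomposition of the sample covariance, as on slides 23-25
# (toy data; eigh returns eigenvalues in ascending order, so we re-sort them).
import numpy as np

def pca_basis(X, M):
    """X: [N, D] data matrix. Returns the top-M eigenvalues and P = [u_1, ..., u_M]."""
    x_bar = X.mean(axis=0)
    S = (X - x_bar).T @ (X - x_bar) / X.shape[0]          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)                  # ascending order
    order = np.argsort(eigvals)[::-1][:M]                 # M largest eigenvalues
    return eigvals[order], eigvecs[:, order]              # P is D x M

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
lam, P = pca_basis(X, M=2)
projected = (X - X.mean(axis=0)) @ P                      # coordinates in the principal subspace
print(lam, projected.shape)
```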

Probabilistic PCA Model
PCA model (D ≫ M):
x_{D×1} = W_{D×M} z_{M×1} + µ + ε
x is a D-dimensional observation vector; z is an M-dimensional hidden variable.
We are given the following probability distributions:
p(z) = N(z | 0, I)
p(x | z) = N(x | W z + µ, σ² I)
27/37

Probabilistic PCA
p(x) is Gaussian:
p(x) = ∫ p(x | z) p(z) dz = N(x | µ, C),   C = W W^T + σ² I
Mean and covariance of p(x):
E[x] = E[W z + µ + ε] = µ
cov[x] = E[(W z + ε)(W z + ε)^T] = E[W z z^T W^T] + E[ε ε^T] = W W^T + σ² I
28/37
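A quick numerical check of this marginal: sample from the PPCA generative model and compare the empirical covariance of x with C = W W^T + σ² I (toy parameters, illustrative only):

```python
# Sketch: sampling from the probabilistic PCA model and checking that the
# empirical covariance of x approaches C = W W^T + sigma^2 I (toy parameters).
import numpy as np

rng = np.random.default_rng(5)
D, M, sigma2, N = 4, 2, 0.1, 200_000
W = rng.normal(size=(D, M))
mu = rng.normal(size=D)

Z = rng.normal(size=(N, M))                          # z ~ N(0, I)
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, D)) # eps ~ N(0, sigma^2 I)
X = Z @ W.T + mu + eps                               # x = W z + mu + eps

C_model = W @ W.T + sigma2 * np.eye(D)
C_empirical = np.cov(X, rowvar=False)
print(np.max(np.abs(C_model - C_empirical)))         # small for large N
```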

Probabilistic PCA
(Graphical model: each observation x_n is connected to its latent variable z_n, with parameters µ, W, and σ².)
The graph shows that each observation x_n is associated with a value of the latent variable z_n; p(x_n) is obtained by marginalizing over z_n.
The EM algorithm is used to estimate the parameters of the PPCA model (i.e., to train it).
29/37
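For reference, the closed-form EM updates for PPCA (following Bishop's PRML, Sec. 12.2.2) can be written compactly. This is a sketch with toy data and no convergence checks, not the JFA training procedure discussed later:

```python
# Sketch: EM training of probabilistic PCA (closed-form E and M steps).
import numpy as np

def ppca_em(X, M, n_iter=100, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(D, M))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z_n
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ Minv                                  # [N, M], E[z_n]
        sum_Ezz = N * sigma2 * Minv + Ez.T @ Ez             # sum_n E[z_n z_n^T]
        # M-step: update W and sigma^2
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (N * D)
    return mu, W, sigma2

rng = np.random.default_rng(6)
X = (rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6))
     + rng.normal(size=6) + 0.05 * rng.normal(size=(500, 6)))
mu, W, sigma2 = ppca_em(X, M=2)
print(sigma2)    # close to the injected noise variance (0.05**2 = 0.0025)
```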

Five Steps of a JFA Speaker Verification System
1 Train the UBM.
2 Train the JFA/PCA model: estimate the speaker-independent hyperparameters Λ = (m, u, v, d, Σ) from a large database in which each speaker is recorded in multiple sessions.
3 Adapt Λ from one speaker population to another.
4 Enroll a speaker: estimate the speaker-dependent hyperparameters Λ(s) = (m(s), u(s), v(s), d(s), Σ(s)).
5 Test: given a test utterance 𝒳 and a hypothesized speaker s, compute
log [ P_{Λ(s)}(𝒳) / P_Λ(𝒳) ]
31/37

Train the JFA/PCA Model
Estimate Λ.
Training set: several speakers, with multiple recordings per speaker.
Use EM algorithms to estimate Λ:
  Maximum-likelihood approach (slow)
  Divergence-minimization approach (faster, but requires good initialization)
Both algorithms fit the entire collection of speakers in the training data.
Total likelihood: \prod_s P_Λ(𝒳(s)), where s ranges over the speakers in the training set. It increases from one iteration to the next.
33/37

Adapt from One Speaker Population to Another
Adaptation is necessary because the enrollment data set is limited: for a given speaker, there are at most two recordings.
Keep the channel-space hyperparameters fixed (u and Σ), and re-estimate only the speaker-space hyperparameters (m, v, d).
Remark: we assume the channel-space hyperparameters are speaker independent.
34/37

Enroll a Target Speaker
Estimate Λ(s). Recall the JFA model:
M(s) = m + v y(s) + d z(s)
M_h(s) = M(s) + u x_h(s)
Calculate the posterior distribution of M(s).
Adjust Λ(s) to fit this posterior.
Adopt the minimum-divergence approach.
35/37

Likelihood Function
Hyperparameters Λ = (m, u, v, d, Σ).
P_Λ(𝒳(s)) = ∫ P_Λ(𝒳(s) | X) N(X | 0, I) dX
where:
𝒳(s) (observable) is the collection of labeled frames over the recordings h = 1, ..., H(s):
𝒳(s) = (X_1(s), ..., X_{H(s)}(s))^T
X (unobservable) is the vector of hidden variables:
X = (x_1(s), ..., x_{H(s)}(s), y(s), z(s))^T
N(X | 0, I) is the standard Gaussian kernel:
N(X | 0, I) = N(x_1 | 0, I) ··· N(x_{H(s)} | 0, I) N(y | 0, I) N(z | 0, I)
36/37

Likelihood Ratio
Given speech data 𝒳 uttered by a speaker t, test H_0 = {t = s} against H_1 = {t ≠ s} using
(1/T) log [ P_{Λ(s)}(𝒳) / P_Λ(𝒳) ]
where T is the number of frames in 𝒳.
37/37