Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre


The 2 Parts
1. HDM-based diarization system
2. The homogeneity measure

Outline 1: HDM-Based Diarization System
- Viterbi-Based HMM Short Summary
- Is the Fudge Factor Just a Trick?
- Hidden Distortion Models (HDM): Motivation
- General HDM
- Speaker Diarization with HDM
- Experiments and Results
- Conclusions

Viterbi-Based HMM Short Summary

Motivation: HMMs have proven themselves in machine learning for sequential data, in particular for segmentation and clustering. They have been applied successfully to:
- Speaker diarization
- Video segmentation
- Bio-signals (ECG, EEG, fMRI): coding and segmentation
- DNA and protein-chain modeling
- Seismic signals

Viterbi-Based HMM Short Summary (cont.)

HMM parameters: a K-state HMM model $M = \{\pi, A, b\}$ is defined by:
1. $\pi$: vector of probabilities of being in each state at time $n=1$, $\pi_k = P(s_1 = k)$.
2. $A$: matrix of transition probabilities, $a_{qk} = P(s_n = q \mid s_{n-1} = k)$.
3. $b$: vector of probability density functions, one per state, $b_k(x) = p(x \mid s = k)$.

Viterbi-Based HMM Short Summary (cont.)

Viterbi-based HMM problems:
1. Given an observation sequence $X = (x_1, \dots, x_N)$ and the model $M$, find the optimal state sequence $S = (s_1, \dots, s_N)$: $S^* = \arg\max_S p(X, S \mid M)$.
2. Estimate the model parameters $M = \{\pi, A, b\}$ that maximize $p(X, S^* \mid M)$: $M^* = \arg\max_M p(X, S^* \mid M)$.

Viterbi-Based HMM Short Summary (cont.)

1st problem solution: given an observation sequence $X = (x_1, \dots, x_N)$ and the model $M$, find the optimal state sequence $S = (s_1, \dots, s_N)$: $S^* = \arg\max_S p(X, S \mid M)$, where

$$P(X, S \mid M) = \pi_{s_1} b_{s_1}(x_1)\, a_{s_2 s_1} b_{s_2}(x_2) \cdots a_{s_N s_{N-1}} b_{s_N}(x_N) = \pi_{s_1} \prod_{n=2}^{N} a_{s_n s_{n-1}} \prod_{n=1}^{N} b_{s_n}(x_n).$$

The product factors into a transition-dependent part and an emission-dependent part $b_{s_n}(x_n)$; both contribute to the choice of $S^*$.
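As an illustration of the first problem, here is a minimal log-domain Viterbi decoder for a model $M = \{\pi, A, b\}$ as defined above. It is only a sketch, not the authors' implementation; the toy two-state Gaussian emissions, the variable names, and the parameter values in the usage example are assumptions.

```python
import numpy as np
from scipy.stats import norm

def viterbi(log_pi, log_A, log_B):
    """Find S* = argmax_S p(X, S | M) in the log domain.

    log_pi : (K,)    log initial state probabilities
    log_A  : (K, K)  log_A[q, k] = log a_qk = log P(s_n = q | s_{n-1} = k)
    log_B  : (N, K)  log_B[n, k] = log b_k(x_n), per-frame emission log-likelihoods
    """
    N, K = log_B.shape
    delta = np.empty((N, K))            # best partial-path score ending in state k at frame n
    psi = np.empty((N, K), dtype=int)   # back-pointers
    delta[0] = log_pi + log_B[0]
    for n in range(1, N):
        trans = delta[n - 1][None, :] + log_A   # (K, K): score of reaching q from predecessor k
        psi[n] = trans.argmax(axis=1)
        delta[n] = trans.max(axis=1) + log_B[n]
    # backtrack the optimal state sequence
    s = np.empty(N, dtype=int)
    s[-1] = delta[-1].argmax()
    for n in range(N - 2, -1, -1):
        s[n] = psi[n + 1, s[n + 1]]
    return s

# Toy usage: two states with Gaussian emissions and rare state changes (a_kk = 59/60).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.5, 1.0, 100)])
log_pi = np.log([0.5, 0.5])
log_A = np.log([[59/60, 1/60], [1/60, 59/60]])
log_B = np.stack([norm.logpdf(x, 0.0, 1.0), norm.logpdf(x, 2.5, 1.0)], axis=1)
path = viterbi(log_pi, log_A, log_B)
```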

Viterbi-Based HMM Short Summary (cont.)

2nd problem solution: estimate the model parameters $M = \{\pi, A, b\}$ that maximize $p(X, S^* \mid M)$: $M^* = \arg\max_M p(X, S^* \mid M)$. The solution uses Viterbi statistics. Since

$$P(X, S \mid M) = \pi_{s_1} \prod_{n=2}^{N} a_{s_n s_{n-1}} \prod_{n=1}^{N} b_{s_n}(x_n),$$

the cost function is

$$C(X, S \mid M) = \ln P(X, S \mid M) = \ln \pi_{s_1} + \sum_{n=2}^{N} \ln a_{s_n s_{n-1}} + \sum_{n=1}^{N} \ln b_{s_n}(x_n).$$

Viterbi-Based HMM Short Summary (cont.)

2nd problem solution (cont.): the objective function for maximizing the transition probabilities is

$$J(A) = \sum_{n=2}^{N} \ln a_{s_n s_{n-1}} + \sum_{k=1}^{K} \lambda_k \left(1 - \sum_{q=1}^{K} a_{qk}\right), \qquad a_{qk} = \frac{N_{qk}}{N_k},$$

where $N_{qk}$ is the number of transitions from state $k$ to state $q$ along $S^*$ and $N_k$ is the total number of transitions out of state $k$. The emission probabilities are trained, for example, as GMMs via Expectation-Maximization (EM), using the data associated with each state.
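A minimal sketch of this re-estimation step, under the same assumptions as the decoder sketch above: transition probabilities are obtained by counting transitions along the decoded path $S^*$, and each state's emission model is refit on the frames assigned to it (a single Gaussian per state stands in here for the GMM/EM training mentioned on the slide).

```python
import numpy as np

def reestimate_transitions(path, K, floor=1e-6):
    """a_qk = N_qk / N_k, counted along the Viterbi path S*."""
    counts = np.full((K, K), floor)            # counts[q, k]: transitions k -> q
    for prev, cur in zip(path[:-1], path[1:]):
        counts[cur, prev] += 1.0
    return counts / counts.sum(axis=0, keepdims=True)   # each column sums to 1

def reestimate_emissions(x, path, K):
    """Refit per-state emission models on the frames assigned to each state."""
    params = []
    for k in range(K):
        xk = x[path == k]
        if xk.size == 0:                       # state received no frames on this pass
            params.append((0.0, 1.0))
            continue
        params.append((float(xk.mean()), float(xk.std() + 1e-6)))
    return params
```

Iterating decode, then re-estimate, until the path stops changing gives the usual Viterbi-training (segmental k-means) loop.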

Viterbi-Based HMM Short Summary (cont.)

The problem. [Figure: PDF of the per-frame log-likelihood ratio for (a) frequent state changes and (b) rare state changes.]

In Viterbi decoding, the decision of which state to move to depends on the ratio

$$\frac{a_{qk}\, b_q(x_n)}{a_{rk}\, b_r(x_n)}.$$

With rare state changes the transition term dominates, e.g.

$$\ln\sqrt{\frac{a_{11} a_{22}}{a_{12} a_{21}}} = \ln\frac{59/60}{1/60} = \ln 59 \approx 4.1, \qquad \ln\sqrt{\frac{a_{11} a_{22}}{a_{12} a_{21}}} = \ln\frac{799/800}{1/800} = \ln 799 \approx 6.7,$$

which is large compared with the typical per-frame emission log-likelihood ratio.
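A quick numeric check of this imbalance (a sketch only: the "typical" per-frame emission LLR range of roughly ±1 to ±3 is an assumption read off the plot, and applying the fudge factor as a multiplier on the transition log-ratio is one common convention, not necessarily the exact form used in the talk):

```python
import math

# Transition log-ratios for rare state changes (self-transition vs. switch)
for stay in (59/60, 799/800):
    switch = 1.0 - stay
    print(f"self-transition {stay:.4f}: transition log-ratio = {math.log(stay / switch):.2f}")

# A typical per-frame emission log-likelihood ratio is only on the order of a few units,
# so the transition term dominates the Viterbi decision unless it is rescaled.
fudge = 0.2   # example hyper-parameter; multiplying by 0.2 shrinks the transition term 5x
print(f"scaled transition log-ratio: {fudge * math.log(59):.2f}")
```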

Hidden Distortion Models (HDM): Motivation
- Unbalanced transition and emission probabilities: parameter scaling is required.
- The emission models are not always probabilistic: VQ for data transmission (Euclidean distance), binary data (Hamming distance).
- In such cases the transitions can be non-probabilistic costs.
- A more general model should be defined.
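The transcript does not reproduce the "General HDM" slide itself, but the motivation above suggests a Viterbi-style recursion over generic costs rather than log-probabilities. The sketch below is an illustrative guess at that idea, assuming Euclidean emission distances to per-state codewords (e.g. a SOM or VQ codebook) and transition costs scaled by a hyper-parameter alpha; it is not the authors' exact formulation.

```python
import numpy as np

def hdm_decode(codebooks, trans_cost, x, alpha=1.0):
    """Minimum-cost state sequence with distance-based emission costs.

    codebooks  : list of (C_k, D) arrays, one codebook per state k
    trans_cost : (K, K) array, trans_cost[q, k] = cost of moving from state k to state q
    x          : (N, D) observation sequence
    alpha      : scaling hyper-parameter balancing transition vs. emission costs
    """
    K, N = len(codebooks), len(x)
    # Emission cost: distance from each frame to the nearest codeword of each state
    emit = np.stack([np.min(np.linalg.norm(x[:, None, :] - cb[None, :, :], axis=2), axis=1)
                     for cb in codebooks], axis=1)            # (N, K)
    delta = np.empty((N, K))
    psi = np.empty((N, K), dtype=int)
    delta[0] = emit[0]
    for n in range(1, N):
        total = delta[n - 1][None, :] + alpha * trans_cost    # (K, K): [q, k]
        psi[n] = total.argmin(axis=1)
        delta[n] = total.min(axis=1) + emit[n]
    s = np.empty(N, dtype=int)
    s[-1] = delta[-1].argmin()
    for n in range(N - 2, -1, -1):
        s[n] = psi[n + 1, s[n + 1]]
    return s
```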

Speaker Diarization with HDM

Speaker diarization definition: the goal is to separate the conversation into R clusters, where each cluster hopefully contains the data of a single speaker. Additional clusters can be added for non-speech, simultaneous speech, etc. The number of speakers R can be known; otherwise it has to be estimated. In our application R = 2.

Speaker Diarization with HDM (cont.): General Blocks. [Block diagram of the system.]

Experiments and Results

Experimental setup:
- Telephone conversations.
- 3-state HMM: non-speech / speaker 1 / speaker 2.
- 5 iterations with 20 tied states (200 ms), then 1 iteration with 10 tied states.
- 108 conversations from LDC.
- 2048 conversations from NIST-05 (Dev set: 500; Eval set: 1548).
- Models: 1. HMM; 2. HDM with different constraints.
- State models: 1. SOM 6x10; 2. GMM with 21 full-covariance components, EM training.

Experiments and Results (cont.)

Speaker diarization with SOM, LDC: ~26.0% improvement. [Bar chart comparing No Cost, Geometrical Mean (0.5), Powered Inverse Sum (1.0), Scaled Inverse Sum (1.0), Scaled Log-Likelihood (0.2), and the Baseline.]

Experiments and Results (cont.)

Speaker diarization with SOM (cont.), full NIST-05 with LDC optimization: no improvement. [Bar chart comparing No Cost, Geometrical Mean (0.5), Powered Inverse Sum (1.0), Scaled Inverse Sum (1.0), Scaled Log-Likelihood (0.2), and the Baseline.]

Experiments and Results (cont.)

Speaker diarization with SOM (cont.), Eval NIST-05 with NIST Dev optimization: ~1.8% improvement. [Bar chart comparing No Cost, Geometrical Mean (50), Powered Inverse Sum (1.5), Scaled Inverse Sum (1.5), Scaled Log-Likelihood (0.8), and the Baseline.]

Experiments and Results (cont.)

Speaker diarization with GMM, LDC: ~10.4% improvement. [Bar chart comparing No Cost, Geometrical Mean (0.15), Powered Inverse Sum (0.4), Scaled Inverse Sum (0.2), Scaled Log-Likelihood (0.3), and the Baseline.]

Conclusions
1. The LDC HMM costs are far from optimal: fudge factor = 0.2, i.e., the costs are rescaled by a factor of five.
2. The scaling is data dependent: the parameters that are optimal for LDC lead to poor performance on NIST-05.
3. NIST-05: almost no improvement, as the HMM costs are already close to optimal (fudge factor = 0.8, i.e., the costs are rescaled by a factor of 1.25).
4. The cost and the state models should be chosen together with the hyper-parameter, and they are task dependent.
5. Much more work is needed to gain a deeper understanding of the parameter relations in the HDM.

Outline 2: The Homogeneity Measure
- Motivation
- The homogeneity measure
- Experiments and Results
- Conclusions

Motivation

Can we compare these two identities? Is the decision meaningful even if the answer is correct? There is not enough common data for a comparison.

Motivation (cont.)

Can we compare these two identities? Is the decision meaningful even if the answer is correct? There is not enough data for a comparison.

Motivation (cont.)

Can we compare these two identities? Is the decision meaningful even if the answer is wrong? There is a sufficient amount of common data for a comparison.

Motivation (cont.)

What are the conditions for a good homogeneity measure?
1. We need a sufficient amount of common data for the comparison.
2. The measure should rely on the data itself.
3. The measure has to be uncorrelated with the score.
4. The measure has to be correlated with the system performance.

The Homogeneity Measure

Can we compare these two identities? Which decision is meaningful?

The Homogeneity Measure (cont.)

GMM: $\lambda = \{\omega_m, \mu_m, \Sigma_m\}_{m=1}^{M}$

Posterior probability of the m-th Gaussian:

$$\gamma(m, x) = \frac{\omega_m \, \mathcal{N}(x; \mu_m, \Sigma_m)}{\sum_{\kappa=1}^{M} \omega_\kappa \, \mathcal{N}(x; \mu_\kappa, \Sigma_\kappa)}$$

The Homogeneity Measure (cont.)

Definitions. Datasets of the two utterances:

$$X_A = \{x_1, \dots, x_{N_A}\}, \qquad X_B = \{x_1, \dots, x_{N_B}\}$$

Gaussian mixture occupation:

$$\gamma_Q(m) = \sum_{x_n \in X_Q} \gamma(m, x_n), \quad Q \in \{A, B\}, \qquad \gamma(m) = \gamma_A(m) + \gamma_B(m)$$

Bit distribution:

$$p_m = \frac{\gamma_A(m)}{\gamma(m)}, \qquad \bar{p}_m = 1 - p_m = \frac{\gamma_B(m)}{\gamma(m)}$$

Bit entropy:

$$H(p_m) = -p_m \log_2 p_m - \bar{p}_m \log_2 \bar{p}_m, \qquad 0 \le H(p_m) \le 1$$
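A minimal sketch of these definitions, assuming the frame posteriors $\gamma(m, x_n)$ of both utterances under a common GMM (e.g. the UBM) are already available as arrays; the array names and shapes are assumptions:

```python
import numpy as np

def per_component_bit_entropy(gamma_A, gamma_B, eps=1e-12):
    """Compute p_m and the binary entropy H(p_m) for every GMM component.

    gamma_A, gamma_B : (N_A, M) and (N_B, M) arrays of frame posteriors gamma(m, x_n)
                       for utterances A and B under the same GMM.
    Returns (p, H, occ) where occ[m] = gamma_A(m) + gamma_B(m).
    """
    occ_A = gamma_A.sum(axis=0)             # gamma_A(m)
    occ_B = gamma_B.sum(axis=0)             # gamma_B(m)
    occ = occ_A + occ_B                     # gamma(m)
    p = occ_A / (occ + eps)                 # bit distribution p_m
    q = 1.0 - p
    # binary entropy in bits, with the convention 0 * log 0 = 0
    H = -(p * np.log2(p + eps) + q * np.log2(q + eps))
    return p, H, occ
```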

The Homogeneity Measure (cont.)

Measures. Normalized measure:

$$\hat{H} = E\{H(p_m)\} = \sum_{m=1}^{M} \omega_m H(p_m), \qquad 0 \le \hat{H} \le 1$$

1. The measure is bounded:
   - Same utterances: $X_A = X_B \Rightarrow \hat{H} = 1$
   - Disjoint data: $\gamma_A(m)\,\gamma_B(m) = 0 \;\forall m \Rightarrow \hat{H} = 0$
2. It does not take into account the amount of information in the data.

The Homogeneity Measure (cont.)

Measures (cont.). Non-normalized measure:

$$\hat{H}_{non} = N \hat{H}, \qquad N = \#X_A + \#X_B$$

1. If the data is doubled, the measure also doubles, even though no new information is added.
2. It works.

Alternative option:

$$\hat{H}_{non} = M \hat{H},$$

where $M$ is the number of Gaussian mixtures needed to reach Y% of the total occupation ($N$).
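Continuing the sketch above, the normalized and non-normalized measures could look as follows; the weighting by the mixture weights $\omega_m$, the `coverage` (Y%) value, and all argument names follow my reading of the slides and are assumptions, not the authors' reference implementation:

```python
import numpy as np

def homogeneity_measures(H, occ, weights, n_frames, coverage=0.9):
    """Normalized measure H_hat and two non-normalized variants.

    H        : (M,) per-component bit entropies H(p_m)
    occ      : (M,) total occupations gamma(m) = gamma_A(m) + gamma_B(m)
    weights  : (M,) GMM mixture weights omega_m
    n_frames : N = #X_A + #X_B
    coverage : Y, fraction of the total occupation used to count active components
    """
    H_hat = float(np.sum(weights * H))          # normalized measure, 0 <= H_hat <= 1
    H_non_frames = n_frames * H_hat             # non-normalized, frame-count version
    # number of components needed to reach Y% of the total occupation
    sorted_occ = np.sort(occ)[::-1]
    m_active = int(np.searchsorted(np.cumsum(sorted_occ), coverage * occ.sum()) + 1)
    H_non_components = m_active * H_hat         # alternative non-normalized version
    return H_hat, H_non_frames, H_non_components
```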

Experiments and Results

Baseline speaker verification system:
1. Features: 19 Mel-cepstra + 11 features + variance normalization.
2. 512-component UBM-GMM.
3. 400-dimensional i-vectors.
4. PLDA scoring.

Experimental protocol:
1. NIST 2008, det1, short2-short3 condition.
2. 39433 trials: 8290 targets, 31143 non-targets.

Experiments and Results (cont.)

Evaluation of the measure (see the sketch below):
1. For every trial the homogeneity measure is calculated.
2. All the measures are sorted.
3. The sorted measures are split into chunks of size 1500, using a sliding window of 1000.
4. For every chunk the False Alarm (FA) rate, False Reject (FR) rate, and minCllr (log-likelihood-ratio cost) are calculated.
   - Cllr gives the system's soft cost of loss rather than a hard decision point like the EER; the lower the Cllr, the better the system performance.
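A sketch of steps 2 to 4, restricted to the FA and FR rates (minCllr additionally needs a score-calibration step, e.g. PAV, which is omitted here); the array names, the decision threshold, and the interpretation of the sliding window as a step of 1000 are assumptions:

```python
import numpy as np

def chunked_error_rates(homogeneity, scores, labels, threshold,
                        chunk=1500, step=1000):
    """FA and FR rates per chunk of trials, ordered by the homogeneity measure.

    homogeneity : (T,) homogeneity measure of every trial
    scores      : (T,) verification scores
    labels      : (T,) True for target trials, False for non-target trials
    threshold   : decision threshold applied to the scores
    """
    order = np.argsort(homogeneity)
    h, s_all, l_all = homogeneity[order], scores[order], labels[order]
    results = []
    for start in range(0, len(s_all) - chunk + 1, step):
        s = s_all[start:start + chunk]
        l = l_all[start:start + chunk]
        accept = s >= threshold
        fa = float(np.mean(accept[~l])) if np.any(~l) else 0.0   # non-targets accepted
        fr = float(np.mean(~accept[l])) if np.any(l) else 0.0    # targets rejected
        results.append((float(h[start:start + chunk].mean()), fa, fr))
    return results
```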

Experiments and Results (cont.)

[Plots: results as a function of the Normalized Homogeneity Measure and of the Non-Normalized Homogeneity Measure.]

Conclusions
- A correct result obtained for the wrong reasons is bad.
- For a valid examination, the data to be compared must be comparable.
- For a valid examination, the data to be compared should also be sufficient; this is why the normalized measure does not work.
- A high homogeneity measure leads to a low minCllr, i.e., high system performance.
