Session Variability Compensation in Automatic Speaker Recognition
Transcription
1 Session Variability Compensation in Automatic Speaker Recognition. Javier González Domínguez. VII Jornadas MAVIR, Universidad Autónoma de Madrid, November 2012.
2 Outline
1. The Inter-session Variability Problem
2. From Eigenfaces to Joint Factor Analysis
3. Factor Analysis in Speaker and Language Recognition: I. Theory, II. Where and How, III. Efficiency
4. Total Variability
5. PLDA
6. Results (NIST SRE10, SRE12)
4 The Inter-Session Variability Problem: Causes
Inter-session variability: all phenomena that cause two recordings of the same identity to differ.
- Transmission channel effects (GSM, landline, ...).
- Transducer characteristics (microphone type, ...).
- Environmental noise (traffic, people speaking, ...).
- Intra-speaker variability (age, illness, emotions, ...).
5 Factor Analysis: Basis
Principles:
1. Variability is treated as a continuous source rather than a discrete one.
2. Both session and inter-speaker/language variability are modeled.
Assumption:
1. Variability lies in a lower-dimensional subspace.
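7 Eigenfaces, working scheme
[Figure: three-stage block diagram.
Development stage: dev data (m samples, D x M) is coded and PCA is applied to obtain the projection matrix A (D x K); the first three principal components are visualized.
Train stage: train data (t samples, D x T) is coded and projected as A^T M (K x T); the reconstructed training images are visualized.
Test stage: a test sample t (D x 1) is coded and projected as A^T t (K x 1); the distances d(t, m) to the training codes give the scores S (T x 1); the sample is accepted if s_i ≤ θ and rejected if s_i > θ.]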
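The three stages of the scheme can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the data is random, the function names (`fit_pca`, `encode`, `verify`) are hypothetical, the distance is taken to be Euclidean, and the threshold value is arbitrary.

```python
import numpy as np

def fit_pca(dev, k):
    """Development stage: dev is D x M (one sample per column).
    Returns the sample mean (D x 1) and the projection matrix A (D x K)."""
    mean = dev.mean(axis=1, keepdims=True)
    # The left singular vectors of the centered data are the principal directions.
    u, _, _ = np.linalg.svd(dev - mean, full_matrices=False)
    return mean, u[:, :k]

def encode(x, mean, A):
    """Project D x N samples onto the K-dimensional subspace: A^T (x - mean)."""
    return A.T @ (x - mean)

def verify(test, train_codes, mean, A, threshold):
    """Test stage: distance from the test code to every training code,
    then accept where the distance does not exceed the threshold."""
    t = encode(test, mean, A)                      # K x 1
    d = np.linalg.norm(train_codes - t, axis=0)    # one distance per training sample
    return d, d <= threshold

rng = np.random.default_rng(0)
dev = rng.normal(size=(50, 30))      # D = 50, M = 30 (toy sizes)
train = rng.normal(size=(50, 4))     # T = 4 enrolled samples
mean, A = fit_pca(dev, k=3)
train_codes = encode(train, mean, A)               # K x T
scores, accepted = verify(train[:, [0]], train_codes, mean, A, threshold=1e-6)
print(scores, accepted)
```

Here the test sample is a copy of the first training sample, so only that comparison falls under the (deliberately tiny) threshold.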
8 The GMM-UBM Framework: Maximum a Posteriori (MAP)
[Figure: the UBM mixture components c_i and c_j are shifted toward the speaker's feature vectors (x) to obtain the speaker model.]
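As a rough illustration of the adaptation shown in the figure, the sketch below applies mean-only relevance MAP to a diagonal-covariance GMM. The function name, the relevance factor of 16 and the toy data are assumptions for the example, not values from the slides.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only relevance MAP.
    X: T x F features; weights: M; means, variances: M x F (diagonal covariances)."""
    diff = X[:, None, :] - means[None, :, :]                        # T x M x F
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)                 # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                         # T x M responsibilities
    n = post.sum(axis=0)                                            # occupancy per mixture
    first = post.T @ X                                              # first-order statistics
    data_mean = first / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]                          # data-dependent weight
    return alpha * data_mean + (1.0 - alpha) * means

ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_vars = np.ones((2, 2))
X = np.ones((100, 2))                # 100 frames, all close to the first mixture
adapted = map_adapt_means(X, ubm_weights, ubm_means, ubm_vars)
print(adapted)
```

The first mixture moves most of the way toward the data, while the second, which sees almost no frames, stays at its UBM value.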
9 The GMM-UBM Framework: The Supervector Concept
[Figure: UBM components c_i and c_j.]
UBM = {w_i, µ_i, Σ_i}
DIMENSIONALITY!
- M: number of mixtures ( )
- F: feature dimension (20-60)
- MF (~20k-50k)
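A supervector is simply the stack of all mixture means. The short sketch below uses hypothetical sizes (M = 512, F = 40) chosen only so that the product lands in the range quoted on the slide.

```python
import numpy as np

M, F = 512, 40                      # hypothetical number of mixtures and feature dimension
rng = np.random.default_rng(0)
means = rng.normal(size=(M, F))     # one F-dimensional mean per mixture

# Concatenate the M mean vectors into a single M*F-dimensional supervector.
supervector = means.reshape(-1)
print(supervector.shape)            # (20480,)
```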
10 Eigenvoices & Eigenchannels
Notation: s = speaker, h = utterance, $\mu$ = UBM supervector, $\mu_s$ = speaker supervector, $\mu_{sh}$ = target model supervector.
GMM-UBM (MAP): $\mu_{sh} = \mu + D z_{sh}$
- D: full-rank diagonal (scaling factor)
- z: speaker component
Eigenvoices: $\mu_s = \mu + V y_s$
- V: speaker variability subspace (low-rank)
- y: corresponding weights for a given speaker (speaker factors)
Eigenchannels: $\mu_{sh} = \mu_s + U x_h$
- U: session variability subspace (low-rank)
- x: corresponding weights for a given utterance/speaker (channel factors)
11 Joint Factor Analysis: Eigenvoices + Eigenchannels + MAP
Model = Speaker/language + Session [Kenny 04].
$\mu_{sh} = \mu + V y_s + D z_s + U x_{sh}$
- s: speaker, h: utterance
- $\mu$: UBM supervector, $\mu_s$: speaker supervector, $\mu_{sh}$: target model supervector
- V: speaker variability subspace, U: session variability subspace
- x: channel factors, y: speaker factors
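The JFA decomposition can be checked numerically on toy data. In the sketch below all sizes and matrices are random placeholders (assumptions for illustration); the point is that two sessions of the same speaker share the terms in V, D and the UBM supervector, so their difference lies entirely in the span of U.

```python
import numpy as np

rng = np.random.default_rng(0)
MF, R_V, R_U = 12, 3, 2                      # toy supervector size and subspace ranks
mu = rng.normal(size=MF)                     # UBM supervector
V = rng.normal(size=(MF, R_V))               # speaker variability subspace
U = rng.normal(size=(MF, R_U))               # session variability subspace
d = np.abs(rng.normal(size=MF))              # diagonal of D

def jfa_supervector(mu, V, d, U, y, z, x):
    """mu_sh = mu + V y_s + D z_s + U x_sh, with D stored as its diagonal."""
    return mu + V @ y + d * z + U @ x

y = rng.normal(size=R_V)                     # speaker factors (shared across sessions)
z = rng.normal(size=MF)                      # speaker-specific residual
x1, x2 = rng.normal(size=R_U), rng.normal(size=R_U)   # channel factors of two sessions

m1 = jfa_supervector(mu, V, d, U, y, z, x1)
m2 = jfa_supervector(mu, V, d, U, y, z, x2)

# The session-to-session difference should be exactly U (x1 - x2).
diff = m1 - m2
coef, *_ = np.linalg.lstsq(U, diff, rcond=None)
residual = np.linalg.norm(U @ coef - diff)
print(residual)
```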
12 Outline
1. The Inter-session Variability Problem
2. From Eigenfaces to Joint Factor Analysis
3. Factor Analysis in Speaker and Language Recognition: I. Theory, II. Where and How, III. Efficiency
4. Total Variability
5. PLDA
6. A link with new machine learning paradigms
7. Results (NIST SRE10, SRE12)
13 Factor Analysis: Graphical model
[Figure: plate diagram over N observations with latent node z_n, observed node x_n and hyperparameters µ, L, Σ.]
- x_n: observed variables (speaker supervectors).
- z_n: latent variables (channel or speaker factors).
- Hyperparameters (µ, L, Σ): µ = UBM supervector; L = U, V; Σ = UBM covariance.
14 Joint Factor Analysis: Point estimate of latent factors x, y
- A point estimate (mean of the posterior) of x, y can be computed as in classic relevance MAP:
$E[z \mid x] = (I + L^T N \Sigma^{-1} L)^{-1} L^T \Sigma^{-1} f$
[Figure: latent variable domain (D = 1) showing the prior p(z), the posterior p(z | x) and its mean E[z | x]; observation domain (D = 2, axes x_1 and x_2) showing p(x | z) ~ N(µ + Lz, Ψ).]
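The posterior mean above can be computed with one linear solve. The sketch below assumes a diagonal Σ, and it assumes that N holds the occupancy counts expanded to supervector size and that f holds the centered first-order statistics; the slide does not define these symbols, so treat that reading, the function name and the toy data as assumptions.

```python
import numpy as np

def fa_point_estimate(L, sigma, n, f):
    """Posterior mean (I + L^T N Sigma^-1 L)^-1 L^T Sigma^-1 f.
    L: MF x R loading matrix; sigma, n, f: length-MF vectors
    (diagonal of Sigma, expanded occupancies, centered first-order stats)."""
    R = L.shape[1]
    Lt_sinv = L.T / sigma                          # L^T Sigma^-1, shape R x MF
    precision = np.eye(R) + (Lt_sinv * n) @ L      # I + L^T N Sigma^-1 L
    return np.linalg.solve(precision, Lt_sinv @ f)

rng = np.random.default_rng(0)
MF, R = 20, 2
L = rng.normal(size=(MF, R))
sigma = np.ones(MF)
z_true = rng.normal(size=R)

# With many frames and noise-free statistics, f = N L z and the estimate approaches z.
n_many = np.full(MF, 1e6)
z_hat = fa_point_estimate(L, sigma, n_many, n_many * (L @ z_true))

# With no data at all, the estimate falls back to the prior mean (zero).
z_prior = fa_point_estimate(L, sigma, np.zeros(MF), np.zeros(MF))
print(z_true, z_hat, z_prior)
```

The identity term acts as the prior: it dominates when little data is available and becomes negligible as the occupancies grow, which is the same behaviour as relevance MAP.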
16 Factor Analysis: Where and How
- Two classifiers (acoustic systems): GMM and SVM.
[Figure: SVM maximum-margin hyperplane, support vectors and margin.]
- Three different levels (domains): feature domain, statistics domain, model domain (supervectors).
18 Efficiency: ATVS system at NIST SRE2008 tel-tel
Systems compared: SVM, JFA, JFA-LS. Values are listed in that order; rows with fewer than three values are reproduced as they appear in the source.
Development
- UBM training (2M feature vectors, gender dependent): 4h, 4h, 4h
- Training variability subspace U/V: 1h, 1h/1h, 1h/1h
Feature extraction (per ~265s file)
- MFCC: 2s, 2s, 2s
Training (per ~265s file)
- GMM-train: 8s, 8s, 8s
- FA point-estimate: 0.1s, 0.1s
- SVM-train: 120s
- Total (train): 130.1s, 10.1s, 10.1s
- xRT train (CPU/speech): 0.50RT, 0.04RT, 0.04RT
Testing (per ~265s file)
- SV-train: 8s
- FA point-estimate: 0.1s, 0.1s
- Scoring (frame by frame / linear scoring): 3.2s, 0.2s, 1 x 10^-4 s
- t-norm (100 models): 320s, 20s, 1 x 10^-2 s
- Total (test): 331.2s, 22.2s, 2.02s
- xRT test (CPU/speech): 1.24RT, 0.08RT, 7.5 x 10^-3 RT
Training: 1 min of speech is processed in 30s (SVM), 2.4s (JFA), 2.4s (JFA-LS).
Testing: 1 min of speech is processed in 74.4s (SVM), 4.8s (JFA), 0.45s (JFA-LS).
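The real-time figures follow from simple arithmetic: the real-time factor is CPU time divided by speech duration, and the per-minute cost is 60 times that factor. The sketch below reproduces the per-minute numbers from the xRT values quoted on the slide; the helper names are hypothetical, and the file length is only approximate (~265s), so factors recomputed from the totals match the quoted ones only roughly.

```python
FILE_SECONDS = 265.0   # approximate file length quoted on the slide

def rt_factor(cpu_seconds, speech_seconds=FILE_SECONDS):
    """Real-time factor: CPU time spent per second of speech."""
    return cpu_seconds / speech_seconds

def seconds_per_minute(xrt):
    """CPU seconds needed to process one minute of speech."""
    return 60.0 * xrt

# xRT values as quoted on the slide, in the order SVM, JFA, JFA-LS.
xrt_train = {"SVM": 0.50, "JFA": 0.04, "JFA-LS": 0.04}
xrt_test = {"SVM": 1.24, "JFA": 0.08, "JFA-LS": 7.5e-3}

train_cost = {name: seconds_per_minute(v) for name, v in xrt_train.items()}
test_cost = {name: seconds_per_minute(v) for name, v in xrt_test.items()}

for name in xrt_train:
    print(f"{name}: train {train_cost[name]:.2f}s/min, test {test_cost[name]:.2f}s/min")

# Recomputing the factor from a total, e.g. SVM training at 130.1s per file.
print(round(rt_factor(130.1), 3))
```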
20 Total Variability: $m_s = m + T w_{sh}$
- With limited real data, the estimate of U might also capture speaker information.
- Total Variability: T represents both session and target information.
- Disentangling phase in the w domain (LDA, WCCN).
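The disentangling step in the w domain can be illustrated with WCCN followed by cosine scoring. This is a sketch on synthetic i-vectors, not the system described in the talk; the function names and the toy data are assumptions. The projection B is chosen so that B B^T equals the inverse within-class covariance, which makes the within-class covariance of the projected vectors the identity.

```python
import numpy as np

def within_class_cov(ivectors, labels):
    """Average over speakers of the per-speaker covariance of their i-vectors."""
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    speakers = np.unique(labels)
    for s in speakers:
        x = ivectors[labels == s]
        c = x - x.mean(axis=0)
        W += c.T @ c / len(x)
    return W / len(speakers)

def wccn_projection(ivectors, labels):
    """Return B with B B^T = W^-1 (Cholesky factor of the inverse)."""
    W = within_class_cov(ivectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(20), 10)                 # 20 speakers, 10 utterances each
spk_means = rng.normal(size=(20, 5))
session_noise = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
ivectors = spk_means[labels] + session_noise          # rows are i-vectors

B = wccn_projection(ivectors, labels)
projected = ivectors @ B                              # row-vector form of B^T w
W_after = within_class_cov(projected, labels)
print(np.round(W_after, 3))
print(cosine_score(projected[0], projected[1]))
```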
22 PLDA: FA over i-vectors
$w_{ij} = \bar{w} + F h_i + G k_{ij} + e_{ij}$
- i: speaker, j: utterance
- $\bar{w}$: mean i-vector, $w_{ij}$: target model i-vector
- F: speaker variability subspace, G: session variability subspace
- k: channel factors, h: speaker factors
- e: noise term
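The PLDA model is generative, so it can be illustrated by composing i-vectors from its terms. The sketch below uses random toy parameters (all sizes and values are assumptions) and shows that two utterances of the same speaker share the speaker term, so they differ only through the channel factors and the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_H, N_K = 6, 2, 2                      # toy i-vector size and factor dimensions
w_bar = rng.normal(size=DIM)                 # mean i-vector
F = rng.normal(size=(DIM, N_H))              # speaker variability subspace
G = rng.normal(size=(DIM, N_K))              # session variability subspace
noise_std = np.full(DIM, 0.1)                # square root of a diagonal noise covariance

def plda_ivector(w_bar, F, G, h, k, e):
    """w_ij = w_bar + F h_i + G k_ij + e_ij."""
    return w_bar + F @ h + G @ k + e

h = rng.normal(size=N_H)                     # one speaker: h is shared by all utterances
k1, k2 = rng.normal(size=N_K), rng.normal(size=N_K)
e1, e2 = rng.normal(size=DIM) * noise_std, rng.normal(size=DIM) * noise_std

w1 = plda_ivector(w_bar, F, G, h, k1, e1)
w2 = plda_ivector(w_bar, F, G, h, k2, e2)

# The speaker term cancels in the difference of two same-speaker utterances.
same_speaker_diff = w1 - w2
print(same_speaker_diff)
```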
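23 PLDA: FA over i-vectors
[Figure: graphical model with speaker factor h, channel factor k and observed i-vector w, with plates over H utterances and I speakers.]
- I: speaker, H: utterance
- w: i-vector
- F: speaker variability subspace, G: session variability subspace
- k: channel factors, h: speaker factors
- e: noise term
$\Theta = \{\mu, F, G, \Sigma\}$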
25 Experimental Results: SRE 10
- Organizer: National Institute of Standards and Technology (NIST)
- Competitive participants: MIT-LL, SRI, IBM
- Relevant data about the task:
  - 2 channel types involved (telephone, microphone)
  - 2 speech styles involved (conversational, interview)
  - Different vocal effort (high, neutral, low)
  - ~150s train/test
  - ~900 speakers
  - ~800 models
  - ~900 test files
  - > [...] trials
26 Experimental Results: SRE 10, some samples
- Telephone data
- Microphone data: close mic, far mic
- Vocal effort: low vocal effort, high vocal effort
27 Experimental Results: SRE 10
Condition | Description | EER_male | EER_female | EER_all
C01_ext | int vs. int, matched mic | 0.54 | 0.96 | 0.84
C02_ext | int vs. int, mismatched mic | 0.47 | 1.27 | 1.11
C03_ext | int vs. tel, mic vs. phn | 1.88 | 3.54 | 2.71
C04_ext | int vs. tel, mic vs. mic | 1.37 | 3.69 | 2.52
C05_ext | tel vs. tel, phn vs. phn | 2.02 | 2.36 | 2.22
C06_ext | tel vs. tel, normal vs. high vel | 2.9 | 3.54 | 3.37
C07_ext | mic vs. mic, normal vs. high vel | 4.03 | 4.54 | 4.27
C08_ext | tel vs. tel, normal vs. low vel | 1.34 | 1.86 | 1.69
C09_ext | mic vs. mic, normal vs. low vel | 4.37 | 3.56 | 4.58
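The table reports equal error rates (EER), the operating point where the false-acceptance and false-rejection rates coincide. The sketch below shows one simple way to estimate it from two lists of trial scores; the function name and the toy scores are hypothetical, and evaluation toolkits typically interpolate the curve rather than pick the closest threshold as done here.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep every observed score as a threshold and return the error rate
    at the threshold where false acceptance and false rejection are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for th in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

separated = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
overlapping = equal_error_rate([1.0, 2.0, 3.0, 4.0], [0.0, 1.5, -1.0, -2.0])
print(separated, overlapping)
```

With perfectly separated scores the EER is zero; in the second call one target and one non-target trial are on the wrong side of the best threshold, giving an EER of 0.25.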
28 Experimental Results: SRE 12
- Drastic changes from past SRE evaluations:
  - Multitraining (different number of files for model training)
  - Test duration variability (20s to 160s)
  - Noisy conditions
- Large amount of test files under noisy conditions (10 dB, 0 dB SNR)
- Reverberation
- An industrial task:
  - ~2K speakers
  - ~1.8K models
  - ~2.5K test files
  - ~2M trials (core); ~88M
29 Experimental Results: SRE 12, some samples
- Noisy files: 10 dB, 0 dB
30 Experimental Results: SRE 12, noise robustness
31 QUESTIONS
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationModeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,
More informationNoise Classification based on PCA. Nattanun Thatphithakkul, Boontee Kruatrachue, Chai Wutiwiwatchai, Vataya Boonpiam
Noise Classification based on PCA Nattanun Thatphithakkul, Boontee Kruatrachue, Chai Wutiwiwatchai, Vataya Boonpiam 1 Outline Introduction Principle component analysis (PCA) Classification using PCA Experiment
More informationPCA and LDA. Man-Wai MAK
PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer
More informationRobust Sound Event Detection in Continuous Audio Environments
Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University
More informationDetection-Based Speech Recognition with Sparse Point Process Models
Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,
More informationSingle Channel Signal Separation Using MAP-based Subspace Decomposition
Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationSUBMITTED TO IEEE TRANSACTIONS ON SIGNAL PROCESSING 1. Correlation and Class Based Block Formation for Improved Structured Dictionary Learning
SUBMITTED TO IEEE TRANSACTIONS ON SIGNAL PROCESSING 1 Correlation and Class Based Block Formation for Improved Structured Dictionary Learning Nagendra Kumar and Rohit Sinha, Member, IEEE arxiv:178.1448v2
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More informationMulti-task Learning with Gaussian Processes, with Applications to Robot Inverse Dynamics
1 / 38 Multi-task Learning with Gaussian Processes, with Applications to Robot Inverse Dynamics Chris Williams with Kian Ming A. Chai, Stefan Klanke, Sethu Vijayakumar December 2009 Motivation 2 / 38 Examples
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 8 Continuous Latent Variable
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More information