Fast speaker diarization based on binary keys. Xavier Anguera and Jean François Bonastre
Transcription
1 Fast speaker diarization based on binary keys Xavier Anguera and Jean François Bonastre
2 Outline: Introduction; Speaker diarization; Binary speaker modeling; Binary speaker diarization system; Experiments; Conclusions and future work
3 What is speaker diarization? (in case no one has told you already) Given a multi-speaker recording, identify who speaks when, assigning each speaker a generic ID. No a priori information is given regarding the number of speakers or their identity.
4 Standard speaker diarization approaches
5 Standard Speaker Diarization system
6 State of the art Speaker diarization has reached very competitive accuracy levels: 7-10% DER for broadcast news (LIMSI, RT04) and 12-14% DER for meetings (I2R, RT09). It is, however, currently too slow for many real-life applications. Real-time factors: standard systems >> 1xRT; ICSI (mono-core) 0.97xRT; ICSI (GPU) 0.07xRT.
7 What do we want? To dramatically speed up the processing while maintaining the accuracy level (DER). How do we do it? By adapting a recently proposed binary speaker modeling technique to diarization.
8 Review of binary speaker modeling
9 Typical speaker modeling using GMM. Training: x[n] -> acoustic parameters -> EM-ML training -> GMM model λ. Testing: y[n] -> acoustic parameters -> model evaluation -> Lkld(y[n] | λ).
10 Problems of using GMM modeling for diarization: lack of precision in modeling a particular speaker; very dependent on the model initialization or on the UBM it is adapted from; statistical features usually model the most frequently occurring characteristics instead of speaker-specific information; very slow when using iterative EM-ML and Viterbi.
11 A new modeling paradigm. Constraints: fast to compare two speaker models; should allow modeling a speaker dynamically -> more than 1 vector per speaker file; noise robustness; it should be possible to EXPLAIN a decision. Solution: a large space, to be discriminant between speakers, but with reduced quantization -> binary.
12 Binary speaker modeling (I) Acoustic data -> acoustic parameter extraction -> binary key computation (using the Binary Keys Background Model, KBM) -> binary key
13 Binary speaker modeling (II) Figure legend: general KBM components; selected KBM component (defines the in-interest subspace); in-interest area for the input data. For given input data, different sub-areas of the acoustic space are selected (each corresponding to one UBM component).
14 Binary speaker modeling (III) Selection of the n best specificities. Figure labels: output values (brown data); output values (green data).
15 Obtaining the binary fingerprint
16 Similarity between binary vectors It is very fast to compute. Any binary measure can be used, for example: $S(v_1, v_2) = \frac{\sum_{i=1}^{N} (v_1[i] \wedge v_2[i])}{\sum_{i=1}^{N} (v_1[i] \vee v_2[i])}$. In the slide's example, $S(v_1, v_2) = 2/12 \approx 0.166$.
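To make the measure concrete, here is a minimal Python sketch of the ratio above: positions set in both keys divided by positions set in at least one. The function name `binary_similarity` and the two 14-bit keys are hypothetical; the keys were chosen only so that the result matches the 2/12 value quoted on the slide, and they are not the vectors shown in the original figure.

```python
def binary_similarity(v1, v2):
    """Ratio of positions set in both keys to positions set in either key."""
    if len(v1) != len(v2):
        raise ValueError("binary keys must have the same length")
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    # Two all-zero keys share nothing; return 0.0 instead of dividing by zero.
    return both / either if either else 0.0


# Hypothetical keys: 2 shared ones, 12 positions set in at least one key.
key_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
key_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
score = binary_similarity(key_a, key_b)
```

In a real system the keys would more likely be packed into machine words so that the AND/OR counts become a few popcount instructions, which is where the speed claim comes from.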
17 Preliminary speaker modeling experiments Initial experiments on a small database show that binary speaker models are quite discriminant for KBMs with more than 512 Gaussians.
18 Binary speaker diarization system
19 Speaker diarization main blocks NOTE: we are still using the agglomerative clustering approach, but performed over binary keys
20 Acoustic + binary processing Acoustic modeling is only used in the initialization step. Thereafter everything is done in the binary space. We use standard acoustic features: 19 MFCCs (no energy, no deltas) extracted every 10 ms with a 25 ms window.
21 KBM model It is a special UBM trained from the test data; no external data is used. Its complexity is N >= 512 Gaussians, and performance does not usually improve above N = 2000 Gaussians. Standard divisive (EM-ML) training approaches cannot be used, as the resulting Gaussian means do not represent particular speakers but rather averages of all of them.
22 Building the KBM model
23 KBM training for diarization We aim at training the KBM from the test data with no a priori knowledge of the speakers. Starting from a pool of Gaussians $\theta_i$: (1) select the first Gaussian as $\arg\max_i \mathrm{Lkld}(x_i, \theta_i)$; (2) initialize the distance vector $v_{KL2}[i] = \mathrm{KL2}(\theta_i, \theta_{1st})$ for all $i$; (3) iterate until N Gaussians are selected: select the Gaussian $\theta'$ with the biggest KL2 distance, then update $v_{KL2}[i] = \min(v_{KL2}[i], \mathrm{KL2}(\theta', \theta_i))$.
24 Efficient binarization For speaker diarization, many binary keys need to be computed with different sets of acoustic features. We split the process into 2 steps: 1. Compute the K-best KBM Gaussians for each acoustic feature vector (only done once). 2. For any subset of K-best binarized vectors, compute the binary key as usual.
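The selection loop on this slide can be sketched as a greedy farthest-point search. The code below is an illustration under stated assumptions, not the authors' implementation: Gaussians are taken to have diagonal covariances and are stored as hypothetical `(mean, variance)` pairs, KL2 is taken to be the symmetric Kullback-Leibler divergence, and the index of the first Gaussian (`first`) is passed in by the caller rather than computed from likelihoods.

```python
def kl2_diag(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0 + diff2 * (1.0 / va + 1.0 / vb))
    return total


def select_kbm(pool, first, n_select):
    """Greedily pick n_select Gaussians from pool, starting from index first.

    pool is a list of (mean, variance) pairs; returns the chosen indices.
    """
    chosen = [first]
    # v_kl2[i]: distance from Gaussian i to its closest already-chosen Gaussian.
    v_kl2 = [kl2_diag(*pool[i], *pool[first]) for i in range(len(pool))]
    while len(chosen) < min(n_select, len(pool)):
        candidates = [i for i in range(len(pool)) if i not in chosen]
        best = max(candidates, key=lambda i: v_kl2[i])
        chosen.append(best)
        # Keep only the minimum distance to any Gaussian selected so far.
        for i in range(len(pool)):
            v_kl2[i] = min(v_kl2[i], kl2_diag(*pool[best], *pool[i]))
    return chosen


# Toy pool of 1-D, unit-variance Gaussians (values are made up).
pool = [([0.0], [1.0]), ([1.0], [1.0]), ([10.0], [1.0]), ([5.0], [1.0])]
selected = select_kbm(pool, first=0, n_select=3)
```

With unit variances KL2 reduces to the squared distance between means, so the toy run picks the Gaussian at 10 and then the one at 5, spreading the selection over the acoustic space instead of averaging it.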
25 Initially we have a set of acoustic features (MFCC feature vectors) and the KBM model (N Gaussians).
26 For each feature vector we obtain a binary vector (positions 0 to N-1) with a 1 on the Gaussians with the highest Lkld values.
27 (Figure: MFCC feature vectors and the N KBM Gaussians, positions 0 to N-1.)
28 Such binarized vectors can be stored in memory in a compact way by just storing the positions of the most relevant Gaussians for each feature vector.
29 To obtain a fingerprint for any segment we first accumulate the counts of all previously selected Gaussians.
30 And finally, we get the binary key by setting the best cells to 1.
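The two-step binarization walked through on slides 24 to 30 can be sketched as follows. This is a minimal illustration, assuming the per-frame likelihoods against the KBM Gaussians are already available as plain lists; the names `top_k_indices`, `binary_key`, `k` and `n_ones` are hypothetical, and the toy likelihood values are made up.

```python
def top_k_indices(likelihoods, k):
    """Step 1: positions of the k KBM Gaussians with the highest likelihood."""
    order = sorted(range(len(likelihoods)), key=lambda i: likelihoods[i], reverse=True)
    return order[:k]


def binary_key(frame_top_k, n_gauss, n_ones):
    """Step 2: accumulate per-frame selections and set the n_ones top cells to 1."""
    counts = [0] * n_gauss
    for positions in frame_top_k:
        for pos in positions:
            counts[pos] += 1
    best = sorted(range(n_gauss), key=lambda i: counts[i], reverse=True)[:n_ones]
    key = [0] * n_gauss
    for pos in best:
        key[pos] = 1
    return key


# Toy data: 3 frames scored against a 5-Gaussian KBM.
frame_likelihoods = [
    [0.1, 0.7, 0.2, 0.9, 0.0],
    [0.3, 0.8, 0.1, 0.6, 0.0],
    [0.9, 0.5, 0.1, 0.4, 0.0],
]
stored = [top_k_indices(row, 2) for row in frame_likelihoods]  # computed once
segment_key = binary_key(stored, n_gauss=5, n_ones=2)
```

Only `stored` depends on the acoustic features, so a key for any other subset of frames can be obtained by calling `binary_key` on the matching slice of `stored` without touching the likelihoods again.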
31 Clustering initialization We need to define a set of N_init initial clusters. We reuse the information in the KBM to do so. Figure labels: 1st to 6th; Viterbi/segmental assignment; acoustic features; KBM model.
32 Agglomerative clustering Initial clusters -> clusters training (obtain the fingerprint for the frames associated with each cluster) -> segmental assignment -> clusters training -> reached one cluster? If no: closest pair merging (compute the binary distance between all cluster pairs and merge the most similar), then repeat. If yes: select the best clustering.
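The closest-pair merging step of this loop can be sketched as below. This is a minimal illustration, assuming clusters are held in two hypothetical dictionaries keyed by cluster id: `cluster_keys` (id to binary key) and `members` (id to list of segment indices). The similarity is the intersection-over-union ratio from slide 16; retraining the merged cluster's key from its new members is left to the caller.

```python
from itertools import combinations


def binary_similarity(v1, v2):
    """Ratio of positions set in both keys to positions set in either key."""
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    return both / either if either else 0.0


def closest_pair(cluster_keys):
    """Return the ids (a, b) of the two clusters with the most similar keys."""
    return max(
        combinations(sorted(cluster_keys), 2),
        key=lambda pair: binary_similarity(cluster_keys[pair[0]], cluster_keys[pair[1]]),
    )


def merge_members(members, a, b):
    """Return a new membership map where cluster b has been folded into a."""
    merged = {cid: list(segs) for cid, segs in members.items() if cid != b}
    merged[a] = merged[a] + list(members[b])
    return merged


# Toy clusters (keys and segment indices are made up).
cluster_keys = {1: [1, 1, 0, 0], 2: [1, 1, 1, 0], 3: [0, 0, 1, 1]}
members = {1: [0, 4], 2: [1, 2], 3: [3]}
pair = closest_pair(cluster_keys)
members_after = merge_members(members, *pair)
```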
33 Segmental assignment We perform a fast assignment of segments to clusters based on signature similarities: each 1-second segment is compared in the binary domain against the binary cluster models. In the figure, consecutive segments are assigned to clusters 3, 1, 3 and 2.
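A minimal sketch of this assignment step, assuming each 1-second segment already has its own binary key and that similarity is the intersection-over-union ratio from slide 16. The function `assign_segments` and the toy keys are hypothetical; the keys were picked so that the labels reproduce the 3, 1, 3, 2 pattern shown in the figure.

```python
def binary_similarity(v1, v2):
    """Ratio of positions set in both keys to positions set in either key."""
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    return both / either if either else 0.0


def assign_segments(segment_keys, cluster_keys):
    """Label each segment with the id of the most similar cluster key."""
    return [
        max(cluster_keys, key=lambda cid: binary_similarity(seg, cluster_keys[cid]))
        for seg in segment_keys
    ]


# Toy binary cluster models and per-segment keys (values are made up).
cluster_keys = {
    1: [1, 1, 0, 0, 0, 0],
    2: [0, 0, 1, 1, 0, 0],
    3: [0, 0, 0, 0, 1, 1],
}
segment_keys = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0],
]
labels = assign_segments(segment_keys, cluster_keys)
```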
34 Best clustering selection From N_init clusters down to 1, we select the optimum clustering using the Student's t-test metric $T_s$, inspired by [1]. The intra- and inter-cluster distances between 1-second segments are used to obtain two distributions to compare, $D_1$ (intra-cluster distances) and $D_2$ (inter-cluster distances): $T_s = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, where $\mu_k$, $\sigma_k$ and $n_k$ are the mean, standard deviation and number of samples of $D_k$. We select the clustering with the biggest $T_s$. Note that all segment distances need to be pre-computed just once, at the beginning. [1] T-test distance and clustering criterion for speaker diarization, Trung Hieu Nguyen, Eng Siong Chng and Haizhou Li, in Proc. Interspeech, 2008.
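The selection criterion can be sketched as a two-sample t statistic computed for every candidate clustering. This is an illustration under assumptions: the helper names `t_s` and `select_best` are hypothetical, the sample variance is used for the $\sigma^2$ terms (the slide does not say which estimator is meant), and the toy values are similarity scores, so the intra-cluster mean is the larger one; with true distances the sign of the numerator would flip.

```python
from math import sqrt
from statistics import mean, variance


def t_s(d1, d2):
    """Two-sample t statistic between score lists d1 and d2 (each needs >= 2 values)."""
    spread = variance(d1) / len(d1) + variance(d2) / len(d2)
    return (mean(d1) - mean(d2)) / sqrt(spread)


def select_best(candidates):
    """candidates maps a cluster count to its (d1, d2) score lists."""
    return max(candidates, key=lambda n: t_s(*candidates[n]))


# Toy intra-cluster (d1) and inter-cluster (d2) scores for two clusterings.
candidates = {
    2: ([0.8, 0.9, 1.0], [0.1, 0.2, 0.3]),
    3: ([0.5, 0.6, 0.7], [0.3, 0.4, 0.5]),
}
best_n = select_best(candidates)
```

Because the pairwise segment scores are computed once up front, evaluating each candidate clustering only requires re-partitioning those scores into the two lists.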
35 Some cluster selection examples #clusters = #speakers Optimum diarization result
36 Evaluation We used all NIST Rich Transcription datasets. We evaluate using: the diarization error rate (DER), the percentage of time where the wrong label is assigned, including overlap; and the real-time factor (computed over the speech data). For comparison we use a baseline acoustic-based system similar to [2]. [2] A robust speaker clustering algorithm, Jitendra Ajmera and Chuck Wooters, in Proc. of IEEE ASRU, US Virgin Islands, USA, Dec
37 Results (I) Standard GMM-like training of the KBM Optimum results when stopping criterion is perfect
38 Results (II) DER as a function of # Gaussians in the KBM
39 Comparison results Meeting-by-meeting comparison between baseline (blue) and proposed system (red)
40 Conclusions and future work Progress in speaker diarization seems stagnant and doomed to long processing times. We propose a very fast system by using a recently proposed binary speaker modeling technique. We achieve DER scores that are close to GMM-based DER. Next we are working on: improving the binary key fingerprint; finding a better stopping criterion; further speeding up the system.
41 Thanks! Xavier Anguera
More informationGaussian Mixture Model Uncertainty Learning (GMMUL) Version 1.0 User Guide
Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1. User Guide Alexey Ozerov 1, Mathieu Lagrange and Emmanuel Vincent 1 1 INRIA, Centre de Rennes - Bretagne Atlantique Campus de Beaulieu, 3
More informationEngineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics
Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Phil Woodland: pcw@eng.cam.ac.uk Lent 2013 Engineering Part IIB: Module 4F11 What is Speech Recognition?
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationStatistical NLP Spring Digitizing Speech
Statistical NLP Spring 2008 Lecture 10: Acoustic Models Dan Klein UC Berkeley Digitizing Speech 1 Frame Extraction A frame (25 ms wide) extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon
More informationDigitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...
Statistical NLP Spring 2008 Digitizing Speech Lecture 10: Acoustic Models Dan Klein UC Berkeley Frame Extraction A frame (25 ms wide extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon Arnfield
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationThe Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech
CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given
More informationMachine Learning for Signal Processing Expectation Maximization Mixture Models. Bhiksha Raj 27 Oct /
Machine Learning for Signal rocessing Expectation Maximization Mixture Models Bhiksha Raj 27 Oct 2016 11755/18797 1 Learning Distributions for Data roblem: Given a collection of examples from some data,
More informationPattern Recognition Applied to Music Signals
JHU CLSP Summer School Pattern Recognition Applied to Music Signals 2 3 4 5 Music Content Analysis Classification and Features Statistical Pattern Recognition Gaussian Mixtures and Neural Nets Singing
More information