Noise Compensation for Subspace Gaussian Mixture Models
1 Noise Compensation for Subspace Gaussian Mixture Models. Liang Lu, University of Edinburgh. Joint work with K. K. Chin, A. Ghoshal and S. Renals. Liang Lu, Interspeech, September 2012
2 Outline. Motivation: the subspace GMM (SGMM) works well in matched speech conditions [Povey et al., 2011], but in mismatched conditions (i.e. noise) the gain disappears. Goal: noise compensation for SGMM. Method: model-space compensation using joint uncertainty decoding (JUD) [Liao and Gales, 2005].
3 HMM-GMM acoustic model. [Figure: HMM with states j−1, j, j+1.]
4 Subspace Gaussian Mixture Models [Povey et al., 2011]. [Figure: globally shared parameters $w_i$, $M_i$, $\Sigma_i$ ($i = 1, \ldots, I$) and state-dependent vectors $v_{jk}$ for HMM states j−1, j, j+1.] Global: $M_i$ is the basis for the means, $w_i$ is the basis for the weights, and $\Sigma_i$ is the covariance matrix. State-dependent: $v_{jk}$ are low-dimensional vectors (e.g. 40-dim). Gaussian mean: $\mu_{jki} = M_i v_{jk}$.
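The slide gives the mean rule $\mu_{jki} = M_i v_{jk}$ but not the weight rule. The sketch below expands one state-dependent vector into a full GMM. The softmax weights $w_{jki} \propto \exp(w_i^\top v_{jk})$ follow the usual SGMM formulation of [Povey et al., 2011] and are an assumption here, as are the function name and array shapes; the covariances are the shared $\Sigma_i$ and need no per-state computation.

```python
import numpy as np

def sgmm_state_params(M, w, v):
    """Expand one SGMM state vector v_jk into GMM means and weights.

    M: (I, D, S) mean-projection matrices M_i (basis for the means)
    w: (I, S)    weight-projection vectors w_i (basis for the weights)
    v: (S,)      low-dimensional state vector v_jk
    Returns means (I, D) with mu_jki = M_i v_jk, and weights (I,).
    """
    means = M @ v                  # (I, D, S) @ (S,) -> (I, D)
    logits = w @ v                 # w_i^T v_jk for every region i
    logits -= logits.max()         # stabilise the softmax numerically
    weights = np.exp(logits) / np.exp(logits).sum()
    return means, weights
```

Only $v_{jk}$ is state-specific, which is why an SGMM can carry millions of Gaussians with few per-state parameters.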
5 Subspace Gaussian Mixture Models. More intuitively, suppose we have an acoustic space like the one in the figure.
6 Subspace Gaussian Mixture Models. We then partition the whole acoustic space into $I$ regions. This can be done by learning a GMM on the training data. [Figure: regions 1, 2, 3, ..., I.]
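A minimal sketch of the partitioning step, assuming a diagonal-covariance GMM trained with plain EM. The covariance type, initialisation, iteration count, variance floor and function names are assumptions for illustration, not the recipe used in the talk. Each mixture component plays the role of one region $i$, and a frame is assigned to the region with the highest weighted likelihood.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x; mu_i, diag(var_i)) for every frame (row of X) and region i."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (N, I, D)
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def train_ubm(X, I, n_iter=20, floor=1e-3):
    """Fit an I-component diagonal GMM to frames X (N, D) with EM."""
    N, D = X.shape
    # Deterministic initialisation: spread the initial means over the data.
    means = X[np.linspace(0, N - 1, I).astype(int)].copy()
    variances = np.tile(X.var(axis=0), (I, 1))
    weights = np.full(I, 1.0 / I)
    for _ in range(n_iter):
        # E-step: posterior probability of each region for each frame.
        ll = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        ll -= ll.max(axis=1, keepdims=True)
        post = np.exp(ll)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and floored variances.
        occ = post.sum(axis=0) + 1e-10
        weights = occ / N
        means = (post.T @ X) / occ[:, None]
        variances = np.maximum((post.T @ X ** 2) / occ[:, None] - means ** 2, floor)
    return weights, means, variances

def assign_regions(X, weights, means, variances):
    """Hard-assign each frame to its most likely region i."""
    ll = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return ll.argmax(axis=1)
```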
7 Subspace Gaussian Mixture Models. We then introduce some parameters to structure each region: $\Sigma_i$ models the covariance of the region, $M_i$ spans the basis for the Gaussian means, and $w_i$ spans the basis for the Gaussian weights. [Figure: parameters $w_i$, $\Sigma_i$, $M_i$ attached to a region.]
8 Subspace Gaussian Mixture Models. Given a class with some data, such as an HMM state j. [Figure: HMM states j−1, j, j+1 and the state vector $v_{jk}$.]
9 Subspace Gaussian Mixture Models. Then we learn a GMM for this class. [Figure: HMM states j−1, j, j+1 and the state vector $v_{jk}$.]
10 Noise compensation. Larger modelling power leads to higher recognition accuracy. In our systems on Aurora 4, the number of Gaussians is 6.4M (SGMM) vs. 50k (GMM). SGMM vs. GMM: 5.2% vs. 7.7% WER in the clean condition, and 59.9% vs. 59.3% WER in the noisy condition. Can we do noise compensation for SGMMs? [Figure: WER of SGMM and GMM systems in clean and noisy conditions.]
11 Noise compensation. There is a large body of work on noise compensation for robust ASR [Deng, 2011]:
Feature domain: spectral subtraction, CMN/CVN; cepstral mean square error estimation; Algonquin; SPLICE; feature-space vector Taylor series (VTS).
Model domain: MLLR, noise constraint MLLR; PMC, data-driven PMC (DPMC), iterative DPMC; VTS, joint uncertainty decoding (JUD); linear spline interpolation (LSI); unscented transform (UT).
Hybrid: noise adaptive training.
12 Noise compensation for SGMM. Model-space compensation for SGMM is not data-driven but uses heuristic knowledge. Mismatch function: y = f(x, h, n, α) [Acero, 1990], where α denotes the phase term between noise and speech [Deng et al., 2004]. [Figure: clean speech x passes through channel noise h and is corrupted by additive noise n, giving noisy speech y.]
16 Noise compensation for SGMM. The mismatch function is

$$ y = f(x, h, n, \alpha) = x + h + C \log\Big[ 1 + \exp\big(C^{-1}(n - x - h)\big) + \underbrace{2\alpha \exp\big(C^{-1}(n - x - h)/2\big)}_{\text{phase term}} \Big] \qquad (1) $$

where $C$ is the DCT matrix and $\exp$ and $\log$ are applied element-wise.
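To make Eq. (1) concrete, the sketch below evaluates the mismatch function on static cepstra. The orthonormal DCT-II, the use of its exact inverse (a pseudo-inverse would be needed when cepstra are truncated) and the function names are assumptions; real front ends differ in DCT scaling and truncation, and delta features are ignored here.

```python
import numpy as np

def dct_matrix(n_ceps, n_chans):
    """Orthonormal DCT-II matrix C (n_ceps x n_chans): log-mel -> cepstra."""
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n_chans)[None, :]
    C = np.sqrt(2.0 / n_chans) * np.cos(np.pi * k * (j + 0.5) / n_chans)
    C[0, :] /= np.sqrt(2.0)          # orthonormal scaling of the c0 row
    return C

def mismatch(x, h, n, alpha, C, C_inv):
    """Phase-sensitive mismatch function y = f(x, h, n, alpha), Eq. (1).

    x, h, n: static cepstra of clean speech, channel and additive noise.
    alpha:   phase factor (scalar, or a vector applied element-wise).
    C_inv:   inverse (or pseudo-inverse) of the DCT matrix C.
    """
    d = C_inv @ (n - x - h)                       # log-spectral domain
    inner = 1.0 + np.exp(d) + 2.0 * alpha * np.exp(d / 2.0)
    return x + h + C @ np.log(inner)
```

As a sanity check, with $\alpha = 1$ and $n = x + h$ the bracket equals 4, so $y = x + h + C \log 4$; as the noise level falls, $y$ tends to $x + h$.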
17 Noise compensation. Aim: estimate $\mu_y$ and $\Sigma_y$ for each Gaussian component. Difficulty: y = f(x, h, n, α) is highly nonlinear, so there is no analytic solution. Solution: vector Taylor series (VTS) approximation [Moreno et al., 1996]. Cost: real-time factor > 100 and memory > 10 GB for a medium-size SGMM with 6.4M Gaussians. Inelegant: applying VTS directly destroys the compact structure of SGMMs.
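For a single Gaussian, first-order VTS expands Eq. (1) around the clean-speech, noise and channel means. A minimal sketch, assuming static features only, a deterministic channel $h$, a Gaussian additive-noise model $(\mu_n, \Sigma_n)$ and an invertible $C$; the function name is hypothetical. Because $y$ depends on $x$ and $n$ only through $n - x - h$ (plus the $x + h$ offset), $\partial y / \partial n = I - J$ with $J = \partial y / \partial x$.

```python
import numpy as np

def vts_compensate(mu_x, Sigma_x, mu_n, Sigma_n, mu_h, alpha, C, C_inv):
    """First-order VTS estimate of (mu_y, Sigma_y) for one Gaussian.

    Linearises the phase-sensitive mismatch function around
    (mu_x, mu_h, mu_n); the channel h is treated as deterministic.
    """
    d = C_inv @ (mu_n - mu_x - mu_h)
    inner = 1.0 + np.exp(d) + 2.0 * alpha * np.exp(d / 2.0)
    mu_y = mu_x + mu_h + C @ np.log(inner)
    # Element-wise derivative of g(d) = log(1 + e^d + 2 alpha e^{d/2}).
    g_prime = (np.exp(d) + alpha * np.exp(d / 2.0)) / inner
    dy_dn = C @ np.diag(g_prime) @ C_inv      # Jacobian w.r.t. noise n
    J = np.eye(len(mu_x)) - dy_dn             # Jacobian w.r.t. x (and h)
    Sigma_y = J @ Sigma_x @ J.T + dy_dn @ Sigma_n @ dy_dn.T
    return mu_y, Sigma_y, J
```

This has to be repeated for every Gaussian, which for the 6.4M-Gaussian SGMM is where the cost noted above comes from.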
22 Noise compensation. Solution: joint uncertainty decoding (JUD). [Figure: VTS vs. JUD.]
23 Noise compensation. Applying JUD to SGMM. [Figure: regions 1, 2, 3, ..., I.] Cost: real-time factor of 10 for an SGMM with 6.4M Gaussians.
24 Experiments. Database: Aurora 4; clean speech and noisy speech with SNR from 5 dB to 15 dB; close-talking microphone and desk-mounted microphone; 15 hours of training data; 330 test utterances. System configuration: 39-dim MFCC; #triphone states: 3.1k (GMM) vs. 3.9k (SGMM); #Gaussians: 50k (GMM) vs. 6.4M (SGMM); #regression classes: 112 (GMM) vs. 400 (SGMM).
25 Noise compensation experiments. [Figure: results of GMM and SGMM systems for the baseline, JUD and VTS.]
26 Experiments. Results from tuning the value of the phase factor. [Figure: word error rate (%) against the value of the phase factor for the VTS/GMM, JUD/GMM and JUD/SGMM systems.] The JUD/SGMM system achieved 16.8% WER on the Aurora 4 database.
27 Remarks. The phase term is very effective for noise compensation. Similar improvements were also observed in other studies, e.g. [Li et al., 2009]. One possible reason is that it can compensate for the linearization bias and performs domain compensation [Li et al., 2009]. Our insight is that it may help to avoid overestimating the noise model.
31 Conclusion. SGMM is a promising alternative for acoustic modelling. Noise compensation using JUD works well for SGMMs. The phase term is particularly effective for noise compensation. Future work will address noise adaptive training and compensation in the log-spectral domain.
33 Noise compensation. With JUD, the marginal likelihood can be obtained as

$$ p(y \mid m) \approx \big|A^{(r)}\big| \, \mathcal{N}\big(A^{(r)} y + b^{(r)};\ \mu_m,\ \Sigma_m + \Sigma_b^{(r)}\big) \qquad (2) $$

where $r$ is the regression class of component $m$. The transformation is done in the feature space and applied to each frame. Computation is saved because the number of frames is much smaller than the number of Gaussians. The transformation should be diagonalized in GMM systems, but not in SGMM systems, since we use full covariance matrices.
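Eq. (2) can be sketched as follows. This assumes the joint distribution of clean and noisy speech in regression class $r$ is estimated with first-order VTS around class-level clean-speech statistics $(\mu_x^{(r)}, \Sigma_x^{(r)})$, which is an assumption of this sketch. Then $\Sigma_{yx}^{(r)} = J \Sigma_x^{(r)}$, $A^{(r)} = \Sigma_x^{(r)} (\Sigma_{yx}^{(r)})^{-1}$, $b^{(r)} = \mu_x^{(r)} - A^{(r)} \mu_y^{(r)}$ and $\Sigma_b^{(r)} = A^{(r)} \Sigma_y^{(r)} A^{(r)\top} - \Sigma_x^{(r)}$. The function names are hypothetical.

```python
import numpy as np

def jud_transform(mu_x, Sigma_x, mu_n, Sigma_n, mu_h, alpha, C, C_inv):
    """JUD parameters (A, b, Sigma_b) for one regression class r.

    (mu_x, Sigma_x) describe clean speech in class r; the joint
    clean/noisy distribution is approximated with first-order VTS.
    """
    d = C_inv @ (mu_n - mu_x - mu_h)
    inner = 1.0 + np.exp(d) + 2.0 * alpha * np.exp(d / 2.0)
    mu_y = mu_x + mu_h + C @ np.log(inner)
    dy_dn = C @ np.diag((np.exp(d) + alpha * np.exp(d / 2.0)) / inner) @ C_inv
    J = np.eye(len(mu_x)) - dy_dn
    Sigma_y = J @ Sigma_x @ J.T + dy_dn @ Sigma_n @ dy_dn.T
    Sigma_yx = J @ Sigma_x                     # cross-covariance under VTS
    A = Sigma_x @ np.linalg.inv(Sigma_yx)      # equals J^{-1} here
    b = mu_x - A @ mu_y
    Sigma_b = A @ Sigma_y @ A.T - Sigma_x
    return A, b, Sigma_b

def jud_log_likelihood(y, A, b, Sigma_b, mu_m, Sigma_m):
    """log p(y|m) ~= log|det A| + log N(A y + b; mu_m, Sigma_m + Sigma_b)."""
    z = A @ y + b - mu_m                       # transformed frame minus mean
    S = Sigma_m + Sigma_b                      # full covariance, as in SGMMs
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_A = np.linalg.slogdet(A)         # log|det A|
    maha = z @ np.linalg.solve(S, z)
    return logdet_A - 0.5 * (len(y) * np.log(2 * np.pi) + logdet_S + maha)
```

The transform is computed once per regression class and applied to each frame, so the per-Gaussian work reduces to evaluating a Gaussian with covariance $\Sigma_m + \Sigma_b^{(r)}$. When a regression class contains a single Gaussian, Eq. (2) reduces to the VTS likelihood $\mathcal{N}(y; \mu_y, \Sigma_y)$.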
34 Experiments. Table: GMM systems with α = 0; methods: clean model, MTR model, VTS, JUD; columns: Clean, Avg. Table: SGMM systems with α = 0; methods: clean model, MTR model, JUD; columns: Clean, Avg.
35
Acero, A. (1990). Acoustic and Environmental Robustness in Automatic Speech Recognition. PhD thesis, Carnegie Mellon University.
Deng, L., Droppo, J., and Acero, A. (2004). Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise. IEEE Transactions on Speech and Audio Processing, 12(2).
Droppo, J., Acero, A., and Deng, L. (2002). Uncertainty decoding with SPLICE for noise robust speech recognition. In Proc. ICASSP. IEEE.
Gales, M. (1995). Model-based techniques for noise robust speech recognition. PhD thesis, Cambridge University.
Hu, Y. and Huo, Q. (2006). An HMM compensation approach using unscented transformation for noisy speech recognition. Chinese Spoken Language Processing.
Li, J., Deng, L., Yu, D., Gong, Y., and Acero, A. (2009). A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions. Computer Speech & Language, 23(3).
Liao, H. and Gales, M. (2005). Joint uncertainty decoding for noise robust speech recognition. In Proc. INTERSPEECH. Citeseer.
Moreno, P., Raj, B., and Stern, R. (1996). A vector Taylor series approach for environment-independent speech recognition. In Proc. ICASSP, volume 2. IEEE.
Povey, D., Burget, L., Agarwal, M., Akyazi, P., Kai, F., Ghoshal, A., Glembek, O., Goel, N., Karafiát, M., Rastrow, A., Rose, R., Schwarz, P., and Thomas, S. (2011). The subspace Gaussian mixture model: A structured model for speech recognition. Computer Speech & Language, 25(2).
More informationCepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition.
Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition. Philip N. Garner Idiap Research Institute, Centre du Parc, Rue Marconi 9, PO Box 592, 92 Martigny, Switzerland
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationA Small Footprint i-vector Extractor
A Small Footprint i-vector Extractor Patrick Kenny Odyssey Speaker and Language Recognition Workshop June 25, 2012 1 / 25 Patrick Kenny A Small Footprint i-vector Extractor Outline Introduction Review
More informationarxiv: v4 [cs.cl] 5 Jun 2017
Multitask Learning with CTC and Segmental CRF for Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, and Noah A Smith Toyota Technological Institute at Chicago, USA School of Computer Science, Carnegie
More informationHarmonic Structure Transform for Speaker Recognition
Harmonic Structure Transform for Speaker Recognition Kornel Laskowski & Qin Jin Carnegie Mellon University, Pittsburgh PA, USA KTH Speech Music & Hearing, Stockholm, Sweden 29 August, 2011 Laskowski &
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationModified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System
INERSPEECH 2015 Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System QingYang Hong 1, Lin Li 1, Ming Li 2, Ling Huang 1, Lihong Wan 1, Jun Zhang 1
More informationFEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes
FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION Xiao Li and Jeff Bilmes Department of Electrical Engineering University. of Washington, Seattle {lixiao, bilmes}@ee.washington.edu
More informationA Comparative Study of Histogram Equalization (HEQ) for Robust Speech Recognition
Computational Linguistics and Chinese Language Processing Vol. 12, No. 2, June 2007, pp. 217-238 217 The Association for Computational Linguistics and Chinese Language Processing A Comparative Study of
More informationi-vector and GMM-UBM Bie Fanhu CSLT, RIIT, THU
i-vector and GMM-UBM Bie Fanhu CSLT, RIIT, THU 2013-11-18 Framework 1. GMM-UBM Feature is extracted by frame. Number of features are unfixed. Gaussian Mixtures are used to fit all the features. The mixtures
More informationREGULARIZING DNN ACOUSTIC MODELS WITH GAUSSIAN STOCHASTIC NEURONS. Hao Zhang, Yajie Miao, Florian Metze
REGULARIZING DNN ACOUSTIC MODELS WITH GAUSSIAN STOCHASTIC NEURONS Hao Zhang, Yajie Miao, Florian Metze Language Technologies Institute, School of Computer Science, Carnegie Mellon University Pittsburgh,
More informationA TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme
A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus
More informationModel-based Approaches to Robust Speech Recognition in Diverse Environments
Model-based Approaches to Robust Speech Recognition in Diverse Environments Yongqiang Wang Darwin College Engineering Department Cambridge University October 2015 This dissertation is submitted to the
More informationMVA Processing of Speech Features. Chia-Ping Chen, Jeff Bilmes
MVA Processing of Speech Features Chia-Ping Chen, Jeff Bilmes {chiaping,bilmes}@ee.washington.edu SSLI Lab Dept of EE, University of Washington Seattle, WA - UW Electrical Engineering UWEE Technical Report
More informationIntroduction to SVM and RVM
Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance
More informationFront-End Factor Analysis For Speaker Verification
IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING Front-End Factor Analysis For Speaker Verification Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Abstract This
More information2D Spectrogram Filter for Single Channel Speech Enhancement
Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 007 89 D Spectrogram Filter for Single Channel Speech Enhancement HUIJUN DING,
More informationGlobal SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks
Interspeech 2018 2-6 September 2018, Hyderabad Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks Rohith Aralikatti, Dilip Kumar Margam, Tanay Sharma,
More informationTowards Maximum Geometric Margin Minimum Error Classification
THE SCIENCE AND ENGINEERING REVIEW OF DOSHISHA UNIVERSITY, VOL. 50, NO. 3 October 2009 Towards Maximum Geometric Margin Minimum Error Classification Kouta YAMADA*, Shigeru KATAGIRI*, Erik MCDERMOTT**,
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 25&29 January 2018 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More information