Deep Learning for Automatic Speech Recognition Part I


1 Deep Learning for Automatic Speech Recognition, Part I. Xiaodong Cui, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598. Fall 2018

2 Outline
- A brief history of automatic speech recognition
- Speech production model
- Fundamentals of automatic speech recognition
- Context-dependent deep neural networks
EECS 6894, Columbia University 2/50

3 Applications of Speech Technologies
- Speech recognition
- Speech synthesis
- Voice conversion
- Speech enhancement
- Speech coding
- Spoken term detection
- Speaker recognition/verification
- Speech-to-speech translation
- Dialogue systems
- ...

4 Some ASR Terminology
- Speaker dependent (SD) vs. speaker independent (SI)
- Isolated word recognition vs. continuous speech recognition
- Large-vocabulary continuous speech recognition (LVCSR): naturally speaking style; historically > 1,000 words, but far more nowadays (e.g. 30K-50K, some systems reaching 100K)
- Speaker adaptation: supervised or unsupervised
- Speech input and channel: close-talking microphone, far-field microphone, microphone array, codec

5 Four Generations of ASR
Roughly four generations:
- 1st generation (1950s-1960s): exploratory work based on acoustic-phonetics
- 2nd generation (1960s-1970s): ASR based on template matching
- 3rd generation (1970s-2000s): ASR based on rigorous statistical modeling
- 4th generation (late 2000s-present): ASR based on deep learning
*adapted from S. Furui, "History and Development of Speech Recognition."

6 1st Generation: Early Attempts (1) K. H. Davis, R. Biddulph, and S. Balashek, "Automatic Recognition of Spoken Digits," J. Acoust. Soc. Am., vol. 24, no. 6.

7 1st Generation: Early Attempts (2) K. H. Davis, R. Biddulph, and S. Balashek, "Automatic Recognition of Spoken Digits," J. Acoust. Soc. Am., vol. 24, no. 6.

8 2nd Generation: Template Matching
- Linear predictive coding (LPC): formulated independently by Atal and Itakura; an effective way to estimate the vocal tract response
- Dynamic programming: widely known as dynamic time warping (DTW) in the speech community; deals with the non-uniform time scales of two patterns; first proposed by Vintsyuk in the former Soviet Union, whose work was not known in Western countries until the 1980s; Sakoe and Chiba in Japan independently proposed it in the late 1970s
- Isolated-word or connected-word recognition based on DTW and LPC (and its variants) under an appropriate distance measure

9 An Example Illustrating DTW [figure: a warping path aligning a reference pattern with a test pattern]
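The DTW alignment idea can be sketched in a few lines of numpy. This is a generic textbook implementation with toy one-dimensional patterns, not code from the course; a real system would use LPC-derived feature vectors and an appropriate local distance:

```python
import numpy as np

def dtw_distance(ref, test):
    """Dynamic time warping distance between two feature sequences.

    ref, test: arrays of shape (T1, d) and (T2, d).
    Returns the cumulative distance along the best warping path.
    """
    T1, T2 = len(ref), len(test)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])  # local distance
            # best of diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[T1, T2]

# A time-warped copy of a pattern should score far better than an unrelated one.
a = np.array([[0.], [1.], [2.], [3.]])
b = np.array([[0.], [1.], [1.], [2.], [3.]])  # same pattern, one repeated frame
c = np.array([[5.], [5.], [5.], [5.]])
print(dtw_distance(a, b), dtw_distance(a, c))  # 0.0 14.0
```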

10 3rd Generation: Hidden Markov Models
Switched from template-based approaches to rigorous statistical modeling. The path of HMMs becoming the dominant approach in ASR:
- The earliest research dates back to the late 1960s, by L. E. Baum and his colleagues at the Institute for Defense Analyses (IDA)
- James Baker followed it up at CMU as a Ph.D. student in the 1970s
- James and Janet Baker joined IBM and worked with Fred Jelinek on using HMMs for speech recognition in the 1970s
- A workshop on HMMs held by IDA in 1980 resulted in the so-called "Blue Book", titled "Applications of Hidden Markov Models to Text and Speech", but the book was never widely distributed
- A series of papers on the HMM methodology was published in the years after the IDA workshop, including the well-known IEEE Proceedings paper "A tutorial on hidden Markov models and selected applications in speech recognition"
- HMMs have since become the dominant approach for speech recognition

11 A Unified Speech Recognition Model
P(W|X) = P(X|W) P(W) / P(X)
- X = {x_1, x_2, ..., x_m} is a sequence of speech features
- W = {w_1, w_2, ..., w_n} is a sequence of words
- P(W) = P(w_1, w_2, ..., w_n) gives the probability of the word sequence; it is referred to as the language model (LM)
- P(X|W) = P(x_1, x_2, ..., x_m | w_1, w_2, ..., w_n) gives the probability of the sequence of speech features given the word sequence; it is referred to as the acoustic model (AM)

12 An Illustrative Example of HMMs ["How are you" (HH AW AA R Y UW) modeled as the context-dependent phone HMM sequence SIL, SIL-HH+AW, HH-AW+AA, AW-AA+R, AA-R+Y, R-Y+UW, Y-UW+SIL, SIL]

13 Two LVCSR Developments at IBM and Bell Labs
IBM focused on dictation systems:
- interested in a probabilistic structure of the language model with a large vocabulary
- n-grams:
P(W) = P(w_1 w_2 ... w_n)
     = P(w_1) P(w_2|w_1) P(w_3|w_1 w_2) ... P(w_n|w_1 w_2 ... w_{n-1})
     ≈ P(w_1) P(w_2|w_1) P(w_3|w_2) ... P(w_n|w_{n-1})   (bigram approximation)
Bell Labs focused on voice dialing and command and control for call routing:
- interested in speaker-independent systems that could handle acoustic variability from a large number of different speakers
- Gaussian mixture models (GMMs) for the state observation distribution:
P(x|s) = Σ_i c_i N(x; μ_i, Σ_i)

14 3rd Generation: Further Improvements
The GMM-HMM acoustic model with an n-gram language model became the "standard" LVCSR recipe, with significant improvements in the 1990s and 2000s:
- Speaker adaptation/normalization: vocal tract length normalization (VTLN), maximum likelihood linear regression (MLLR), speaker adaptive training (SAT)
- Noise robustness: parallel model combination (PMC), vector Taylor series (VTS)
- Discriminative training: minimum classification error (MCE), maximum mutual information (MMI), minimum phone error (MPE), large margin

15 Progress Made Before the Advent of Deep Learning *X. Huang, J. Baker and R. Reddy, "A Historical Perspective of Speech Recognition."

16 4th Generation: Deep Learning
G. E. Hinton, S. Osindero and Y.-W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18: the ground-breaking paper on deep learning.
- layer-wise pre-training in an unsupervised fashion
- fine-tuning afterwards with supervised training
- changed the prevailing belief that deep neural networks were not useful and could never be trained
Around 2011 and 2012, Microsoft, Google and IBM all reported significant improvements over their then state-of-the-art GMM-HMM-based ASR systems.

17 4th Generation: Deep Learning
Deep acoustic modeling:
- deep feedforward neural networks (DNN, CNN, ...)
- deep recurrent neural networks (LSTM, GRU, ...)
- end-to-end neural networks
Deep language modeling:
- deep feedforward neural networks (DNN, CNN)
- deep recurrent neural networks (RNN, LSTM, ...)
- word embeddings

18 DNN-HMM vs. GMM-HMM *G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups."

19 Impact of Deep Learning on SWB2000
Switchboard database:
- a popular public benchmark in the speech recognition community
- human-human landline telephone conversations on directed topics
- 300-hour and 2000-hour training sets

20 Progress Made at IBM on SWB2000 *estimate of human performance: 5.1%

21 Achieving "Human Parity" in ASR

22 Speech Production Model *figure from the internet

23 Speech Production Model *L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition".

24 Speech Production Model
[source-filter diagram: an impulse train (voiced sounds) or white noise (unvoiced sounds), scaled by a gain G, excites the vocal tract filter to produce the speech signal]
time domain: s(t) = e(t) * v(t) (convolution)
frequency domain: S(ω) = E(ω) V(ω)
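The source-filter relation can be checked numerically. The sketch below, with made-up parameters (an impulse train at 100 Hz and a single-resonance "vocal tract"), synthesizes a toy voiced signal and verifies that time-domain convolution matches frequency-domain multiplication:

```python
import numpy as np

# Toy source-filter synthesis; all parameter values are illustrative assumptions.
fs, f0 = 8000, 100
e = np.zeros(800)                 # 100 ms of excitation at fs = 8 kHz
e[::fs // f0] = 1.0               # impulse train (voiced); use noise for unvoiced
n = np.arange(50)
v = 0.9 ** n * np.cos(2 * np.pi * 700 * n / fs)  # decaying single resonance

s = np.convolve(e, v)             # time domain: s(t) = e(t) * v(t)

# frequency domain: S(w) = E(w) V(w), with all DFTs zero-padded to len(s)
N = len(s)
same = np.allclose(np.fft.rfft(s, N), np.fft.rfft(e, N) * np.fft.rfft(v, N))
print(same)                       # the convolution theorem holds
```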

25 Speech Production Model *J. Picone, "Fundamentals of speech recognition: a short course".

26 Speech Production Model *J. Picone, "Fundamentals of speech recognition: a short course".

27 An Example of Speech Spectra [/uw/ sound from an adult male and a boy: the fine harmonic structure reflects the pitch, the spectral envelope reflects the vocal tract]

28 Waveforms and Spectrograms [waveform of "deep learning for computer vision, speech and language" with its wideband and narrowband spectrograms (frequency vs. time)]

29 Wideband and Narrowband Spectrograms [wideband vs. narrowband spectrograms of "deep learning", 0-8000 Hz]

30 A Quick Walkthrough of GMM-HMM Based Speech Recognition [pipeline: speech input → feature extraction → decoding network → text output; the decoding network combines the acoustic model, language model and dictionary; the models are estimated in training and used in decoding]

31 Feature Extraction
- Frame window length 20 ms with a shift of 10 ms
- Commonly used hand-crafted features:
  - Linear Predictive Cepstral Coefficients (LPCCs)
  - Mel-Frequency Cepstral Coefficients (MFCCs)
  - Perceptual Linear Predictive (PLP) analysis
  - Mel-frequency filter banks (FBanks) (widely used with DNNs/CNNs)
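The 20 ms / 10 ms framing step might look as follows in numpy (a minimal sketch; the Hamming window and exact sizes are illustrative assumptions):

```python
import numpy as np

def frame_signal(x, fs, win_ms=20, shift_ms=10):
    """Split a waveform into overlapping, windowed analysis frames."""
    win = int(fs * win_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + (len(x) - win) // shift
    # index matrix: row t selects samples [t*shift, t*shift + win)
    idx = np.arange(win)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)   # shape (n_frames, win)

x = np.random.randn(16000)            # 1 s of audio at 16 kHz
frames = frame_signal(x, 16000)
print(frames.shape)                   # (99, 320): 20 ms windows every 10 ms
```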

32 Cepstral Analysis
Why cepstral analysis? Deconvolution!
Pipeline: windowing → DFT → log|·| → IDFT → liftering
|S(ω)| = |E(ω)| |V(ω)|
log|S(ω)| = log|E(ω)| + log|V(ω)|
c(n) = IDFT(log|S(ω)|) = IDFT(log|E(ω)| + log|V(ω)|)
Cepstrum, quefrency and liftering:
- vocal tract components are represented by the slowly varying components concentrated at the lower quefrencies
- excitation components are represented by the fast varying components concentrated at the higher quefrencies
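A minimal numpy sketch of cepstral analysis. The voiced frame below is synthetic (an impulse-train excitation convolved with an assumed decaying vocal-tract response); the point is that the excitation shows up as a peak at the pitch period in the high quefrencies:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: IDFT(log |DFT(x)|)."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # avoid log(0)
    return np.fft.irfft(log_mag)

# Toy voiced frame, all values illustrative: 200 Hz pitch at fs = 16 kHz
fs, f0, N = 16000, 200, 1024
e = np.zeros(N)
e[::fs // f0] = 1.0                    # excitation: impulses every 80 samples
v = 0.95 ** np.arange(100)             # toy vocal-tract impulse response
frame = np.convolve(e, v)[:N] * np.hamming(N)

c = real_cepstrum(frame)
pitch_q = np.argmax(c[40:200]) + 40    # search above the low-quefrency envelope
print(fs / pitch_q)                    # pitch estimate near 200 Hz
```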

33 Mel-Frequency Filter Bank and MFCC [pipeline: DFT → mel filter bank → log → DCT → MFCC]
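The DFT → mel → log → DCT pipeline can be sketched as below. The triangular filter-bank construction and the sizes (40 filters, 13 cepstra) are common conventions, not necessarily the course's exact recipe:

```python
import numpy as np

def mel_filterbank(n_filt, n_fft, fs):
    """Triangular filters spaced uniformly on the mel scale (a common sketch)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    edges = imel(np.linspace(mel(0), mel(fs / 2), n_filt + 2))  # edge freqs
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

def dct2_ortho(x):
    """Orthonormal DCT-II (same as scipy.fftpack.dct(x, type=2, norm='ortho'))."""
    N = len(x)
    k, n = np.arange(N)[:, None], np.arange(N)[None, :]
    y = 2 * (np.cos(np.pi * (n + 0.5) * k / N) @ x)
    y[0] *= np.sqrt(1.0 / (4 * N))
    y[1:] *= np.sqrt(1.0 / (2 * N))
    return y

def mfcc(frame, fs, n_filt=40, n_ceps=13):
    """The slide's pipeline: DFT -> mel filter bank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    logfb = np.log(mel_filterbank(n_filt, len(frame), fs) @ power + 1e-10)
    return dct2_ortho(logfb)[:n_ceps]   # keep the first cepstral coefficients

feats = mfcc(np.random.randn(400) * np.hamming(400), 16000)
print(feats.shape)                      # (13,)
```

Dropping the final DCT leaves the log mel filter-bank (FBank) features mentioned on the previous slide as common DNN/CNN input.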

34 Acoustic Units and Dictionary
Dictionary: HOW → HH AW; ARE → AA R; YOU → Y UW
- Context-independent (CI) phonemes: HH AW AA R Y UW
- Context-dependent (CD) phonemes: capture coarticulation; phonemes are treated as different if they have different left and right contexts, written P_l - P_c + P_r; e.g. HH-AW+AA, P-AW+AA and HH-AW+B are different CD phonemes; each CD phoneme is modeled by an HMM
- Other acoustic units: syllables, words, ...; a tradeoff between modeling accuracy and data coverage

35 Context-Dependent Phonemes
- Advantages: model subtle acoustic characteristics given the acoustic contexts
- Disadvantages: give rise to a substantial number of CD phonemes, e.g. 45^3 = 91,125
- Solution: parameter tying (vowels, stops, fricatives, nasals, ...)
- widely used as targets of DNN/CNN systems
*after HTK.

36 GMM-HMM: Mathematical Formulation
λ = {A, B, π}
- State transition probability A: a_ij = P(s_{t+1} = j | s_t = i)
- State observation PDF B: b_i(o_t) = p(o_t | s_t = i)
- Initial state distribution π: π_i = P(s_1 = i)
An HMM is an extension of a Markov chain, a doubly embedded stochastic process:
- an underlying stochastic process that is not directly observable
- another observable stochastic process generated from the hidden stochastic process

37 GMM-HMM: Mathematical Formulation ["How are you" as the CD-phone HMM sequence SIL, SIL-HH+AW, HH-AW+AA, AW-AA+R, AA-R+Y, R-Y+UW, Y-UW+SIL, SIL]
Three fundamental problems for HMMs:
- Given the observed sequence O = {o_1, ..., o_T} and a model λ = {A, B, π}, how do we evaluate the probability of the observation sequence P(O|λ) = Σ_S P(O, S|λ)?
- How do we adjust the model parameters λ = {A, B, π} to maximize P(O|λ)?
- Given the observed sequence O = {o_1, ..., o_T} and the model λ = {A, B, π}, how do we choose the most likely state sequence S = {s_1, ..., s_T}?

38 GMM-HMM: Acoustic Model Training
How to compute the likelihood of a feature sequence given the acoustic model? The forward-backward algorithm.
Forward computation: α_t(i) = P(o_1 ... o_t, s_t = i | λ)
- Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ M
- Induction: α_{t+1}(j) = [Σ_{i=1}^M α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ j ≤ M, 1 ≤ t ≤ T-1
- Termination: P(O|λ) = Σ_{i=1}^M α_T(i)
Backward computation: β_t(i) = P(o_{t+1} ... o_T | s_t = i, λ)
- Initialization: β_T(i) = 1, 1 ≤ i ≤ M
- Induction: β_t(i) = Σ_{j=1}^M a_ij b_j(o_{t+1}) β_{t+1}(j), 1 ≤ i ≤ M, t = T-1, ..., 1
Using both forward and backward variables: P(O|λ) = Σ_{i=1}^M α_t(i) β_t(i), for any t
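The forward recursion can be written directly in numpy. The model below is a toy 2-state HMM with made-up numbers; B holds the precomputed state observation likelihoods b_i(o_t) for each frame (the backward pass is the mirror image):

```python
import numpy as np

def forward(A, B, pi):
    """Forward pass: alpha[t, i] = P(o_1..o_t, s_t = i | lambda).

    A:  (M, M) transition matrix a_ij
    B:  (T, M) state observation likelihoods b_i(o_t), precomputed per frame
    pi: (M,)   initial state distribution
    """
    T, M = B.shape
    alpha = np.zeros((T, M))
    alpha[0] = pi * B[0]                      # initialization
    for t in range(T - 1):                    # induction
        alpha[t + 1] = (alpha[t] @ A) * B[t + 1]
    return alpha, alpha[-1].sum()             # termination: P(O | lambda)

# Toy 2-state model, purely illustrative numbers.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.6, 0.4])
B = np.array([[0.9, 0.2], [0.1, 0.8], [0.9, 0.2]])  # b_i(o_t), T = 3 frames
alpha, likelihood = forward(A, B, pi)
print(round(likelihood, 5))  # 0.10893
```

In practice the recursion is run in the log domain (or with per-frame scaling) to avoid underflow over long utterances.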

39 GMM-HMM: Acoustic Model Training
How to estimate the model parameters λ given the data? Given the feature sequence O = {o_1, ..., o_T}, find λ* = argmax_λ P(O|λ), where a GMM is used for B:
P(x|s) = Σ_k c_k N(x; μ_k, Σ_k)
The Expectation-Maximization (EM) algorithm (known as the Baum-Welch algorithm for HMMs) re-estimates:
c_ik = Σ_t γ_ik(t) / Σ_t γ_i(t)
μ_ik = Σ_t γ_ik(t) o_t / Σ_t γ_ik(t)
Σ_ik = Σ_t γ_ik(t)(o_t - μ_ik)(o_t - μ_ik)^T / Σ_t γ_ik(t)
where the state and state-component occupancies are
γ_i(t) = P(s_t = i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^M α_t(j) β_t(j)
γ_ik(t) = P(s_t = i, ζ_t = k | O, λ) = γ_i(t) · c_ik N(o_t; μ_ik, Σ_ik) / Σ_m c_im N(o_t; μ_im, Σ_im)

40 GMM-HMM: Language Model Training
n-gram: approximate the conditional probability with n-1 history words:
P(w_1, w_2, ..., w_m) ≈ Π_{i=1}^m P(w_i | w_{i-n+1}, ..., w_{i-1})
Counting events in context over the training data:
P(w_i | w_{i-n+1}, ..., w_{i-1}) = C(w_{i-n+1}, ..., w_{i-1}, w_i) / C(w_{i-n+1}, ..., w_{i-1})
- unigram, bigram, trigram, 4-gram, ...
- data sparsity issue
- back-off strategies and interpolation
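Counting-based MLE estimation of a bigram model on a toy corpus (illustrative only, with no smoothing — which is exactly why the data-sparsity issue and back-off arise):

```python
from collections import Counter

# Tiny made-up corpus; real LMs are trained on millions of sentences.
corpus = [["how", "are", "you"], ["how", "is", "it"], ["how", "are", "they"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent                  # sentence-start context
    unigrams.update(tokens[:-1])             # counts of history words C(h)
    bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts C(h, w)

def p_bigram(w, h):
    """MLE estimate P(w | h) = C(h, w) / C(h); zero for unseen events."""
    return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

print(p_bigram("are", "how"), p_bigram("you", "are"))  # 2/3 and 1/2
```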

41 GMM-HMM: Decoding
How to find the state path of a feature sequence that gives the maximum likelihood given the model? Dynamic programming (Viterbi decoding).
Let φ_j(t) represent the maximum likelihood of observing the partial sequence o_1 ... o_t and being in state j at time t:
φ_j(t) = max_{s_1, s_2, ..., s_{t-1}} P(s_1, s_2, ..., s_{t-1}, s_t = j, o_1, o_2, ..., o_t | λ)
By induction: φ_j(t) = max_i {φ_i(t-1) a_ij} b_j(o_t)
*after HTK.
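The Viterbi recursion, in the log domain for numerical stability, might be sketched as follows (a toy 2-state model with made-up numbers):

```python
import numpy as np

def viterbi(A, B, pi):
    """Best state sequence via dynamic programming.

    phi[t, j] = max over s_1..s_{t-1} of log P(s_1..s_{t-1}, s_t=j, o_1..o_t)
    """
    T, M = B.shape
    logA, logB = np.log(A), np.log(B)
    phi = np.zeros((T, M))
    back = np.zeros((T, M), dtype=int)       # best predecessor per (t, j)
    phi[0] = np.log(pi) + logB[0]
    for t in range(1, T):
        scores = phi[t - 1][:, None] + logA  # phi_i(t-1) + log a_ij
        back[t] = scores.argmax(axis=0)
        phi[t] = scores.max(axis=0) + logB[t]
    path = [phi[-1].argmax()]                # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

A = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.6, 0.4])
B = np.array([[0.9, 0.2], [0.1, 0.8], [0.9, 0.2]])  # b_j(o_t) per frame
print(viterbi(A, B, pi))
```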

42 GMM-HMM: Decoding [decoding network fragment from start (S) to end (E) over the words sil, how, are, is, it, you, your, car, sil]
- Inject the language model
- Inject the dictionary
- Inject CD-HMMs, with each CD-HMM state having a GMM distribution
- Compute the (log-)likelihood of each feature vector in each CD-HMM state: P(x|s) = Σ_i c_i N(x; μ_i, Σ_i)

43 GMM-HMM: Decoding [expanded network: LM probabilities such as P(how|sil), P(are|how), P(you|are), P(is|how), P(car|your) over CD-HMM states AA-B-1 ... SIL-E-9, scored by GMMs]

44 GMM-HMM: Forced Alignment ["How are you" as the CD-phone HMM sequence SIL, SIL-HH+AW, HH-AW+AA, AW-AA+R, AA-R+Y, R-Y+UW, Y-UW+SIL, SIL]
Given the text label, how to find the best underlying state sequence?
- same as decoding, except that the label is known
- the Viterbi algorithm (again)
- often referred to as Viterbi alignment in the speech community
- widely used in deep learning ASR to generate the target labels for the training data

45 Context-Dependent DNNs [the decoding network of the previous slide, with the GMM scorer over CD-HMM states (AA-B-1 ... SIL-E-9) replaced by a DNN]

46 Context-Dependent DNNs [a DNN produces posteriors p(s|o) over CD-HMM states AA-B-1 ... SIL-E-9]
Posteriors are converted to scaled likelihoods for decoding:
P(o|s) = p(s|o) p(o) / p(s) ∝ p(s|o) / p(s)
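The posterior-to-scaled-likelihood conversion is a one-liner in log space. The posteriors and priors below are made-up numbers for one frame and three states, purely for illustration:

```python
import numpy as np

# DNN softmax posteriors p(s|o) for one frame, and CD-state priors p(s)
# estimated by counting states in the training alignments (toy values).
posteriors = np.array([0.7, 0.2, 0.1])
priors = np.array([0.5, 0.3, 0.2])

# log p(o|s) ∝ log p(s|o) - log p(s); p(o) is constant across states
log_scaled_lik = np.log(posteriors) - np.log(priors)
print(log_scaled_lik.argmax())  # frequent states are penalized by their prior
```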

47 Two Families of DNN-HMM Acoustic Modeling
- Hybrid systems: the DNN is directly connected to HMMs for acoustic modeling
- Bottleneck tandem systems: the DNN is used as a feature extractor; the extracted bottleneck features can be used to train GMM-HMMs or DNN-HMMs

48 Modeling with Hidden Variables
Hidden variables (or latent variables) are crucial in acoustic modeling (and also in computer vision and NLP):
- the hidden state or phone sequence in acoustic modeling
- the hidden speaker transformation in speaker adaptation
- the latent topic and word distributions in Latent Dirichlet Allocation in NLP
They reflect your belief about how a system works (its internal mechanism). Deep neural networks use a large number of hidden variables:
- hidden variables are organized in a hierarchical fashion
- they usually lack a straightforward interpretation (the well-known interpretability issue)

49 DNN-HMMs: A Typical Training Recipe
Preparation:
- 40-dim FBank features with ±4 adjacent frames (input dim = 40x9)
- use an existing model to generate alignments, which are then converted to 1-hot targets for each frame
- create training and validation data sets
- estimate the priors p(s) of the CD states from the training set
Training:
- DNN configuration (multiple hidden layers, softmax output layer, cross-entropy loss function)
- initialization
- optimization via back-propagation with SGD on the training set, while monitoring the loss on the validation set
Test:
- push input features from test utterances through the DNN acoustic model to get their posteriors
- convert posteriors to scaled likelihoods
- Viterbi decoding on the decoding network
- measure the word error rate

50 Training A Hybrid DNN-HMM System [toolchain: feature extraction and GMM-HMM CD-phone generation in Attila/Kaldi produce the CD phones (classes), alignments (labels) and features; a DNN-HMM trained in Theano/Torch/TensorFlow produces posteriors; HMM Viterbi decoding in Attila/Kaldi]


More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

Sparse Models for Speech Recognition

Sparse Models for Speech Recognition Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations

More information

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers

More information

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function 890 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models Jorge Silva and Shrikanth

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 25&29 January 2018 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Jorge Silva and Shrikanth Narayanan Speech Analysis and Interpretation

More information

Speech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)

Speech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II) Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

An Evolutionary Programming Based Algorithm for HMM training

An Evolutionary Programming Based Algorithm for HMM training An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,

More information

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen Robust Speech Recognition in the Presence of Additive Noise Svein Gunnar Storebakken Pettersen A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of PHILOSOPHIAE DOCTOR

More information

Dynamic Time-Alignment Kernel in Support Vector Machine

Dynamic Time-Alignment Kernel in Support Vector Machine Dynamic Time-Alignment Kernel in Support Vector Machine Hiroshi Shimodaira School of Information Science, Japan Advanced Institute of Science and Technology sim@jaist.ac.jp Mitsuru Nakai School of Information

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

A New OCR System Similar to ASR System

A New OCR System Similar to ASR System A ew OCR System Similar to ASR System Abstract Optical character recognition (OCR) system is created using the concepts of automatic speech recognition where the hidden Markov Model is widely used. Results

More information

From perceptrons to word embeddings. Simon Šuster University of Groningen

From perceptrons to word embeddings. Simon Šuster University of Groningen From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written

More information

Hidden Markov Models in Language Processing

Hidden Markov Models in Language Processing Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications

More information

Hierarchical Multi-Stream Posterior Based Speech Recognition System

Hierarchical Multi-Stream Posterior Based Speech Recognition System Hierarchical Multi-Stream Posterior Based Speech Recognition System Hamed Ketabdar 1,2, Hervé Bourlard 1,2 and Samy Bengio 1 1 IDIAP Research Institute, Martigny, Switzerland 2 Ecole Polytechnique Fédérale

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Statistical Processing of Natural Language

Statistical Processing of Natural Language Statistical Processing of Natural Language and DMKM - Universitat Politècnica de Catalunya and 1 2 and 3 1. Observation Probability 2. Best State Sequence 3. Parameter Estimation 4 Graphical and Generative

More information

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Kumari Rambha Ranjan, Kartik Mahto, Dipti Kumari,S.S.Solanki Dept. of Electronics and Communication Birla

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs

More information

Lecture 11: Hidden Markov Models

Lecture 11: Hidden Markov Models Lecture 11: Hidden Markov Models Cognitive Systems - Machine Learning Cognitive Systems, Applied Computer Science, Bamberg University slides by Dr. Philip Jackson Centre for Vision, Speech & Signal Processing

More information

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

FACTORIAL HMMS FOR ACOUSTIC MODELING. Beth Logan and Pedro Moreno

FACTORIAL HMMS FOR ACOUSTIC MODELING. Beth Logan and Pedro Moreno ACTORIAL HMMS OR ACOUSTIC MODELING Beth Logan and Pedro Moreno Cambridge Research Laboratories Digital Equipment Corporation One Kendall Square, Building 700, 2nd loor Cambridge, Massachusetts 02139 United

More information

Hidden Markov Models. Dr. Naomi Harte

Hidden Markov Models. Dr. Naomi Harte Hidden Markov Models Dr. Naomi Harte The Talk Hidden Markov Models What are they? Why are they useful? The maths part Probability calculations Training optimising parameters Viterbi unseen sequences Real

More information

Chapter 9 Automatic Speech Recognition DRAFT

Chapter 9 Automatic Speech Recognition DRAFT P R E L I M I N A R Y P R O O F S. Unpublished Work c 2008 by Pearson Education, Inc. To be published by Pearson Prentice Hall, Pearson Education, Inc., Upper Saddle River, New Jersey. All rights reserved.

More information

Acoustic Modeling for Speech Recognition

Acoustic Modeling for Speech Recognition Acoustic Modeling for Speech Recognition Berlin Chen 2004 References:. X. Huang et. al. Spoken Language Processing. Chapter 8 2. S. Young. The HTK Book (HTK Version 3.2) Introduction For the given acoustic

More information

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013 More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative

More information

Bayesian Hidden Markov Models and Extensions

Bayesian Hidden Markov Models and Extensions Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Hidden Markov model. Jianxin Wu. LAMDA Group National Key Lab for Novel Software Technology Nanjing University, China

Hidden Markov model. Jianxin Wu. LAMDA Group National Key Lab for Novel Software Technology Nanjing University, China Hidden Markov model Jianxin Wu LAMDA Group National Key Lab for Novel Software Technology Nanjing University, China wujx2001@gmail.com May 7, 2018 Contents 1 Sequential data and the Markov property 2 1.1

More information

Speech Recognition HMM

Speech Recognition HMM Speech Recognition HMM Jan Černocký, Valentina Hubeika {cernocky ihubeika}@fit.vutbr.cz FIT BUT Brno Speech Recognition HMM Jan Černocký, Valentina Hubeika, DCGM FIT BUT Brno 1/38 Agenda Recap variability

More information

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,

More information

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Zaihu Pang 1, Xihong Wu 1, and Lei Xu 1,2 1 Speech and Hearing Research Center, Key Laboratory of Machine

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

Exemplar-based voice conversion using non-negative spectrogram deconvolution

Exemplar-based voice conversion using non-negative spectrogram deconvolution Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore

More information

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January

More information

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data Statistical Machine Learning from Data Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL),

More information

Multiscale Systems Engineering Research Group

Multiscale Systems Engineering Research Group Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden

More information

Adaptive Boosting for Automatic Speech Recognition. Kham Nguyen

Adaptive Boosting for Automatic Speech Recognition. Kham Nguyen Adaptive Boosting for Automatic Speech Recognition by Kham Nguyen Submitted to the Electrical and Computer Engineering Department, Northeastern University in partial fulfillment of the requirements for

More information

NEURAL LANGUAGE MODELS

NEURAL LANGUAGE MODELS COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,

More information

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction

More information

Comparing linear and non-linear transformation of speech

Comparing linear and non-linear transformation of speech Comparing linear and non-linear transformation of speech Larbi Mesbahi, Vincent Barreaud and Olivier Boeffard IRISA / ENSSAT - University of Rennes 1 6, rue de Kerampont, Lannion, France {lmesbahi, vincent.barreaud,

More information

Hidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208

Hidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information