Lecture 9: Speech Recognition


EE E6820: Speech & Audio Processing & Recognition
Lecture 9: Speech Recognition
1. Recognizing Speech
2. Feature Calculation
3. Sequence Recognition
4. Hidden Markov Models
Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering, Spring 2006

1. Recognizing Speech
[Spectrogram (frequency vs. time) of the example utterance "So, I thought about that and I think it's still possible"]
What kind of information might we want from the speech signal?
- words
- phrasing, speech acts (prosody)
- mood / emotion
- speaker identity
What kind of processing do we need to get at that information?
- time scale of feature extraction
- signal aspects to capture in features
- signal aspects to exclude from features

Speech recognition as transcription
Transcription = speech to text
- find a word string to match the utterance
Best suited to small-vocabulary tasks
- voice dialing, command & control, etc.
Gives a neat objective measure: word error rate (WER %)
- can be a sensitive measure of performance
Three kinds of errors (see the sketch after this section):
  Reference:  THE CAT SAT ON THE MAT
  Recognized:     CAT SAT AN THE A MAT
              Deletion    Subst.  Insertion
  WER = (S + D + I) / N

Problems: within-speaker variability
Timing variation:
- word duration varies enormously
- fast speech reduces vowels
[Spectrogram with phone- and word-level alignment of "SO I THOUGHT ABOUT THAT AND I THINK IT'S STILL POSSIBLE"]
Speaking style variation:
- careful/casual articulation
- soft/loud speech
Contextual effects:
- speech sounds vary with context and role: "How do you do?"
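A minimal sketch of the WER measure from the transcription slide above: align the reference and recognized word strings by edit distance, then WER = (S + D + I) / N. The example strings come from the slide; the function name is just illustrative.

```python
def wer(reference, recognized):
    """Word error rate via dynamic-programming (Levenshtein) alignment."""
    ref, hyp = reference.split(), recognized.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i            # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j            # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or exact match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("THE CAT SAT ON THE MAT", "CAT SAT AN THE A MAT"))  # 3 errors / 6 words = 0.5
```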

Between-speaker variability
Accent variation
- regional / mother tongue
Voice quality variation
- gender, age, huskiness, nasality
Individual characteristics
- mannerisms, speed, prosody
[Spectrograms (freq/Hz vs. time/s) of the same material from two speakers, "mbma" and "fjdm"]

Environment variability
Background noise
- fans, cars, doors, papers
Reverberation
- "boxiness" in recordings
Microphone/channel
- huge effect on relative spectral gain
[Spectrograms (freq/Hz vs. time/s) of the same utterance from a close mic and a tabletop mic]

How to recognize speech?
Cross-correlate templates?
- waveform?
- spectrogram?
- time-warp problems
Match short segments & handle time-warp later
- model with slices of ~10 ms
- pseudo-stationary model of words, e.g. the state sequence sil g w eh n sil
- other sources of variation...

Probabilistic formulation
Probability that a segment label is correct gives the standard form of speech recognizers:
- Feature calculation transforms the signal into an easily classified domain: $s[n] \rightarrow X_m$, $m = n/H$
- Acoustic classifier calculates the probability of each mutually exclusive state: $p(q^i \mid X_m)$
- Finite-state acceptor (i.e. HMM) finds the MAP match of an allowable sequence to the probabilities:
  $\hat{Q} = \mathop{\mathrm{argmax}}_{q_1, q_2, \ldots, q_L} p(q_1, q_2, \ldots, q_L \mid X_1, X_2, \ldots, X_L)$

Standard speech recognizer structure
sound s[n] -> Feature calculation -> feature vectors X_m -> Acoustic classifier -> phone probabilities p(q^i | X_m) -> HMM decoder -> phone & word labeling
- the acoustic classifier uses trained network weights; the HMM decoder uses word models and a language model
Questions:
- what are the best features?
- how do we do the acoustic classification?
- how do we find/match the state sequence?

Outline
1. Recognizing Speech
2. Feature Calculation
   - spectrogram, MFCCs & PLP
   - improving robustness
3. Sequence Recognition
4. Hidden Markov Models

2. Feature Calculation
Goal: find a representational space most suitable for classification
- waveform: voluminous, redundant, variable
- spectrogram: better, still quite variable
- ...?
Pattern recognition: the representation is an upper bound on performance
- maybe we should use the waveform...
- or maybe the representation can do all the work
Feature calculation is intimately bound to the classifier
- pragmatic strengths and weaknesses
Features develop by slow evolution
- current choices are more historical than principled

Features (1): Spectrogram
Plain STFT as features, e.g.
  $X_m[k] = S[mH, k] = \sum_{n} s[n + mH]\, w[n]\, e^{-j 2\pi k n / N}$
[Spectrograms (freq/Hz vs. time/s) of corresponding male and female examples, with a feature-vector slice marked]
Similarities between corresponding segments
- but still large differences
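A minimal sketch of the plain-STFT feature above: window the signal every H samples, take an N-point DFT, and keep the log-magnitude of each frame as the feature slice X_m[k]. The window, frame sizes, and test signal are illustrative assumptions.

```python
import numpy as np

def stft_features(s, N=256, H=128):
    """Log-magnitude STFT frames: one feature vector per hop of H samples."""
    w = np.hanning(N)
    frames = []
    for start in range(0, len(s) - N + 1, H):
        X = np.fft.rfft(s[start:start + N] * w)        # X_m[k]
        frames.append(np.log(np.abs(X) + 1e-10))       # log-magnitude slice
    return np.array(frames)                            # shape (num_frames, N//2 + 1)

s = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone at 16 kHz
print(stft_features(s).shape)
```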

Features (2): Cepstrum
Idea: decorrelate and summarize spectral slices:
  $X_m[l] = \mathrm{IDFT}\{\log |S[mH, k]|\}$
- good for Gaussian models
- greatly reduces feature dimension
[Male and female examples: spectrum and cepstrum vs. time/s]

Features (3): Frequency axis warp
A linear frequency axis gives equal space to 0-1 kHz and 3-4 kHz
- but their perceptual importance is very different
Warp the frequency axis closer to a perceptual axis:
- mel, Bark, constant-Q, e.g.
  $X[c] = \sum_{k = l_c}^{u_c} |S[k]|$
[Male and female examples: spectrum and warped "audspec" vs. time/s]
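A minimal sketch combining the two steps above: warp a spectral slice onto a mel-like axis with a triangular filterbank, then decorrelate the log band energies with a DCT to get truncated cepstral coefficients. The filter count, shapes, and test spectrum are illustrative assumptions, not the exact settings from the lecture.

```python
import numpy as np
from scipy.fftpack import dct

def mel_points(fmax, nfilt):
    """Band edge frequencies equally spaced on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    return imel(np.linspace(0, mel(fmax), nfilt + 2))

def mel_cepstrum(power_spectrum, sr=16000, nfilt=20, ncep=13):
    freqs = np.linspace(0, sr / 2, len(power_spectrum))
    pts = mel_points(sr / 2, nfilt)
    fbank = np.zeros((nfilt, len(power_spectrum)))
    for i in range(nfilt):
        lo, ctr, hi = pts[i], pts[i + 1], pts[i + 2]
        up = (freqs - lo) / (ctr - lo)
        down = (hi - freqs) / (hi - ctr)
        fbank[i] = np.maximum(0, np.minimum(up, down))   # triangular band weights
    audspec = np.log(fbank @ power_spectrum + 1e-10)     # warped log spectrum
    return dct(audspec, norm='ortho')[:ncep]             # decorrelate + truncate

spec = np.abs(np.fft.rfft(np.random.randn(512))) ** 2    # a dummy spectral slice
print(mel_cepstrum(spec)[:5])
```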

Features (4): Spectral smoothing
Generalizing across different speakers is helped by smoothing (i.e. blurring) the spectrum
Truncated cepstrum is one way:
- MSE approximation to $\log |S[k]|$
LPC modeling is a little different:
- MSE approximation to $|S[k]|$, which prefers detail at peaks
[Male example: audspec vs. PLP-smoothed spectrum (level/dB vs. freq/chan), and audspec vs. time/s]

Features (5): Normalization along time
Idea: we want the feature variations, not the absolute level
Hence: calculate the average level & subtract it:
  $Y[n, k] = X[n, k] - \operatorname{mean}_n\{X[n, k]\}$
This factors out a fixed channel frequency response:
  $s[n] = h_c * e[n] \;\Rightarrow\; \log |S[n, k]| = \log |H_c[k]| + \log |E[n, k]|$
[Male and female examples: PLP and mean-normalized features vs. time/s]
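A minimal sketch of the per-channel mean normalization above: subtract the time average of each feature dimension, which in the log domain cancels any fixed channel response H_c[k]. The dummy data and offset are just for illustration.

```python
import numpy as np

def mean_normalize(X):
    """X: (frames, channels) log-spectral or cepstral features; remove per-channel mean."""
    return X - X.mean(axis=0, keepdims=True)

X = np.log(np.random.rand(100, 20)) + 3.0   # dummy features with a fixed log-domain offset
Y = mean_normalize(X)
print(Y.mean(axis=0)[:4])                   # ~0 in every channel
```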

Delta features
Want each segment to have static feature values
- but some segments are intrinsically dynamic!
- so calculate their derivatives - maybe steadier?
Append dx/dt (+ d²x/dt²) to the feature vectors (see the sketch after this section)
[Male example: PLP (mean/variance normalized), deltas, and double-deltas vs. time/s]
Relates to onset sensitivity in humans?

Overall feature calculation: MFCCs and/or RASTA-PLP
MFCC path: sound -> FFT |X[k]| -> mel-scale frequency warp -> log|X[k]| -> IFFT (cepstra) -> truncate -> subtract mean (CMN) -> MFCC features
RASTA-PLP path: sound -> FFT |X[k]| -> Bark-scale frequency warp (audspec) -> log|X[k]| -> RASTA band-pass (smoothed onsets) -> LPC smooth (spectra) -> cepstral recursion -> RASTA-PLP cepstral features
Key attributes:
- spectral, auditory scale
- decorrelation
- smoothing (of spectral detail)
- normalization of levels
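A minimal sketch of appending delta and delta-delta features: estimate the time derivative of each feature dimension by linear regression over a few frames and stack it alongside the static features. The +/-2-frame regression window is a common choice, assumed here rather than taken from the slide.

```python
import numpy as np

def deltas(X, width=2):
    """Regression-based time derivative of X: (frames, dims) -> (frames, dims)."""
    idx = np.arange(-width, width + 1)
    pad = np.pad(X, ((width, width), (0, 0)), mode='edge')
    return np.array([idx @ pad[n:n + 2 * width + 1]
                     for n in range(len(X))]) / np.sum(idx ** 2)

def add_deltas(X, width=2):
    """Stack static, delta, and delta-delta features."""
    d = deltas(X, width)
    return np.hstack([X, d, deltas(d, width)])

X = np.random.randn(50, 13)       # dummy cepstral features
print(add_deltas(X).shape)        # (50, 39)
```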

Features summary
[Male and female examples: spectrum, audspec, RASTA, and deltas vs. time/s]
- normalize the same phones
- contrast different phones

Outline
1. Recognizing Speech
2. Feature Calculation
3. Sequence Recognition
   - dynamic time warp
   - probabilistic formulation
4. Hidden Markov Models

3. Sequence Recognition: Dynamic Time Warp (DTW)
Framewise comparison with stored templates:
[Distance matrices between the test utterance (time/frames) and reference templates ONE, TWO, THREE, FOUR, FIVE]
- distance metric?
- comparison across templates?

Dynamic Time Warp (2)
Find the lowest-cost constrained path:
- matrix d(i,j) of distances between input frame f_i and reference frame r_j
- allowable predecessors & transition costs T_xy
  $D(i,j) = d(i,j) + \min\{\, D(i-1,j) + T,\; D(i,j-1) + T,\; D(i-1,j-1) + T \,\}$
  (local match cost plus best predecessor, including the appropriate transition cost T_xy for each move)
Best path via traceback from the final state
- store the best predecessor for each (i,j)
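A minimal sketch of the DTW recursion above: fill the cumulative-cost matrix D(i,j) from the frame distances d(i,j) and the three allowed predecessors, then read the alignment cost from the final cell. Transition costs are taken as zero here for simplicity; the slide allows per-move costs T_xy. The template data are synthetic.

```python
import numpy as np

def dtw(test, ref):
    """test: (I, dims), ref: (J, dims). Returns the lowest-cost alignment score."""
    I, J = len(test), len(ref)
    d = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)  # d(i,j)
    D = np.full((I, J), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            preds = []
            if i > 0:
                preds.append(D[i - 1, j])        # vertical move
            if j > 0:
                preds.append(D[i, j - 1])        # horizontal move
            if i > 0 and j > 0:
                preds.append(D[i - 1, j - 1])    # diagonal move
            D[i, j] = d[i, j] + min(preds)       # local cost + best predecessor
    return D[-1, -1]

ref = np.random.randn(40, 13)                                      # a stored template
test = np.repeat(ref, 2, axis=0) + 0.1 * np.random.randn(80, 13)   # time-warped version
print(dtw(test, ref))
```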

DTW-based recognition
Reference templates for each possible word
For isolated words:
- mark the endpoints of the input word
- calculate scores through each template (+ prune)
[Alignment paths of the input frames against reference templates ONE, TWO, THREE, FOUR]
- continuous speech: link together word ends
Successfully handles timing variation
- recognizes speech at reasonable cost

Statistical sequence recognition
DTW is limited because it's hard to optimize
- interpretation of distance, transition costs?
Need a theoretical foundation: probability
Formulate recognition as a MAP choice among models:
  $M^* = \mathop{\mathrm{argmax}}_{M_j} p(M_j \mid X, \Theta)$
- X = observed features
- M_j = word-sequence models
- Θ = all current parameters

Statistical formulation (2)
Can rearrange via Bayes' rule (& drop p(X)):
  $M^* = \mathop{\mathrm{argmax}}_{M_j} p(M_j \mid X, \Theta) = \mathop{\mathrm{argmax}}_{M_j} p(X \mid M_j, \Theta_A)\, p(M_j \mid \Theta_L)$
- p(X | M_j) = likelihood of the observations under the model
- p(M_j) = prior probability of the model
- Θ_A = acoustics-related model parameters
- Θ_L = language-related model parameters
Questions:
- what form of model to use for p(X | M_j, Θ_A)?
- how to find Θ_A (training)?
- how to solve for M_j (decoding)?

State-based modeling
Assume a discrete-state model for the speech:
- observations are divided up into time frames
- model states generate the observations:
  states Q_k: q_1 q_2 q_3 q_4 q_5 q_6 ... q_N
  observed feature vectors X: x_1 x_2 x_3 x_4 x_5 x_6 ... x_N
Probability of the observations given model M_j:
  $p(X \mid M_j) = \sum_{\text{all } Q_k} p(X \mid Q_k, M_j)\, p(Q_k \mid M_j)$
- sum over all possible state sequences Q_k
How do observations depend on states?
How do state sequences depend on the model?

Outline
1. Recognizing Speech
2. Feature Calculation
3. Sequence Recognition
4. Hidden Markov Models (HMMs)
   - generative Markov models
   - hidden Markov models
   - model-fit likelihood
   - HMM examples

Markov models
A (first-order) Markov model is a finite-state system whose behavior depends only on the current state
E.g. a generative Markov model with states S, A, B, C, E:
[Transition-probability table p(q_{n+1} | q_n) over the states S, A, B, C, E]
A sample state sequence: S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E
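A minimal sketch of a generative (first-order) Markov model like the S/A/B/C/E example above: the next state depends only on the current state. The transition probabilities below are made-up placeholders; the exact values in the slide's table are not reproduced.

```python
import numpy as np

states = ['S', 'A', 'B', 'C', 'E']
# trans[i, j] = p(next state = j | current state = i); each row sums to 1
trans = np.array([
    [0.0, 0.8, 0.1, 0.1, 0.0],   # from S
    [0.0, 0.8, 0.1, 0.0, 0.1],   # from A
    [0.0, 0.0, 0.8, 0.1, 0.1],   # from B
    [0.0, 0.0, 0.1, 0.7, 0.2],   # from C
    [0.0, 0.0, 0.0, 0.0, 1.0],   # E is an absorbing end state
])

def sample_sequence(rng):
    """Generate one state sequence by sampling transitions until reaching E."""
    q = 0                                     # start in S
    seq = ['S']
    while states[q] != 'E':
        q = rng.choice(len(states), p=trans[q])
        seq.append(states[q])
    return seq

print(' '.join(sample_sequence(np.random.default_rng(0))))
```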

Hidden Markov models
= a Markov model where the state sequence Q = {q_n} is not directly observable (= hidden)
But the observations X do depend on Q:
- x_n is a random variable that depends on the current state, via emission distributions p(x | q)
[Emission distributions p(x|q) for states q = A, B, C, and an example observation sequence generated from the state sequence AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]
- so we can still tell something about the state sequence...

(Generative) Markov models (2)
An HMM is specified by:
- states $q^i$
- transition probabilities $a_{ij} = p(q_n = j \mid q_{n-1} = i)$
- emission distributions $b_i(x) = p(x \mid q^i)$
- (+ initial state probabilities $\pi_i = p(q_1 = i)$)
[Illustration: a left-to-right model over states "k", "a", "t" with self-loops and forward transitions a_ij and emission densities b_i(x)]

Markov models for sequence recognition
Independence of observations:
- observation x_n depends only on the current state q_n:
  $p(X \mid Q) = p(x_1, x_2, \ldots, x_N \mid q_1, q_2, \ldots, q_N) = \prod_{n=1}^{N} p(x_n \mid q_n) = \prod_{n=1}^{N} b_{q_n}(x_n)$
Markov transitions:
- the transition to the next state q_{n+1} depends only on q_n:
  $p(Q \mid M) = p(q_1, q_2, \ldots, q_N \mid M) = p(q_1) \prod_{n=2}^{N} p(q_n \mid q_{n-1}) = \pi_{q_1} \prod_{n=2}^{N} a_{q_{n-1} q_n}$

Model-fit calculation
From state-based modeling:
  $p(X \mid M_j) = \sum_{\text{all } Q_k} p(X \mid Q_k, M_j)\, p(Q_k \mid M_j)$
For HMMs:
  $p(X \mid Q) = \prod_{n=1}^{N} b_{q_n}(x_n), \qquad p(Q \mid M) = \pi_{q_1} \prod_{n=2}^{N} a_{q_{n-1} q_n}$
Hence, solve for M*:
- calculate p(X | M_j) for each available model, scale by the prior p(M_j) to get p(M_j | X)
- but how do we sum over all Q_k???
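A minimal sketch of the model-fit sum above: enumerate every length-N state sequence Q and accumulate p(X|Q,M) p(Q|M). This is exponential in N, so it is only usable for tiny toy models; the forward recursion in the next section computes the same quantity efficiently. The parameters and discrete observation alphabet here are arbitrary illustrations.

```python
import itertools
import numpy as np

pi = np.array([0.9, 0.1])                    # initial state probabilities pi_i
A = np.array([[0.7, 0.3],                    # a_ij = p(q_n = j | q_{n-1} = i)
              [0.2, 0.8]])
B = np.array([[0.6, 0.3, 0.1],               # b_i(x) over a 3-symbol alphabet
              [0.1, 0.3, 0.6]])
X = [0, 1, 2, 2]                             # an observation sequence

def brute_force_likelihood(X, pi, A, B):
    """p(X|M) by explicit summation over every state sequence Q_k."""
    total = 0.0
    for Q in itertools.product(range(len(pi)), repeat=len(X)):
        p_q = pi[Q[0]] * np.prod([A[Q[n - 1], Q[n]] for n in range(1, len(X))])
        p_x_given_q = np.prod([B[q, x] for q, x in zip(Q, X)])
        total += p_q * p_x_given_q
    return total

print(brute_force_likelihood(X, pi, A, B))
```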

Summing over all paths
[Worked example: a model M with states S, A, B, E, its transition probabilities, and observation likelihoods p(x|A), p(x|B) for a 3-observation sequence x_1, x_2, x_3. All 3-emission paths Q_k from S to E (S A A A E, S A A B E, S A B B E, S B B B E) are enumerated; for each, p(Q|M) = Π_n p(q_n|q_{n-1}) and p(X|Q,M) = Π_n p(x_n|q_n) are computed, and the products p(X,Q|M) are summed to give p(X|M).]

The forward recursion
A dynamic-programming-like technique to calculate the sum over all Q_k
Define $\alpha_n(i)$ as the probability of getting to state $q^i$ at time step n (by any path):
  $\alpha_n(i) = p(x_1, x_2, \ldots, x_n,\, q_n = q^i)$
Then $\alpha_{n+1}(j)$ can be calculated recursively:
  $\alpha_{n+1}(j) = \Big[\sum_{i} \alpha_n(i)\, a_{ij}\Big]\, b_j(x_{n+1})$
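A minimal sketch of the forward recursion: initialize alpha_1(i) = pi_i b_i(x_1), then update alpha_{n+1}(j) = [sum_i alpha_n(i) a_ij] b_j(x_{n+1}). It reuses the toy parameters from the brute-force sketch above, so the two results should agree.

```python
import numpy as np

def forward_likelihood(X, pi, A, B):
    """p(X|M) via the forward recursion over alpha_n(i)."""
    alpha = pi * B[:, X[0]]                  # alpha_1(i) = pi_i * b_i(x_1)
    for x in X[1:]:
        alpha = (alpha @ A) * B[:, x]        # alpha_{n+1}(j)
    return alpha.sum()                       # p(X|M) = sum_i alpha_N(i)

pi = np.array([0.9, 0.1])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
print(forward_likelihood([0, 1, 2, 2], pi, A, B))   # matches the brute-force sum
```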

Forward recursion (2)
Initialize:
  $\alpha_1(i) = \pi_i\, b_i(x_1)$
Then the total probability is
  $p(X \mid M) = \sum_{i=1}^{S} \alpha_N(i)$
A practical way to solve for p(X | M_j) and hence perform recognition:
- score the observations X under each model to get p(X | M_1) p(M_1), p(X | M_2) p(M_2), ..., and choose the best

Optimal path
May be interested in the actual q_n assignments
- which state was active at each time frame
- e.g. phone labelling (for training?)
The total probability is over all paths, but we can also solve for the single best path = the Viterbi state sequence
Probability along the best path to state $q^j$ at time n+1:
  $\alpha^*_{n+1}(j) = \big[\max_i \alpha^*_n(i)\, a_{ij}\big]\, b_j(x_{n+1})$
- backtrack from the final state to get the best path
- the final probability is a product only (no sum), so the log-domain calculation is just a summation
The total probability is often dominated by the best path:
  $p(X, Q^* \mid M) \approx p(X \mid M)$
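A minimal sketch of the Viterbi best path in the log domain: the max-product analogue of the forward recursion, with back-pointers for the traceback. Same toy parameters as above.

```python
import numpy as np

def viterbi(X, pi, A, B):
    """Return the best state sequence Q* and log p(X, Q* | M)."""
    logA, logB = np.log(A), np.log(B)
    score = np.log(pi) + logB[:, X[0]]       # log alpha*_1(i)
    back = []
    for x in X[1:]:
        cand = score[:, None] + logA         # cand[i, j]: come from i, move to j
        back.append(cand.argmax(axis=0))     # best predecessor of each state j
        score = cand.max(axis=0) + logB[:, x]
    path = [int(score.argmax())]             # best final state
    for bp in reversed(back):                # traceback via stored predecessors
        path.append(int(bp[path[-1]]))
    return path[::-1], score.max()

pi = np.array([0.9, 0.1])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], pi, A, B))
```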

Interpreting the Viterbi path
The Viterbi path assigns each x_n to a state q^i
- performing classification based on b_i(x)
- ... at the same time as applying the transition constraints a_ij
[Example: observations x_n and their inferred Viterbi labels AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]
Can be used for segmentation
- train an HMM with "garbage" and "target" states
- decode on new data to find targets and their boundaries
Can be used for (heuristic) training
- e.g. train classifiers based on the Viterbi labels...

Recognition with HMMs
Isolated word: choose the best $p(M \mid X) \propto p(X \mid M)\, p(M)$
- e.g. score the input against word models M_1 = "w ah n", M_2 = "t uw", M_3 = "th r iy", comparing p(X|M_1)p(M_1), p(X|M_2)p(M_2), p(X|M_3)p(M_3)
Continuous speech:
- Viterbi decoding of one large HMM (word models linked by silence, weighted by the priors p(M_j)) gives the words
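A minimal sketch of isolated-word recognition as described above: score the observations under each word's HMM with the forward recursion, weight by the word prior, and pick the best. The two-state "one"/"two" models, their parameters, and the discrete observations are toy placeholders, not real phone models.

```python
import numpy as np

def forward_likelihood(X, pi, A, B):
    """p(X|M) via the forward recursion (as in the earlier sketch)."""
    alpha = pi * B[:, X[0]]
    for x in X[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()

word_models = {
    'one': (np.array([1.0, 0.0]),
            np.array([[0.6, 0.4], [0.0, 1.0]]),
            np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])),
    'two': (np.array([1.0, 0.0]),
            np.array([[0.6, 0.4], [0.0, 1.0]]),
            np.array([[0.1, 0.7, 0.2], [0.2, 0.7, 0.1]])),
}
priors = {'one': 0.5, 'two': 0.5}

X = [0, 0, 1, 2, 2]                            # observed symbol sequence
best = max(word_models,
           key=lambda w: forward_likelihood(X, *word_models[w]) * priors[w])
print(best)                                    # the MAP word under these toy models
```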

HMM examples: Different state sequences
[Two word models with the same topology but different state sequences: Model M1 = S K A T E and Model M2 = S K O T E, shown with their emission distributions p(x|K), p(x|A), p(x|O), p(x|T)]

Model matching: Emission probabilities
[For the given observation sequence x_n, Model M1 aligns well and yields noticeably higher log p(X|M), log p(X,Q*|M), log p(Q*|M), and log p(X|Q*,M) than Model M2, whose emission distributions fit the observations poorly]

Model matching: Transition probabilities
[Two variants M1' and M2' with the same emission distributions but different transition probabilities: the observation likelihood along the best path, log p(X|Q*,M), is the same, but the transition terms log p(Q*|M) differ, so the overall scores log p(X|M) and log p(X,Q*|M) differ]

Summary
The speech signal is highly variable
- need models that absorb the variability
- hide what we can with robust features
Speech is modeled as a sequence of features
- recognition needs a temporal aspect
- best time-alignment of templates = DTW
Hidden Markov models are the rigorous solution
- self-loops allow temporal dilation
- exact, efficient likelihood calculations
Parting thought: how do we set the HMM parameters? (training)
