Lecture 9: Speech Recognition
EE E6820: Speech & Audio Processing & Recognition
Lecture 9: Speech Recognition
1 Recognizing Speech
2 Feature Calculation
3 Sequence Recognition
4 Hidden Markov Models
Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering, Spring 2006
E6820 SAPR - Dan Ellis - L9: Speech Recognition

1 Recognizing Speech
[Figure: spectrogram (frequency vs. time) of the utterance "So, I thought about that and I think it's still possible"]
What kind of information might we want from the speech signal?
- words
- phrasing, speech acts (prosody)
- mood / emotion
- speaker identity
What kind of processing do we need to get at that information?
- time scale of feature extraction
- signal aspects to capture in features
- signal aspects to exclude from features
Speech recognition as transcription
Transcription = speech to text
- find a word string to match the utterance
Best suited to small-vocabulary tasks
- voice dialing, command & control, etc.
Gives a neat objective measure: word error rate (WER, %)
- can be a sensitive measure of performance
Three kinds of errors:
  Reference:   THE CAT SAT ON THE   MAT
  Recognized:      CAT SAT AN THE A MAT
              (deletion)  (substitution: ON -> AN)  (insertion: A)
  WER = (S + D + I) / N

Problems: within-speaker variability
Timing variation:
- word duration varies enormously
- fast speech "reduces" vowels
[Figure: spectrogram with phone-level alignment of "SO I THOUGHT ABOUT THAT AND I THINK IT'S STILL POSSIBLE"]
Speaking style variation:
- careful/casual articulation
- soft/loud speech
Contextual effects:
- speech sounds vary with context, role: "How do you do?"
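The WER bookkeeping above can be sketched with a standard Levenshtein alignment over words; `word_error_rate` is a hypothetical helper for illustration, not code from the lecture:

```python
# Word error rate via Levenshtein alignment: WER = (S + D + I) / N,
# where N is the number of reference words.

def word_error_rate(ref, hyp):
    """Return (WER, substitutions, deletions, insertions)."""
    ref, hyp = ref.split(), hyp.split()
    n, m = len(ref), len(hyp)
    # D[i][j] = minimum edit cost aligning ref[:i] to hyp[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # all deletions
    for j in range(m + 1):
        D[0][j] = j                      # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    # Backtrack along one optimal alignment to count error types
    i, j, S, Dl, I = n, m, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            S += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            Dl += 1
            i -= 1
        else:
            I += 1
            j -= 1
    return (S + Dl + I) / n, S, Dl, I

# The slide's example: 3 errors against a 6-word reference, WER = 0.5
wer, S, Dl, I = word_error_rate("THE CAT SAT ON THE MAT",
                                "CAT SAT AN THE A MAT")
```

Note that because the alignment minimizes total edits, ties can shift counts between the individual S/D/I categories, but their sum (and hence the WER) is unambiguous.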
Between-speaker variability
Accent variation
- regional / mother tongue
Voice quality variation
- gender, age, huskiness, nasality
Individual characteristics
- mannerisms, speed, prosody
[Figure: spectrograms (freq/Hz vs. time/s) of the same utterance by two speakers, "mbma" and "fjdm"]

Environment variability
Background noise
- fans, cars, doors, papers
Reverberation
- "boxiness" in recordings
Microphone/channel
- huge effect on relative spectral gain
[Figure: spectrograms of the same speech from a close mic and a tabletop mic]
How to recognize speech?
Cross-correlate templates?
- waveform?
- spectrogram?
- time-warp problems
Match short segments & handle time-warp later
- model with slices of ~10 ms
- pseudo-stationary model of words: sil g w eh n sil
- other sources of variation...
[Figure: spectrogram (freq/Hz vs. time/s) segmented into pseudo-stationary phone slices]

Probabilistic formulation
Probability that a segment label is correct gives the standard form of speech recognizers:
- feature calculation transforms the signal s[n] into an easily classified domain of feature vectors X_m
- an acoustic classifier calculates the probability of each mutually exclusive state q_i: p(q_i | X_m)
- a finite-state acceptor (i.e. HMM) finds the MAP match of an allowable sequence to those probabilities:
  Q^ = argmax_{q_0, q_1, ..., q_L} p(q_0, q_1, ..., q_L | X_0, X_1, ..., X_L)
Standard speech recognizer structure
  sound s[n] -> [feature calculation] -> feature vectors X_m
             -> [acoustic classifier] -> phone probabilities p(q_i | X_m)
             -> [HMM decoder]         -> phone & word labeling
  (the classifier uses trained network weights; the decoder uses word models and a language model)
Questions:
- what are the best features?
- how do we do the acoustic classification?
- how do we find/match the state sequence?

Outline
1 Recognizing Speech
2 Feature Calculation
- spectrogram, MFCCs & PLP
- improving robustness
3 Sequence Recognition
4 Hidden Markov Models
2 Feature Calculation
Goal: find a representational space most suitable for classification
- waveform: voluminous, redundant, variable
- spectrogram: better, still quite variable
- ...?
Pattern recognition: the representation is an upper bound on performance
- maybe we should use the waveform...
- or maybe the representation can do all the work
Feature calculation is intimately bound to the classifier
- pragmatic strengths and weaknesses
Features develop by slow evolution
- current choices are more historical than principled

Features (1): Spectrogram
Plain STFT as features:
  X_m[k] = S[mH, k] = Σ_n s[n + mH] w[n] e^{-j2πkn/N}
[Figure: spectrogram examples with a feature-vector slice marked]
Similarities between corresponding segments
- but still large differences
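The STFT feature calculation can be sketched in a few lines; the frame length N and hop H below are illustrative choices (25 ms / 10 ms at 16 kHz), not values fixed by the slide:

```python
import numpy as np

# STFT features as on the slide: X_m[k] = sum_n s[n + mH] w[n] e^{-j 2 pi k n / N},
# one log-magnitude spectral slice per hop-H frame.

def stft_features(s, N=400, H=160):
    """Log-magnitude STFT: rows are frames, columns are frequency bins."""
    w = np.hanning(N)                       # analysis window w[n]
    n_frames = 1 + (len(s) - N) // H
    frames = np.stack([s[m * H:m * H + N] * w for m in range(n_frames)])
    S = np.fft.rfft(frames, n=N, axis=1)    # X_m[k] for k = 0 .. N/2
    return np.log(np.abs(S) + 1e-10)        # log magnitude, floored

fs = 16000
t = np.arange(fs) / fs                      # 1 s test tone at 1 kHz
X = stft_features(np.sin(2 * np.pi * 1000 * t))
# Peak bin for 1 kHz: 1000 / (fs / N) = bin 25
```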
Features (2): Cepstrum
Idea: decorrelate and summarize the spectral slices:
  X_m[l] = IDFT{ log |S[mH, k]| }
- good for Gaussian models
- greatly reduces the feature dimension
[Figure: male and female example spectra and their cepstra]

Features (3): Frequency axis warp
A linear frequency axis gives equal space to 0-1 kHz and 3-4 kHz
- but their perceptual importance is very different
Warp the frequency axis closer to a perceptual axis
- mel, Bark, constant-Q:
  X[c] = Σ_{k=l_c}^{u_c} |S[k]|²
[Figure: male and female "audspec" (auditory spectrogram) examples]
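A minimal sketch of the cepstrum of one log-magnitude slice; the toy spectrum and the truncation length of 13 coefficients are illustrative choices, not values from the slide:

```python
import numpy as np

# Cepstrum as on the slide: X_m[l] = IDFT{ log |S[mH, k]| }.
# Truncating to the first few coefficients keeps the smooth spectral
# envelope and discards fine harmonic ripple.

def cepstrum(log_mag_spectrum, n_keep=13):
    """Real cepstrum from one log-magnitude spectrum; first n_keep coefs."""
    c = np.fft.irfft(log_mag_spectrum)      # IDFT of the log spectrum
    return c[:n_keep]

# Toy slice: a smooth downward envelope plus harmonic-like ripple
k = np.arange(257)
log_S = -k / 256.0 + 0.3 * np.cos(2 * np.pi * k / 16)
c = cepstrum(log_S)
```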
Features (4): Spectral smoothing
Generalizing across different speakers is helped by smoothing (i.e. blurring) the spectrum
Truncated cepstrum is one way:
- MSE approximation to log |S[k]|
LPC modeling is a little different:
- MSE approximation to |S[k]|², which prefers detail at peaks
[Figure: male audspec slice vs. PLP-smoothed version (level/dB vs. freq/channel)]

Features (5): Normalization along time
Idea: capture feature variations, not absolute level
Hence: calculate the average level and subtract it:
  Y[n, k] = X[n, k] - mean_n{ X[n, k] }
This factors out a fixed channel frequency response:
  s[n] = h_c * e[n]  =>  log |S[n, k]| = log |H_c[k]| + log |E[n, k]|
[Figure: male and female PLP features before and after mean normalization]
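Mean normalization is easy to demonstrate: adding a fixed per-channel offset (a channel response) to log-domain features and subtracting the per-channel mean recovers the clean features. The data below are synthetic:

```python
import numpy as np

# Mean normalization along time, as on the slide:
#   Y[n, k] = X[n, k] - mean_n{ X[n, k] }.
# In the log domain a fixed channel H_c[k] adds a constant per channel,
# so subtracting the per-channel mean removes it.

rng = np.random.default_rng(0)
X_clean = rng.normal(size=(100, 20))       # log-spectral features (frames x channels)
channel = np.linspace(-3.0, 3.0, 20)       # fixed channel response log|H_c[k]|
X_noisy = X_clean + channel                # channel adds a per-channel offset

Y_clean = X_clean - X_clean.mean(axis=0)
Y_noisy = X_noisy - X_noisy.mean(axis=0)
# After normalization the two versions coincide: the channel is factored out
```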
Delta features
Want each segment to have static feature values
- but some segments are intrinsically dynamic!
- so calculate their derivatives (maybe steadier?)
Append dx/dt (+ d²x/dt²) to the feature vectors
Relates to onset sensitivity in humans?
[Figure: male PLP features ((μ,σ)-normalized), their deltas and double-deltas]

Overall feature calculation
MFCCs and/or RASTA-PLP:
- MFCC: sound -> FFT |X[k]| -> mel-scale frequency warp -> log -> IFFT -> truncate cepstra -> subtract mean (CMN) -> MFCC features
- RASTA-PLP: sound -> FFT |X[k]| -> Bark-scale frequency warp -> log -> RASTA band-pass -> LPC smoothing -> cepstral recursion -> RASTA-PLP cepstral features
Key attributes:
- spectral, auditory scale
- decorrelation
- smoothing (of spectral detail)
- normalization of levels
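A sketch of delta features using a simple two-point difference via `numpy.gradient`; real front ends typically fit a regression line over a longer window:

```python
import numpy as np

# Delta features: append frame-to-frame derivatives dX/dt (and d2X/dt2)
# to each feature vector.

def add_deltas(X):
    """Stack [X, dX, ddX] along the feature axis."""
    d = np.gradient(X, axis=0)       # first difference over time (frames)
    dd = np.gradient(d, axis=0)      # second difference
    return np.concatenate([X, d, dd], axis=1)

# Toy features: a linear ramp has constant deltas and zero double-deltas
X = np.arange(20, dtype=float).reshape(10, 2)
F = add_deltas(X)
```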
Features summary
[Figure: male and female examples at successive processing stages: spectrum, audspec, rasta, deltas]
- normalize the same phones
- contrast different phones

Outline
1 Recognizing Speech
2 Feature Calculation
3 Sequence Recognition
- dynamic time warp
- probabilistic formulation
4 Hidden Markov Models
3 Sequence recognition: Dynamic Time Warp (DTW)
Framewise comparison with stored templates:
[Figure: frame-by-frame distance matrices between test frames and reference templates ONE, TWO, THREE, FOUR, FIVE (time in frames)]
- distance metric?
- comparison across templates?

Dynamic Time Warp (2)
Find the lowest-cost constrained path:
- matrix d(i,j) of distances between input frame f_i and reference frame r_j
- allowable predecessors and transition costs T_xy:
  D(i,j) = d(i,j) + min{ D(i-1,j) + T_10,
                         D(i,j-1) + T_01,
                         D(i-1,j-1) + T_11 }
  (local match cost plus the lowest cost to the best predecessor, including its transition cost)
Best path via traceback from the final state
- store the predecessor of each (i,j)
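The DTW recursion can be sketched directly; for simplicity the transition costs T are taken as zero, and the local distance is a plain absolute difference between scalar "features":

```python
import numpy as np

# DTW: D(i,j) = d(i,j) + min{ D(i-1,j), D(i,j-1), D(i-1,j-1) }
# (transition costs T set to zero here), with predecessors stored for traceback.

def dtw(test, ref):
    """Return (lowest path cost, alignment path) between two sequences."""
    n, m = len(test), len(ref)
    d = np.abs(test[:, None] - ref[None, :])        # local match costs d(i,j)
    D = np.full((n, m), np.inf)
    pred = np.zeros((n, m, 2), dtype=int)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((D[i - 1, j], (i - 1, j)))
            if j > 0:
                cands.append((D[i, j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                cands.append((D[i - 1, j - 1], (i - 1, j - 1)))
            best, pred[i, j] = min(cands, key=lambda c: c[0])
            D[i, j] = d[i, j] + best
    # Traceback from the final state via stored predecessors
    path, ij = [], (n - 1, m - 1)
    while ij != (0, 0):
        path.append(ij)
        ij = tuple(pred[ij])
    path.append((0, 0))
    return D[n - 1, m - 1], path[::-1]

# A time-warped copy of the same contour aligns at zero cost
ref = np.array([0., 1., 2., 3., 2., 1.])
test = np.array([0., 0., 1., 2., 2., 3., 2., 1.])
cost, path = dtw(test, ref)
```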
DTW-based recognition
Reference templates for each possible word
For isolated words:
- mark the endpoints of the input word
- calculate scores through each template (+ prune)
- continuous speech: link together word ends
[Figure: input frames scored against concatenated templates ONE, TWO, THREE, FOUR]
Successfully handles timing variation
- recognizes speech at reasonable cost

Statistical sequence recognition
DTW is limited because it is hard to optimize
- interpretation of distance and transition costs?
Need a theoretical foundation: probability
Formulate recognition as a MAP choice among models:
  M* = argmax_{M_j} p(M_j | X, Θ)
- X = observed features
- M_j = word-sequence models
- Θ = all current parameters
Statistical formulation (2)
Can rearrange via Bayes' rule (and drop p(X)):
  M* = argmax_{M_j} p(M_j | X, Θ) = argmax_{M_j} p(X | M_j, Θ_A) · p(M_j | Θ_L)
- p(X | M_j, Θ_A) = likelihood of the observations under the model
- p(M_j | Θ_L) = prior probability of the model
- Θ_A = acoustics-related model parameters
- Θ_L = language-related model parameters
Questions:
- what form of model to use for p(X | M_j, Θ_A)?
- how to find Θ_A (training)?
- how to solve for M_j (decoding)?

State-based modeling
Assume a discrete-state model for the speech:
- the observations are divided up into time frames
- model states ↔ observations:
  states                    Q_k: q_1 q_2 q_3 q_4 q_5 q_6 ... q_N
  observed feature vectors  X:   x_1 x_2 x_3 x_4 x_5 x_6 ... x_N
The probability of the observations given the model is:
  p(X | M_j) = Σ_{all Q_k} p(X | Q_k, M_j) p(Q_k | M_j)
- a sum over all possible state sequences Q_k
How do observations depend on states?
How do state sequences depend on the model?
Outline
1 Recognizing Speech
2 Feature Calculation
3 Sequence Recognition
4 Hidden Markov Models (HMMs)
- generative Markov models
- hidden Markov models
- model-fit likelihood
- HMM examples

Markov models
A (first-order) Markov model is a finite-state system whose behavior depends only on the current state
E.g. a generative Markov model:
[Figure: state diagram over states {S, A, B, C, E} with its transition-probability table p(q_{n+1} | q_n); the emitting states A, B, C have strong (0.8) self-loops]
- a typical generated state sequence: S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E
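A generative first-order Markov model is easy to simulate. The transition table below is illustrative (the slide's exact figures are not recoverable from the scrape), with the characteristic strong self-loops:

```python
import random

# A small generative (first-order) Markov model over states {S, A, B, C, E}.
# Transition probabilities are illustrative, not the slide's exact table;
# the 0.8 self-loops mirror the slide's dominant diagonal.

P = {
    "S": {"A": 1.0},
    "A": {"A": 0.8, "B": 0.2},
    "B": {"B": 0.8, "C": 0.1, "E": 0.1},
    "C": {"C": 0.8, "B": 0.1, "E": 0.1},
}

def generate(seed=0):
    """Walk from S until the end state E; return the emitted state sequence."""
    rng = random.Random(seed)
    q, seq = "S", []
    while q != "E":
        nxt, probs = zip(*P[q].items())
        q = rng.choices(nxt, weights=probs)[0]
        if q != "E":
            seq.append(q)
    return seq

seq = generate()
```

Runs of repeated states (A A A ... B B B ...) come directly from the self-loops, which is how the model absorbs duration variation.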
Hidden Markov models
A hidden Markov model is a Markov model whose state sequence Q = {q_n} is not directly observable (it is "hidden")
But the observations X do depend on Q:
- x_n is a random variable whose distribution depends on the current state: the emission distributions p(x | q)
[Figure: emission distributions p(x | q) for q = A, B, C; a hidden state sequence AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC; and the observation sequence x_n it generates]
- so we can still tell something about the state sequence...

(Generative) Markov models (2)
An HMM is specified by:
- states q_i
- transition probabilities a_ij = p(q_n = j | q_{n-1} = i)
- emission distributions b_i(x) = p(x | q_i)
- (initial state probabilities π_i = p(q_1 = i))
Markov models for sequence recognition
Independence of observations:
- observation x_n depends only on the current state q_n:
  p(X | Q) = p(x_1, x_2, ..., x_N | q_1, q_2, ..., q_N)
           = p(x_1 | q_1) p(x_2 | q_2) ... p(x_N | q_N)
           = Π_{n=1}^{N} p(x_n | q_n) = Π_{n=1}^{N} b_{q_n}(x_n)
Markov transitions:
- the transition to the next state depends only on the current state:
  p(Q | M) = p(q_1, q_2, ..., q_N | M)
           = p(q_N | q_{N-1}) p(q_{N-1} | q_{N-2}) ... p(q_2 | q_1) p(q_1)
           = p(q_1) Π_{n=2}^{N} p(q_n | q_{n-1}) = π_{q_1} Π_{n=2}^{N} a_{q_{n-1} q_n}

Model-fit calculation
From state-based modeling:
  p(X | M_j) = Σ_{all Q_k} p(X | Q_k, M_j) p(Q_k | M_j)
For HMMs:
  p(X | Q) = Π_{n=1}^{N} b_{q_n}(x_n),   p(Q | M) = π_{q_1} Π_{n=2}^{N} a_{q_{n-1} q_n}
Hence, solve for M*:
- calculate p(X | M_j), scaled by the prior p(M_j), for each available model
But how to sum over all Q_k???
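The two factorizations combine to give p(X, Q | M) for any single state path; the model numbers below are illustrative, not from the slides:

```python
import numpy as np

# Joint probability of an observation sequence and one state path:
#   p(X | Q) = prod_n b_{q_n}(x_n)               (observation independence)
#   p(Q | M) = pi_{q_1} prod_n a_{q_{n-1} q_n}   (Markov transitions)
# so p(X, Q | M) = p(Q | M) * p(X | Q).  Illustrative two-state model.

pi = np.array([1.0, 0.0])                 # always start in state 0
A = np.array([[0.7, 0.3],                 # a_ij = p(next = j | current = i)
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],                 # b_i(x) for a binary observation x
              [0.2, 0.8]])

def joint_prob(X, Q):
    """p(X, Q | M) for one path Q and observations X."""
    p = pi[Q[0]] * B[Q[0], X[0]]
    for n in range(1, len(Q)):
        p *= A[Q[n - 1], Q[n]] * B[Q[n], X[n]]
    return p

X = [0, 0, 1]
Q = [0, 0, 1]
p = joint_prob(X, Q)     # 1.0*0.9 * 0.7*0.9 * 0.3*0.8 = 0.13608
```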
Summing over all paths
[Figure: a small model M with states S, A, B, E; observation likelihoods p(x | A) and p(x | B) for observations x_1, x_2, x_3; all possible 3-emission paths Q_k from S to E (S A A A E, S A A B E, S A B B E, S B B B E, ...); for each path, p(Q | M) = Π_n p(q_n | q_{n-1}) and p(X | Q, M) = Π_n p(x_n | q_n) are multiplied, and the products summed over paths give p(X | M)]

The forward recursion
A dynamic-programming-like technique to calculate the sum over all Q_k
Define α_n(i) as the probability of getting to state q_i at time step n (by any path):
  α_n(i) = p(x_1, x_2, ..., x_n, q_n = q_i | M)
Then α_{n+1}(j) can be calculated recursively:
  α_{n+1}(j) = [ Σ_i α_n(i) a_ij ] · b_j(x_{n+1})
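The forward recursion can be sketched and checked against a brute-force sum over every path; the two-state model here is illustrative, not from the slides:

```python
import numpy as np
from itertools import product

# Forward recursion: alpha_1(i) = pi_i b_i(x_1),
# alpha_{n+1}(j) = [sum_i alpha_n(i) a_ij] b_j(x_{n+1}),
# p(X | M) = sum_i alpha_N(i).

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def forward(X):
    alpha = pi * B[:, X[0]]
    for x in X[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()                  # p(X | M)

def brute_force(X):
    """Explicit sum of p(X, Q | M) over all 2^N state paths."""
    total = 0.0
    for Q in product(range(2), repeat=len(X)):
        p = pi[Q[0]] * B[Q[0], X[0]]
        for n in range(1, len(X)):
            p *= A[Q[n - 1], Q[n]] * B[Q[n], X[n]]
        total += p
    return total

X = [0, 0, 1]
pX = forward(X)
```

The recursion does in O(N·S²) what the brute force does in O(N·Sᴺ): both give the same p(X | M).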
Forward recursion (2)
Initialize:
  α_1(i) = π_i b_i(x_1)
Then the total probability is:
  p(X | M) = Σ_i α_N(i)
A practical way to solve for p(X | M_j) and hence perform recognition:
- for each model, compute p(X | M_j) p(M_j) from the observations X, and choose the best

Optimal path
May be interested in the actual q_n assignments
- which state was active at each time frame
- e.g. phone labelling (for training?)
The total probability is over all paths, but we can also solve for the single best path, the Viterbi state sequence
Probability along the best path to state q_j at step n+1:
  α*_{n+1}(j) = max_i { α*_n(i) a_ij } · b_j(x_{n+1})
- backtrack from the final state to get the best path
- the final probability is a product only (no sum), so the log-domain calculation is just a summation
The total probability is often dominated by the best path:
  p(X, Q* | M) ≈ p(X | M)
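Viterbi decoding follows the same shape as the forward recursion with max in place of sum, plus backtracking; the two-state model here is illustrative, not from the slides:

```python
import numpy as np

# Viterbi: alpha*_{n+1}(j) = max_i{ alpha*_n(i) a_ij } b_j(x_{n+1}),
# with backtracking from the most probable final state.  (In practice this
# is done in the log domain, where the products become sums.)

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def viterbi(X):
    """Return (best state path Q*, its probability p(X, Q* | M))."""
    alpha = pi * B[:, X[0]]
    back = []
    for x in X[1:]:
        scores = alpha[:, None] * A          # scores[i, j] = alpha_i a_ij
        back.append(scores.argmax(axis=0))   # best predecessor of each j
        alpha = scores.max(axis=0) * B[:, x]
    # Backtrack from the most probable final state
    q = int(alpha.argmax())
    path = [q]
    for bp in reversed(back):
        q = int(bp[q])
        path.append(q)
    return path[::-1], alpha.max()

path, p_star = viterbi([0, 0, 1])
```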
Interpreting the Viterbi path
The Viterbi path assigns each x_n to a state q_i
- performing classification based on the b_i(x)
- ... at the same time as applying the transition constraints a_ij
[Figure: observations x_n with their inferred Viterbi labels AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]
Can be used for segmentation
- train an HMM with "garbage" and "target" states
- decode on new data to find targets and their boundaries
Can be used for (heuristic) training
- e.g. train classifiers based on the Viterbi labels...

Recognition with HMMs
Isolated word: choose the best p(M | X) ∝ p(X | M) p(M)
- Model M_1 = w ah n ("one"):    p(X | M_1) p(M_1) = ...
- Model M_2 = t uw ("two"):      p(X | M_2) p(M_2) = ...
- Model M_3 = th r iy ("three"): p(X | M_3) p(M_3) = ...
Continuous speech:
- Viterbi decoding of one large HMM gives the words
[Figure: word models w ah n, t uw, th r iy joined through sil, with priors p(M_1), p(M_2), p(M_3)]
HMM examples: Different state sequences
[Figure: Model M_1 with state sequence S → K → A → T → E and Model M_2 with state sequence S → K → O → T → E, sharing emission distributions p(x | K), p(x | A), p(x | O), p(x | T)]

Model matching: emission probabilities
[Figure: an observation sequence x_n decoded against models M_1 and M_2; for each model the figure shows the Viterbi state alignment, log p(X | M), log p(X, Q* | M), the log transition probability log p(Q* | M), and the log observation likelihood log p(X | Q*, M); the model whose emission distributions better match the observations scores higher]
Model matching: transition probabilities
[Figure: two variants M_1' and M_2' with the same states and emission distributions but different transition probabilities; the same observation sequence gets the same log observation likelihood log p(X | Q*, M), but the log transition probabilities log p(Q* | M) differ, so the total scores differ]

Summary
The speech signal is highly variable
- need models that absorb variability
- hide what we can with robust features
Speech is modeled as a sequence of features
- need a temporal aspect to recognition
- best time-alignment of templates = DTW
Hidden Markov models are the rigorous solution
- self-loops allow temporal dilation
- exact, efficient likelihood calculations
Parting thought: how to set the HMM parameters? (training)
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationA gentle introduction to Hidden Markov Models
A gentle introduction to Hidden Markov Models Mark Johnson Brown University November 2009 1 / 27 Outline What is sequence labeling? Markov models Hidden Markov models Finding the most likely state sequence
More informationComputer Science March, Homework Assignment #3 Due: Thursday, 1 April, 2010 at 12 PM
Computer Science 401 8 March, 2010 St. George Campus University of Toronto Homework Assignment #3 Due: Thursday, 1 April, 2010 at 12 PM Speech TA: Frank Rudzicz 1 Introduction This assignment introduces
More informationAutomatic Phoneme Recognition. Segmental Hidden Markov Models
Automatic Phoneme Recognition with Segmental Hidden Markov Models Areg G. Baghdasaryan Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
More informationDetection-Based Speech Recognition with Sparse Point Process Models
Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,
More informationChapter 9 Automatic Speech Recognition DRAFT
P R E L I M I N A R Y P R O O F S. Unpublished Work c 2008 by Pearson Education, Inc. To be published by Pearson Prentice Hall, Pearson Education, Inc., Upper Saddle River, New Jersey. All rights reserved.
More informationHidden Markov Models Part 2: Algorithms
Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:
More informationHierarchical Multi-Stream Posterior Based Speech Recognition System
Hierarchical Multi-Stream Posterior Based Speech Recognition System Hamed Ketabdar 1,2, Hervé Bourlard 1,2 and Samy Bengio 1 1 IDIAP Research Institute, Martigny, Switzerland 2 Ecole Polytechnique Fédérale
More informationEstimation of Relative Operating Characteristics of Text Independent Speaker Verification
International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,
More informationorder is number of previous outputs
Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationSegmental Recurrent Neural Networks for End-to-end Speech Recognition
Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave
More informationImproving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer
Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationPHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu
More informationHuman-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg
Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationDiscriminative models for speech recognition
Discriminative models for speech recognition Anton Ragni Peterhouse University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 2013 Declaration This dissertation is the result of
More informationA TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme
A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationLecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31
Lecture 7: Pitch and Chord (2) HMM, pitch detection functions Li Su 2016/03/31 Chord progressions Chord progressions are not arbitrary Example 1: I-IV-I-V-I (C-F-C-G-C) Example 2: I-V-VI-III-IV-I-II-V
More informationSpeech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)
Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation
More informationRobust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen
Robust Speech Recognition in the Presence of Additive Noise Svein Gunnar Storebakken Pettersen A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of PHILOSOPHIAE DOCTOR
More informationLecture 13: Structured Prediction
Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationText-to-speech synthesizer based on combination of composite wavelet and hidden Markov models
8th ISCA Speech Synthesis Workshop August 31 September 2, 2013 Barcelona, Spain Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models Nobukatsu Hojo 1, Kota Yoshizato
More informationTopic 6. Timbre Representations
Topic 6 Timbre Representations We often say that singer s voice is magnetic the violin sounds bright this French horn sounds solid that drum sounds dull What aspect(s) of sound are these words describing?
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related
More informationArtificial Intelligence Markov Chains
Artificial Intelligence Markov Chains Stephan Dreiseitl FH Hagenberg Software Engineering & Interactive Media Stephan Dreiseitl (Hagenberg/SE/IM) Lecture 12: Markov Chains Artificial Intelligence SS2010
More informationSupport Vector Machines using GMM Supervectors for Speaker Verification
1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:
More informationSPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS
SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS by Jinjin Ye, B.S. A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for
More informationASR using Hidden Markov Model : A tutorial
ASR using Hidden Markov Model : A tutorial Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11 samudravijaya@gmail.com Tata Institute of Fundamental Research Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11
More informationSoundex distance metric
Text Algorithms (4AP) Lecture: Time warping and sound Jaak Vilo 008 fall Jaak Vilo MTAT.03.90 Text Algorithms Soundex distance metric Soundex is a coarse phonetic indexing scheme, widely used in genealogy.
More informationModel-based Approaches to Robust Speech Recognition in Diverse Environments
Model-based Approaches to Robust Speech Recognition in Diverse Environments Yongqiang Wang Darwin College Engineering Department Cambridge University October 2015 This dissertation is submitted to the
More information