Hidden Markov Model and Speech Recognition


1 Dec, 2006

Outline

Introduction
What is Speech Recognition?
- Understanding what is being said: mapping speech data to textual information
Speech recognition is indeed challenging, due to:
- the presence of noise in the input data
- variation in the voice data with the speaker's physical condition, mood, etc.
- the difficulty of identifying word boundaries

Different Types of Speech Recognition
Type of speaker:
- Speaker Dependent (SD): relatively easy to construct; requires less training data (only from the particular speaker); also known as speaker recognition
- Speaker Independent (SI): requires a huge amount of training data (from various speakers); difficult to construct
Type of data:
- Isolated Word Recognizer: recognizes single words; easy to construct (a stepping stone toward more difficult speech recognition); may be speaker dependent or speaker independent
- Continuous Speech Recognition: the most difficult of all; faces the problem of finding word boundaries

Outline

Use of a Signal Model
- It helps us characterize the properties of the given signal
- It provides a theoretical basis for signal processing systems
- It is a way to understand how the system works
- We can simulate the source, which helps us learn as much as possible about the signal source
Why the Hidden Markov Model (HMM)?
- very rich in mathematical structure
- when applied properly, works very well in practical applications

Outline

Components of an HMM [2]
1. Number of states: $N$
2. Number of distinct observation symbols per state: $M$, with symbol set $V = \{V_1, V_2, \dots, V_M\}$
3. State transition probabilities: $a_{ij}$
4. Observation symbol probability distribution in state $j$: $b_j(k) = P[V_k \text{ at } t \mid q_t = S_j]$
5. Initial state distribution: $\pi_i = P[q_1 = S_i]$, $1 \le i \le N$
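To make the notation concrete, here is a minimal sketch of these five components for a small discrete HMM; the state count, symbol count, and all probability values below are invented purely for illustration and are not taken from the slides:

```python
import numpy as np

# Minimal discrete HMM with N = 3 states and M = 2 observation symbols.
# All numbers are illustrative only.
N, M = 3, 2

# A[i, j] = a_ij : probability of moving from state i to state j
A = np.array([[0.6, 0.3, 0.1],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])

# B[j, k] = b_j(k) : probability of emitting symbol V_k in state j
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])

# pi[i] = P(q_1 = S_i) : initial state distribution
pi = np.array([1.0, 0.0, 0.0])
```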

Problems for HMMs: Problem 1 [2]
Problem 1: Evaluation
Given the observation sequence $O = O_1 O_2 \cdots O_T$ and the model $\lambda = (A, B, \pi)$, how do we efficiently compute $P(O \mid \lambda)$, the probability of the observation sequence given the model?
Figure: Evaluation Problem

Problems 2 and 3 [2]
Problem 2: Hidden State Determination (Decoding)
Given the observation sequence $O = O_1 O_2 \cdots O_T$ and the model $\lambda = (A, B, \pi)$, how do we choose the BEST state sequence $Q = q_1 q_2 \cdots q_T$, i.e., one that is optimal in some meaningful sense? (In speech recognition this can be thought of as the states emitting the correct phonemes.)
Problem 3: Learning
How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$? Problem 3 is the one in which we try to optimize the model parameters so that they best describe how the given observation sequence comes about.

Solution for Problem 1: Forward Algorithm
Direct evaluation computes
$$P(O \mid \lambda) = \sum_{q_1, \dots, q_T} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T),$$
which is an $O(N^T)$ algorithm: at every time step we have $N$ state choices and the total length is $T$.
The forward algorithm uses dynamic programming to reduce the time complexity. It uses the forward variable $\alpha_t(i)$ defined as
$$\alpha_t(i) = P(O_1, O_2, \dots, O_t, q_t = S_i \mid \lambda),$$
i.e., the probability of the partial observation sequence $O_1, O_2, \dots, O_t$ and state $S_i$ at time $t$, given the model $\lambda$.

Figure: Forward Variable
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \qquad 1 \le t \le T-1, \; 1 \le j \le N$$
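A short Python sketch of this recursion, assuming the observation sequence is already a list of discrete symbol indices (as produced by the vector quantization step later in these slides):

```python
import numpy as np

def forward(obs, A, B, pi):
    """Compute P(O | lambda) with the forward algorithm in O(N^2 T) time."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha[0] = pi * B[:, obs[0]]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return alpha[-1].sum(), alpha

# Example call with a made-up observation sequence of symbol indices:
# prob, alpha = forward([0, 1, 1], A, B, pi)
```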

Solution for Problem 2: Decoding using the Viterbi Algorithm [1]
Viterbi algorithm: to find the single best state sequence we define the quantity
$$\delta_t(i) = \max_{q_1, q_2, \dots, q_{t-1}} P[q_1 q_2 \cdots q_t = i,\, O_1 O_2 \cdots O_t \mid \lambda],$$
i.e., $\delta_t(i)$ is the best score along a single path, at time $t$, which accounts for the first $t$ observations and ends in state $S_i$. By induction,
$$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}).$$
The key point is that the Viterbi algorithm is similar in implementation to the forward algorithm (except for the backtracking part). The major difference is a maximization over the previous states in place of the summation used in the forward calculation.
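A corresponding sketch of the Viterbi recursion with backtracking; as noted above, it differs from the forward pass only in taking a maximum (plus storing backpointers) instead of a sum:

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return the probability of the single best state sequence and the sequence itself."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))            # best path score ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers for backtracking
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta_{t-1}(i) * a_ij ; keep the best predecessor i for each j
        scores = delta[t - 1][:, None] * A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtracking: start from the best final state and follow the stored pointers
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return delta[-1].max(), path[::-1]
```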

Solution for Problem 3: Learning (Adjusting Model Parameters)
Uses the Baum-Welch learning algorithm. The core quantity is
$$\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda),$$
i.e., the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence. With $\gamma_t(i)$ defined as the probability of being in state $S_i$ at time $t$, given the observation sequence and the model, the two can be related:
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j).$$
The re-estimated parameters are:
$$\bar{\pi}_i = \text{expected frequency in state } S_i \text{ at time } t = 1 = \gamma_1(i)$$

$$\bar{a}_{ij} = \frac{\text{expected number of transitions from state } S_i \text{ to } S_j}{\text{expected number of transitions from state } S_i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing symbol } V_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1,\, O_t = V_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
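The slides do not show the backward variable $\beta_t(i)$, but it is needed to compute $\xi_t(i, j)$ in practice. The sketch below is one possible single-sequence re-estimation pass under that assumption (no scaling is applied, so it is only suitable for short observation sequences):

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One Baum-Welch re-estimation pass over a single observation sequence."""
    T, N = len(obs), A.shape[0]
    # Forward variable alpha_t(i) and backward variable beta_t(i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] = num / num.sum()
    gamma = xi.sum(axis=2)                 # gamma_t(i) for t = 1 .. T-1
    # gamma over all T frames, needed for the emission re-estimate
    gamma_full = alpha * beta / (alpha * beta).sum(axis=1, keepdims=True)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.asarray(obs) == k
        new_B[:, k] = gamma_full[mask].sum(axis=0) / gamma_full.sum(axis=0)
    return new_pi, new_A, new_B
```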

Outline

Block Diagram of ASR using HMM

Basic Structure
Phoneme
- The smallest unit of information in a speech signal (spanning on the order of 10 msec) is the phoneme
- ONE : W AH N
- The English language has approximately 56 phonemes
HMM structure for a phoneme
- The model is a first-order Markov model
- Transitions go from the previous state to the next state (no jumping)

Questions to ask
What does a state in the HMM represent?
- There is one HMM for each phoneme
- Each HMM has 3 states: start, mid, and end
- ONE has 3 HMMs, one per phoneme (W, AH, and N), and each HMM has 3 states (see the sketch below)
What is the output symbol?
- The symbol from Vector Quantization is used as the output symbol emitted by a state
- The concatenation of symbols gives a phoneme
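As an illustration of this structure (the transition values are invented, not taken from the slides), each phoneme can be given a 3-state left-to-right transition matrix, and a word model is simply the concatenation of its phoneme models:

```python
import numpy as np

# A 3-state left-to-right (no-skip) transition matrix for a single phoneme HMM:
# each state either stays where it is or moves to the next state.
phoneme_A = np.array([[0.6, 0.4, 0.0],   # start -> start / mid
                      [0.0, 0.6, 0.4],   # mid   -> mid / end
                      [0.0, 0.0, 1.0]])  # end stays until the word model exits

# The word "ONE" is modelled by concatenating the phoneme HMMs for W, AH, and N,
# giving a 9-state left-to-right model for the whole word.
word_one = ["W", "AH", "N"]
```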

Front-End
- Its purpose is to parameterize an input signal (e.g., audio) into a sequence of feature vectors
Methods for feature vector extraction:
- MFCC - Mel Frequency Cepstral Coefficients
- LPC Analysis - Linear Predictive Coding
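A sketch of the MFCC front-end, assuming the librosa library is available; the file name and the 25 ms window / 10 ms hop framing are illustrative choices, not values from the slides:

```python
import librosa

# Load an utterance (path is hypothetical) and extract 13 MFCCs per frame,
# using ~25 ms windows with a 10 ms hop at a 16 kHz sampling rate.
y, sr = librosa.load("one.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
# mfcc has shape (13, number_of_frames): one feature vector per frame
```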

Acoustic Modeling [1]
Uses Vector Quantization to map feature vectors to symbols (see the sketch below):
- create a training set of feature vectors
- cluster them into a small number of classes
- represent each class by a symbol
- for each class V_k, compute the probability that it is generated by a given HMM state
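One way to realize this vector quantization step is k-means clustering, sketched here with scikit-learn; the random arrays stand in for real MFCC frames, and the codebook size of 64 is an arbitrary choice:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder training data: in practice these would be MFCC frames pooled
# from the whole training set (one row per frame).
train_features = np.random.randn(1000, 13)

# Build a codebook of 64 clusters; each cluster index is one VQ symbol V_k.
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(train_features)

# Quantize a new utterance: every frame becomes a discrete symbol index,
# which is the observation sequence O fed to the discrete HMMs.
utterance = np.random.randn(120, 13)     # placeholder for one utterance's MFCCs
symbols = codebook.predict(utterance)
```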

Creation of the Search Graph [3]
- The search graph represents the vocabulary under consideration
- The Acoustic Model, Language Model, and Lexicon (together with the Decoder during recognition) work together to produce the search graph
- The Language Model represents how words are related to each other (which word follows which); a bi-gram model is used (see the sketch below)
- The Lexicon is a file containing WORD-PHONEME pairs
- So the whole vocabulary is represented as a graph
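A toy sketch of the lexicon and bi-gram ingredients that go into such a search graph; the words, pronunciations, and training sentences are invented for illustration:

```python
from collections import defaultdict

# Toy lexicon: WORD -> phoneme sequence (entries are illustrative).
lexicon = {
    "ONE": ["W", "AH", "N"],
    "TWO": ["T", "UW"],
}

# Toy bi-gram language model estimated from counts: P(w2 | w1).
bigram_counts = defaultdict(lambda: defaultdict(int))
for sent in [["ONE", "TWO"], ["TWO", "ONE"], ["ONE", "ONE"]]:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[w1][w2] += 1

def bigram_prob(w1, w2):
    total = sum(bigram_counts[w1].values())
    return bigram_counts[w1][w2] / total if total else 0.0

# The search graph links the last HMM state of w1 to the first HMM state of
# every w2 with weight P(w2 | w1); decoding then searches this combined graph.
```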

Complete Example

Training
- Training adjusts the model parameters to maximize the probability of recognition
- Audio data from various different sources are taken
- The data is given to the prototype HMM
- The HMM adjusts its parameters using the Baum-Welch algorithm
- Once the model is trained, unknown data is given to it for recognition

Decoding
- It uses the Viterbi algorithm to find the BEST state sequence

Decoding Continued
- This is just for a single word
- During decoding the whole graph is searched
- Each HMM has two non-emitting states for connecting it to other HMMs

Outline

[4]

Problems with Continuous Speech Recognition
- Finding word boundaries
- Large vocabulary
- Training time
- Efficient search graph creation

References
[1] Dan Jurafsky. CS 224S / LINGUIST 181: Speech Recognition and Synthesis. http://www.stanford.edu/class/cs224s/.
[2] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.
[3] Willie Walker, Paul Lamere, and Philip Kwok. Sphinx-4: A Flexible Open Source Framework for Speech Recognition. Sun Microsystems, 2004.
[4] Steve Young and Gunnar Evermann. The HTK Book. Microsoft Corporation, 2005.