Hidden Markov Models and Gaussian Mixture Models


Overview: Hidden Markov Models and Gaussian Mixture Models
Steve Renals and Peter Bell
Automatic Speech Recognition, ASR Lectures 4&5, January 2013

HMMs and GMMs
Key models and algorithms for HMM acoustic models:
- Gaussians
- GMMs: Gaussian mixture models
- HMMs: hidden Markov models
- HMM algorithms: likelihood computation (forward algorithm), most probable state sequence (Viterbi algorithm), estimating the parameters (EM algorithm)

Fundamental Equation of Statistical Speech Recognition
If $X$ is the sequence of acoustic feature vectors (observations) and $W$ denotes a word sequence, the most likely word sequence $W^*$ is given by
$$W^* = \arg\max_W P(W \mid X)$$
Applying Bayes' theorem:
$$P(W \mid X) = \frac{p(X \mid W)\, P(W)}{p(X)} \propto p(X \mid W)\, P(W)$$
$$W^* = \arg\max_W \underbrace{p(X \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}$$

Acoustic Modelling
[Figure: system diagram. Recorded speech passes through signal analysis; training data is used to train the hidden Markov model acoustic model; the acoustic model, lexicon and language model together define the search space used to produce the decoded text (transcription).]
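The decoder implied by this equation searches over word sequences for the best combined acoustic and language model score. A minimal Python sketch of that argmax over a toy list of candidate sequences, assuming hypothetical scoring functions `acoustic_log_likelihood` and `language_log_prob` (real systems search a lattice rather than enumerating candidates, and work in the log domain, as later slides discuss):

```python
import math

def decode(X, candidate_word_sequences, acoustic_log_likelihood, language_log_prob):
    """Pick W* = argmax_W p(X|W) P(W), working in log space.

    X: sequence of acoustic feature vectors.
    candidate_word_sequences: iterable of word sequences (toy stand-in for a search space).
    acoustic_log_likelihood(X, W): hypothetical acoustic model score, log p(X|W).
    language_log_prob(W): hypothetical language model score, log P(W).
    """
    best_W, best_score = None, -math.inf
    for W in candidate_word_sequences:
        score = acoustic_log_likelihood(X, W) + language_log_prob(W)
        if score > best_score:
            best_W, best_score = W, score
    return best_W
```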

Hierarchical modelling of speech
[Figure: generative hierarchy for the utterance "No right": Utterance -> Words (NO, RIGHT) -> Subwords (n, oh, r, ai, t) -> HMM states -> Acoustics.]

Acoustic Model: Continuous Density HMM
[Figure: probabilistic finite state automaton with entry state $s_I$, emitting states $s_1, s_2, s_3$ and exit state $s_E$; self-loops $P(s_1 \mid s_1)$, $P(s_2 \mid s_2)$, $P(s_3 \mid s_3)$; forward transitions $P(s_1 \mid s_I)$, $P(s_2 \mid s_1)$, $P(s_3 \mid s_2)$, $P(s_E \mid s_3)$; output densities $p(x \mid s_1)$, $p(x \mid s_2)$, $p(x \mid s_3)$. A second view unrolls the same model against the observation sequence $x_1, \dots, x_6$.]
Parameters $\lambda$:
- Transition probabilities: $a_{ij} = P(s_j \mid s_i)$
- Output probability density function: $b_j(x) = p(x \mid s_j)$

HMM Assumptions
- Observation independence: an acoustic observation $x$ is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.

HMM Assumptions
[Figure: graphical model of an HMM, with hidden states $s(t-1)$, $s(t)$, $s(t+1)$ forming a Markov chain and each state emitting the corresponding observation $x(t-1)$, $x(t)$, $x(t+1)$.]
- Observation independence: an acoustic observation $x$ is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.

HMM Output Distribution
Single multivariate Gaussian with mean $\mu_j$ and covariance matrix $\Sigma_j$:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
$M$-component Gaussian mixture model:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$

Background: cdf
Consider a real-valued random variable $X$. The cumulative distribution function (cdf) $F(x)$ for $X$ is:
$$F(x) = P(X \leq x)$$
To obtain the probability of falling in an interval we can do the following:
$$P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)$$

Background: pdf
The rate of change of the cdf gives us the probability density function (pdf) $p(x)$:
$$p(x) = \frac{d}{dx} F(x) = F'(x)$$
$$F(x) = \int_{-\infty}^{x} p(x)\, dx$$
$p(x)$ is not the probability that $X$ has value $x$, but the pdf is proportional to the probability that $X$ lies in a small interval centred on $x$.
Notation: $p$ for pdf, $P$ for probability.

The Gaussian distribution (univariate)
The Gaussian (or normal) distribution is the most common (and easily analysed) continuous distribution. It is also a reasonable model in many situations (the famous bell curve). If a (scalar) variable has a Gaussian distribution, then it has a probability density function of this form:
$$p(x \mid \mu, \sigma^2) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The Gaussian is described by two parameters: the mean $\mu$ (location) and the variance $\sigma^2$ (dispersion).

Plot of Gaussian distribution
[Figure: pdf of a one-dimensional Gaussian with zero mean and unit variance ($\mu = 0$, $\sigma^2 = 1$); axes $p(x \mid m, s)$ against $x$.]

Properties of the Gaussian distribution
Gaussians have the same shape, with the location controlled by the mean and the spread controlled by the variance.
[Figure: pdfs of several Gaussian distributions with different means and variances, plotted on common axes.]
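A minimal Python sketch of the univariate Gaussian pdf, written straight from the formula above (NumPy assumed):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var), directly from the formula."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Example: standard Gaussian evaluated at a few points.
xs = np.array([-1.0, 0.0, 1.0])
print(gaussian_pdf(xs, mu=0.0, var=1.0))  # peak value 1/sqrt(2*pi) ~ 0.3989 at x = 0
```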

Parameter estimation
Estimate the mean and variance parameters of a Gaussian from data $x_1, x_2, \dots, x_n$, using the sample mean and sample variance estimates:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(sample mean)}$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \quad \text{(sample variance)}$$

Exercise
Consider the log likelihood of a set of $N$ data points $\{x_1, \dots, x_N\}$ being generated by a Gaussian with mean $\mu$ and variance $\sigma^2$:
$$L = \ln p(\{x_1, \dots, x_N\} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$
By maximising the log likelihood function with respect to $\mu$, show that the maximum likelihood estimate for the mean is indeed the sample mean:
$$\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$$

The multidimensional Gaussian distribution
The $d$-dimensional vector $x$ is multivariate Gaussian if it has a probability density function of the following form:
$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
The pdf is parameterized by the mean vector $\mu$ and the covariance matrix $\Sigma$. The 1-dimensional Gaussian is a special case of this pdf. The argument to the exponential, $-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$, is referred to as a quadratic form.

Covariance matrix
The mean vector $\mu$ is the expectation of $x$:
$$\mu = E[x]$$
The covariance matrix $\Sigma$ is the expectation of the deviation of $x$ from the mean:
$$\Sigma = E[(x-\mu)(x-\mu)^T]$$
$\Sigma$ is a $d \times d$ symmetric matrix:
$$\Sigma_{ij} = E[(x_i-\mu_i)(x_j-\mu_j)] = E[(x_j-\mu_j)(x_i-\mu_i)] = \Sigma_{ji}$$
The sign of the covariance helps to determine the relationship between two components: if $x_j$ is large when $x_i$ is large, then $(x_i-\mu_i)(x_j-\mu_j)$ will tend to be positive; if $x_j$ is small when $x_i$ is large, then $(x_i-\mu_i)(x_j-\mu_j)$ will tend to be negative.
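One way to work the exercise above is to differentiate $L$ with respect to $\mu$ and set the derivative to zero; a sketch of the steps in LaTeX:

```latex
% Only the quadratic term of L depends on mu:
\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{n=1}^{N} (x_n - \mu)
% Setting the derivative to zero at the maximum:
0 = \sum_{n=1}^{N} (x_n - \mu_{ML}) = \sum_{n=1}^{N} x_n - N \mu_{ML}
\quad\Longrightarrow\quad
\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n
```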

Spherical Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a spherical Gaussian, e.g. $\mu = [0\ 0]^T$, $\Sigma = I$, correlation $\rho = 0$.]

Diagonal covariance Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a diagonal-covariance Gaussian with unequal variances on the diagonal; $\rho = 0$.]

Full covariance Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a full-covariance Gaussian; here $\rho = 0.5$.]

Parameter estimation
It is possible to show that the mean vector $\hat{\mu}$ and covariance matrix $\hat{\Sigma}$ that maximize the likelihood of the training data are given by:
$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n$$
$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^T$$
The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
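These maximum likelihood estimates are a couple of lines in NumPy; a minimal sketch using the $1/N$ normalisation from the slide (not the unbiased $1/(N-1)$ variant):

```python
import numpy as np

def fit_gaussian(X):
    """ML estimates for a multivariate Gaussian from data X of shape (N, d)."""
    mu = X.mean(axis=0)                 # sample mean, shape (d,)
    diff = X - mu                       # deviations from the mean
    sigma = diff.T @ diff / X.shape[0]  # sample covariance (1/N), shape (d, d)
    return mu, sigma
```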

Example data
[Figure: scatter plot of two-dimensional data, $X_2$ against $X_1$.]

Maximum likelihood fit to a Gaussian
[Figure: the same data with the contour of a single maximum-likelihood Gaussian fit overlaid.]

Data in clusters (example)
[Figure: example data generated from two spherical Gaussians with means $\mu_1$ and $\mu_2$ and equal spherical covariances $\Sigma_1 = \Sigma_2$.]

Example fit by a Gaussian
[Figure: a single Gaussian fitted by maximum likelihood to the two-cluster data.]

K-means clustering
K-means is an automatic procedure for clustering unlabelled data. It requires a prespecified number of clusters, and chooses a set of clusters with the minimum within-cluster variance. It is guaranteed to converge (eventually), but the clustering solution is dependent on the initialisation. (A minimal implementation is sketched after the worked example below.)

K-means example: data set
[Figure: a small set of two-dimensional points to be clustered, including (7,8), (6,6) and (7,6).]

K-means example: initialization
[Figure: three initial cluster centres chosen among the data points.]

K-means example: iteration 1 (assign points to clusters)
[Figure: each point assigned to its nearest centre.]

K-means example: iteration 1 (recompute centres); iteration 2 (assign points to clusters)
[Figures: the centres are recomputed as the means of their assigned points, e.g. (3.57, 3) and (8.75, 3.75), and the points are then reassigned to the nearest centre.]

K-means example: iteration 2 (recompute centres); iteration 3 (assign points to clusters)
[Figures: the centres are recomputed again; the next assignment pass produces no changes, so the algorithm has converged.]
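A minimal NumPy sketch of the loop just illustrated: alternate assigning points to the nearest centre with recomputing centres as cluster means, and stop when an assignment pass changes nothing, as in the example. Initialising the centres at $k$ randomly chosen data points is an assumption of this sketch:

```python
import numpy as np

def kmeans(X, k, rng=np.random.default_rng(0)):
    """Basic k-means on data X of shape (N, d); returns (centres, assignments)."""
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = None
    while True:
        # Assignment step: each point goes to its nearest centre (squared Euclidean).
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            return centres, assign  # no changes, so converged
        assign = new_assign
        # Update step: recompute each centre as the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centres[j] = X[assign == j].mean(axis=0)
```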

Mixture model
A more flexible form of density estimation is made up of a linear combination of component densities:
$$p(x) = \sum_{j=1}^{M} p(x \mid j)\, P(j)$$
This is called a mixture model or a mixture density.
- $p(x \mid j)$: component densities
- $P(j)$: mixing parameters
Generative model: choose a mixture component based on $P(j)$, then generate a data point $x$ from the chosen component using $p(x \mid j)$.

Component occupation probability
We can apply Bayes' theorem:
$$P(j \mid x) = \frac{p(x \mid j)\, P(j)}{p(x)} = \frac{p(x \mid j)\, P(j)}{\sum_{j'=1}^{M} p(x \mid j')\, P(j')}$$
The posterior probabilities $P(j \mid x)$ give the probability that component $j$ was responsible for generating data point $x$. The $P(j \mid x)$ are called the component occupation probabilities (or sometimes the responsibilities). Since they are posterior probabilities:
$$\sum_{j=1}^{M} P(j \mid x) = 1$$

Parameter estimation
If we knew which mixture component was responsible for a data point, we would be able to assign each point unambiguously to a mixture component; we could then estimate the mean for each component Gaussian as the sample mean (just like k-means clustering), and the covariance as the sample covariance. But we don't know which mixture component a data point comes from... Maybe we could use the component occupation probabilities $P(j \mid x)$?

Gaussian mixture model
The most important mixture model is the Gaussian mixture model (GMM), where the component densities are Gaussians. Consider a GMM where each component Gaussian $\mathcal{N}(x; \mu_j, \sigma_j^2)$ has mean $\mu_j$ and a spherical covariance $\Sigma = \sigma^2 I$:
$$p(x) = \sum_{j=1}^{M} P(j)\, p(x \mid j) = \sum_{j=1}^{M} P(j)\, \mathcal{N}(x; \mu_j, \sigma_j^2)$$
[Figure: the GMM drawn as a network, with inputs $x_1, x_2, \dots, x_d$ feeding the component densities $p(x \mid 1), \dots, p(x \mid M)$, combined with weights $P(1), \dots, P(M)$.]

GMM parameter estimation when we know which component generated the data
Define the indicator variable $z_{jn} = 1$ if component $j$ generated data point $x_n$ (and $0$ otherwise). If $z_{jn}$ weren't hidden then we could count the number of observed data points generated by $j$:
$$N_j = \sum_{n=1}^{N} z_{jn}$$
and estimate the mean, variance and mixing parameters as:
$$\hat{\mu}_j = \frac{\sum_n z_{jn}\, x_n}{N_j} \qquad
\hat{\sigma}_j^2 = \frac{\sum_n z_{jn}\, \|x_n - \hat{\mu}_j\|^2}{N_j} \qquad
\hat{P}(j) = \frac{1}{N} \sum_n z_{jn} = \frac{N_j}{N}$$

Problem! Recall that:
$$P(j \mid x) = \frac{p(x \mid j)\, P(j)}{p(x)}$$
We need to know $p(x \mid j)$ and $P(j)$ to estimate $P(j \mid x)$, and we need $P(j \mid x)$ to estimate the parameters...

EM algorithm
Solution: an iterative algorithm where each iteration has two parts:
- Compute the component occupation probabilities $P(j \mid x)$ using the current estimates of the GMM parameters (means, variances, mixing parameters) (E-step).
- Compute the GMM parameters using the current estimates of the component occupation probabilities (M-step).
Starting from some initialization (e.g. using k-means for the means), these steps are alternated until convergence. This is called the EM algorithm, and can be shown to maximize the likelihood.

Soft assignment
Estimate "soft counts" based on the component occupation probabilities $P(j \mid x_n)$:
$$N_j^* = \sum_{n=1}^{N} P(j \mid x_n)$$
We can imagine assigning data points to component $j$ weighted by the component occupation probability $P(j \mid x_n)$, so we could imagine estimating the mean, variance and prior probabilities as:
$$\hat{\mu}_j = \frac{\sum_n P(j \mid x_n)\, x_n}{\sum_n P(j \mid x_n)} = \frac{\sum_n P(j \mid x_n)\, x_n}{N_j^*}$$
$$\hat{\sigma}_j^2 = \frac{\sum_n P(j \mid x_n)\, \|x_n - \hat{\mu}_j\|^2}{\sum_n P(j \mid x_n)} = \frac{\sum_n P(j \mid x_n)\, \|x_n - \hat{\mu}_j\|^2}{N_j^*}$$
$$\hat{P}(j) = \frac{N_j^*}{N} = \frac{1}{N} \sum_n P(j \mid x_n)$$

Maximum likelihood parameter estimation
The likelihood of a data set $X = \{x_1, x_2, \dots, x_N\}$ is given by:
$$L = \prod_{n=1}^{N} p(x_n) = \prod_{n=1}^{N} \sum_{j=1}^{M} p(x_n \mid j)\, P(j)$$
We can regard the negative log likelihood as an error function:
$$E = -\ln L = -\sum_{n=1}^{N} \ln p(x_n) = -\sum_{n=1}^{N} \ln \sum_{j=1}^{M} p(x_n \mid j)\, P(j)$$
Considering the derivatives of $E$ with respect to the parameters gives expressions like those on the previous slide.
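A compact NumPy sketch of this EM loop for a spherical-covariance GMM, assuming the means are initialised externally (e.g. by k-means, as the slide suggests). The E-step computes the responsibilities $P(j \mid x_n)$; the M-step applies the soft-count updates above, with the squared distances averaged over the $d$ dimensions for the spherical variance:

```python
import numpy as np

def em_gmm(X, mus, n_iter=50):
    """EM for a GMM with spherical covariances on X of shape (N, d).

    mus: initial component means, shape (M, d), e.g. from k-means.
    Returns (means, spherical variances, mixing parameters).
    """
    N, d = X.shape
    M = len(mus)
    var = np.ones(M)             # spherical variances sigma_j^2
    prior = np.full(M, 1.0 / M)  # mixing parameters P(j)
    for _ in range(n_iter):
        # E-step: component occupation probabilities P(j | x_n).
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)      # (N, M)
        dens = np.exp(-0.5 * sq / var) / (2 * np.pi * var) ** (d / 2)  # N(x; mu_j, var_j I)
        resp = dens * prior
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: soft counts N_j*, then weighted mean/variance/prior updates.
        Nj = resp.sum(axis=0)
        mus = (resp.T @ X) / Nj[:, None]
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * Nj)  # per-dimension spherical variance
        prior = Nj / N
    return mus, var, prior
```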

Example fit using a GMM
[Figure: the two-cluster example data fitted with a two-component GMM trained using EM; each component centres on one cluster.]

Peakily distributed data (example)
[Figures: data drawn from two components with equal means ($\mu_1 = \mu_2$) and very different spherical covariances, one narrow and one broad; a single maximum-likelihood Gaussian fit; and a fit with a two-component GMM trained using EM, which captures the peaky structure.]

Example 2: two-component Gaussians
[Figure: contours of the two fitted Gaussian components overlaid on the example data.]

Comments on GMMs
- GMMs trained using the EM algorithm are able to self-organize to fit a data set.
- Individual components take responsibility for parts of the data set (probabilistically).
- Soft assignment to components, not hard assignment: "soft clustering".
- GMMs scale very well; e.g. large speech recognition systems can have 30,000 GMMs, each with 32 components: sometimes around a million Gaussian components, with all the parameters estimated from (a lot of) data by EM.

Back to HMMs...
[Figure: the three-emitting-state HMM again, with transition probabilities $a_{ij}$ and output densities $p(x \mid s_j)$.]
Output distribution: single multivariate Gaussian with mean $\mu_j$ and covariance matrix $\Sigma_j$:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
or an $M$-component Gaussian mixture model:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$

The three problems of HMMs
Working with HMMs requires the solution of three problems:
1. Likelihood: determine the overall likelihood of an observation sequence $X = (x_1, \dots, x_t, \dots, x_T)$ being generated by an HMM.
2. Decoding: given an observation sequence and an HMM, determine the most probable hidden state sequence.
3. Training: given an observation sequence and an HMM, learn the best HMM parameters $\lambda = \{\{a_{jk}\}, \{b_j(\cdot)\}\}$.

1. Likelihood: the Forward algorithm
Goal: determine $p(X \mid \lambda)$. Sum over all possible state sequences $s_1 s_2 \dots s_T$ that could result in the observation sequence $X$. Rather than enumerating each sequence, compute the probabilities recursively (exploiting the Markov assumption).

Recursive algorithms on HMMs
Visualize the problem as a state-time trellis.
[Figure: state-time trellis with states $i$, $j$, $k$ at times $t-1$, $t$, $t+1$, with arcs between states at consecutive times.]

Forward probability $\alpha_t(s_j)$: the probability of observing the observation sequence $x_1 \dots x_t$ and being in state $s_j$ at time $t$:
$$\alpha_t(s_j) = p(x_1, \dots, x_t, S(t) = s_j \mid \lambda)$$

1. Likelihood: the Forward recursion
Initialization:
$$\alpha_0(s_I) = 1; \qquad \alpha_0(s_j) = 0 \ \text{ if } s_j \neq s_I$$
Recursion:
$$\alpha_t(s_j) = \sum_{i=1}^{N} \alpha_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Termination:
$$p(X \mid \lambda) = \alpha_T(s_E) = \sum_{i=1}^{N} \alpha_T(s_i)\, a_{iE}$$
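A minimal log-domain sketch of this recursion (anticipating the "Doing the computation" slide below). The layout of the model, a log transition matrix plus separate entry and exit probability vectors standing in for $s_I$ and $s_E$, is an assumption of the sketch, not a fixed API:

```python
import numpy as np
from scipy.special import logsumexp

def forward_likelihood(obs_loglik, log_a, log_a_in, log_a_out):
    """Forward algorithm in the log domain.

    obs_loglik: (T, N) array, obs_loglik[t, j] = log b_j(x_t).
    log_a:      (N, N) log transition matrix, log_a[i, j] = log P(s_j | s_i).
    log_a_in:   (N,) log entry probabilities from s_I.
    log_a_out:  (N,) log exit probabilities to s_E.
    Returns log p(X | lambda).
    """
    T, N = obs_loglik.shape
    log_alpha = log_a_in + obs_loglik[0]       # initialisation: alpha_1(s_j) = a_Ij b_j(x_1)
    for t in range(1, T):                      # recursion: sum over predecessor states
        log_alpha = logsumexp(log_alpha[:, None] + log_a, axis=0) + obs_loglik[t]
    return logsumexp(log_alpha + log_a_out)    # termination via the exit state s_E
```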

1. Likelihood: Forward recursion (illustration)
[Figure: trellis fragment showing $\alpha_{t-1}(s_i)$ for each predecessor state $s_i$ combined with $a_{ij}$ and $b_j(x_t)$, summing to give $\alpha_t(s_j)$.]

Viterbi approximation
Instead of summing over all possible state sequences, just consider the most likely one. Achieve this by changing the summation to a maximisation in the recursion:
$$V_t(s_j) = \max_i V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Changing the recursion in this way gives the likelihood of the most probable path. We need to keep track of the states that make up this path by keeping a sequence of backpointers to enable a Viterbi backtrace: the backpointer for each state at each time indicates the previous state on the most probable path.

Viterbi recursion
Likelihood of the most probable path:
[Figure: trellis fragment where $V_t(s_j)$ is obtained as $\max_i V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$.]
Backpointers to the previous state on the most probable path:
[Figure: the same trellis fragment with the backpointer $bt_t(s_j) = s_i$ recorded for the maximising predecessor.]

2. Decoding: the Viterbi algorithm
Initialization:
$$V_0(s_I) = 1; \qquad V_0(s_j) = 0 \ \text{ if } s_j \neq s_I; \qquad bt_0(s_j) = 0$$
Recursion:
$$V_t(s_j) = \max_{i=1}^{N} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
$$bt_t(s_j) = \arg\max_{i=1}^{N} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Termination:
$$P^* = V_T(s_E) = \max_{i=1}^{N} V_T(s_i)\, a_{iE}$$
$$s_T^* = bt_T(q_E) = \arg\max_{i=1}^{N} V_T(s_i)\, a_{iE}$$

Viterbi backtrace
Backtrace to find the state sequence of the most probable path.
[Figure: trellis fragment with backpointers $bt_t(s_j) = s_i$ and $bt_{t+1}(s_k) = s_j$ traced backwards from the final state.]

3. Training: the Forward-Backward algorithm
Goal: efficiently estimate the parameters of an HMM $\lambda$ from an observation sequence. Assume a single Gaussian output probability distribution:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
Parameters $\lambda$:
- Transition probabilities $a_{ij}$, with $\sum_j a_{ij} = 1$
- Gaussian parameters for state $s_j$: mean vector $\mu_j$; covariance matrix $\Sigma_j$

Viterbi training
If we knew the state-time alignment, then each observation feature vector could be assigned to a specific state. A state-time alignment can be obtained using the most probable path found by Viterbi decoding. The maximum likelihood estimate of $a_{ij}$, if $C(s_i \to s_j)$ is the count of transitions from $s_i$ to $s_j$, is:
$$\hat{a}_{ij} = \frac{C(s_i \to s_j)}{\sum_k C(s_i \to s_k)}$$
Likewise, if $Z_j$ is the set of observed acoustic feature vectors assigned to state $j$, we can use the standard maximum likelihood estimates for the mean and the covariance:
$$\hat{\mu}_j = \frac{\sum_{x \in Z_j} x}{|Z_j|} \qquad
\hat{\Sigma}_j = \frac{\sum_{x \in Z_j} (x - \hat{\mu}_j)(x - \hat{\mu}_j)^T}{|Z_j|}$$
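A matching log-domain sketch of the Viterbi recursion, backpointers and backtrace, using the same assumed model layout as the forward sketch above:

```python
import numpy as np

def viterbi(obs_loglik, log_a, log_a_in, log_a_out):
    """Most probable state path; same model layout as forward_likelihood above.

    Returns (log likelihood of the best path, state sequence of length T).
    """
    T, N = obs_loglik.shape
    V = log_a_in + obs_loglik[0]                 # initialisation
    bt = np.zeros((T, N), dtype=int)             # backpointers bt_t(s_j)
    for t in range(1, T):
        scores = V[:, None] + log_a              # scores[i, j]: arrive in j from i
        bt[t] = scores.argmax(axis=0)            # best predecessor for each state
        V = scores.max(axis=0) + obs_loglik[t]   # max replaces the forward sum
    last = int(np.argmax(V + log_a_out))         # termination via exit probabilities
    path = [last]
    for t in range(T - 1, 0, -1):                # Viterbi backtrace
        path.append(int(bt[t][path[-1]]))
    return float((V + log_a_out)[last]), path[::-1]
```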

EM algorithm
Viterbi training is an approximation: we would like to consider all possible paths. In this case, rather than having a hard state-time alignment, we estimate a probability. The state occupation probability $\gamma_t(s_j)$ is the probability of occupying state $s_j$ at time $t$ given the sequence of observations (compare with the component occupation probability in a GMM). We can use this for an iterative algorithm for HMM training, the EM algorithm. Each iteration has two steps:
- E-step: estimate the state occupation probabilities (Expectation).
- M-step: re-estimate the HMM parameters based on the estimated state occupation probabilities (Maximisation).

Backward probabilities
To estimate the state occupation probabilities it is useful to define (recursively) another set of probabilities, the backward probabilities:
$$\beta_t(s_j) = p(x_{t+1}, x_{t+2}, \dots, x_T \mid S(t) = s_j, \lambda)$$
the probability of the future observations given that the HMM is in state $s_j$ at time $t$. These can be recursively computed, going backwards in time.
Initialisation:
$$\beta_T(s_i) = a_{iE}$$
Recursion:
$$\beta_t(s_i) = \sum_{j=1}^{N} a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(s_j)$$
Termination:
$$p(X \mid \lambda) = \beta_0(s_I) = \sum_{j=1}^{N} a_{Ij}\, b_j(x_1)\, \beta_1(s_j) = \alpha_T(s_E)$$
[Figure: trellis fragment illustrating the backward recursion, with $\beta_t(s_i)$ computed from $a_{ij}$, $b_j(x_{t+1})$ and $\beta_{t+1}(s_j)$.]

State occupation probability
The state occupation probability $\gamma_t(s_j)$ is the probability of occupying state $s_j$ at time $t$ given the sequence of observations. Express it in terms of the forward and backward probabilities:
$$\gamma_t(s_j) = P(S(t) = s_j \mid X, \lambda) = \frac{\alpha_t(s_j)\, \beta_t(s_j)}{\alpha_T(s_E)}$$
recalling that $p(X \mid \lambda) = \alpha_T(s_E)$. This holds since
$$\alpha_t(s_j)\, \beta_t(s_j) = p(x_1, \dots, x_t, S(t) = s_j \mid \lambda)\; p(x_{t+1}, \dots, x_T \mid S(t) = s_j, \lambda) = p(X, S(t) = s_j \mid \lambda)$$
and
$$P(S(t) = s_j \mid X, \lambda) = \frac{p(X, S(t) = s_j \mid \lambda)}{p(X \mid \lambda)}$$
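Continuing the same sketch: a log-domain backward pass combined with the forward pass to give the state occupation probabilities $\gamma_t(s_j)$, again under the assumed model layout used earlier:

```python
import numpy as np
from scipy.special import logsumexp

def state_occupation(obs_loglik, log_a, log_a_in, log_a_out):
    """Compute gamma[t, j] = P(S(t) = s_j | X, lambda) via forward-backward."""
    T, N = obs_loglik.shape
    log_alpha = np.zeros((T, N))
    log_alpha[0] = log_a_in + obs_loglik[0]
    for t in range(1, T):   # forward pass
        log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_a, axis=0) + obs_loglik[t]
    log_beta = np.zeros((T, N))
    log_beta[-1] = log_a_out  # initialisation: beta_T(s_i) = a_iE
    for t in range(T - 2, -1, -1):  # backward pass
        log_beta[t] = logsumexp(log_a + obs_loglik[t + 1] + log_beta[t + 1], axis=1)
    log_px = logsumexp(log_alpha[-1] + log_a_out)  # log p(X | lambda)
    return np.exp(log_alpha + log_beta - log_px)   # gamma, shape (T, N)
```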

Re-estimation of Gaussian parameters
The sum of state occupation probabilities through time for a state may be regarded as a "soft" count. We can use this soft alignment to re-estimate the HMM parameters:
$$\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(s_j)\, x_t}{\sum_{t=1}^{T} \gamma_t(s_j)}$$
$$\hat{\Sigma}_j = \frac{\sum_{t=1}^{T} \gamma_t(s_j)\, (x_t - \hat{\mu}_j)(x_t - \hat{\mu}_j)^T}{\sum_{t=1}^{T} \gamma_t(s_j)}$$

Re-estimation of transition probabilities
Similarly to the state occupation probability, we can estimate $\xi_t(s_i, s_j)$, the probability of being in $s_i$ at time $t$ and $s_j$ at $t+1$, given the observations:
$$\xi_t(s_i, s_j) = P(S(t) = s_i, S(t+1) = s_j \mid X, \lambda) = \frac{P(S(t) = s_i, S(t+1) = s_j, X \mid \lambda)}{p(X \mid \lambda)} = \frac{\alpha_t(s_i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(s_j)}{\alpha_T(s_E)}$$
We can use this to re-estimate the transition probabilities:
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} \xi_t(s_i, s_j)}{\sum_{k=1}^{N} \sum_{t=1}^{T} \xi_t(s_i, s_k)}$$

Pulling it all together
Iterative estimation of HMM parameters using the EM algorithm. At each iteration:
- E-step: for all time-state pairs, (1) recursively compute the forward probabilities $\alpha_t(s_j)$ and backward probabilities $\beta_t(s_j)$; (2) compute the state occupation probabilities $\gamma_t(s_j)$ and $\xi_t(s_i, s_j)$.
- M-step: based on the estimated state occupation probabilities, re-estimate the HMM parameters: mean vectors $\mu_j$, covariance matrices $\Sigma_j$ and transition probabilities $a_{ij}$.
The application of the EM algorithm to HMM training is sometimes called the Forward-Backward algorithm.

Extension to a corpus of utterances
We usually train from a large corpus of $R$ utterances. If $x_t^r$ is the $t$th frame of the $r$th utterance $X^r$, then we can compute the probabilities $\alpha_t^r(j)$, $\beta_t^r(j)$, $\gamma_t^r(s_j)$ and $\xi_t^r(s_i, s_j)$ as before. The re-estimates are as before, except we must sum over the $R$ utterances, e.g.:
$$\hat{\mu}_j = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t^r(s_j)\, x_t^r}{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t^r(s_j)}$$

Extension to Gaussian mixture model (GMM) output distributions
The assumption of a Gaussian distribution at each state is very strong; in practice the acoustic feature vectors associated with a state may be strongly non-Gaussian. In this case an $M$-component Gaussian mixture model is an appropriate density function:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$
Given enough components, this family of functions can model any distribution. Train using the EM algorithm, in which the component occupation probabilities are estimated in the E-step.

EM training of HMM/GMM
Rather than estimating the state-time alignment, we estimate the component/state-time alignment, and the component-state occupation probabilities $\gamma_t(s_j, m)$: the probability of occupying mixture component $m$ of state $s_j$ at time $t$. We can thus re-estimate the mean of mixture component $m$ of state $s_j$ as follows:
$$\hat{\mu}_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(s_j, m)\, x_t}{\sum_{t=1}^{T} \gamma_t(s_j, m)}$$
and likewise for the covariance matrices (mixture models often use diagonal covariance matrices). The mixture coefficients are re-estimated in a similar way to transition probabilities:
$$\hat{c}_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(s_j, m)}{\sum_{l=1}^{M} \sum_{t=1}^{T} \gamma_t(s_j, l)}$$

Doing the computation
The forward, backward and Viterbi recursions result in a long sequence of probabilities being multiplied, which can cause floating point underflow problems. In practice, computations are performed in the log domain (in which multiplies become adds). Working in the log domain also avoids needing to perform the exponentiation when computing Gaussians.

Summary: HMMs
HMMs provide a generative model for statistical speech recognition. Three key problems:
1. Computing the overall likelihood: the Forward algorithm.
2. Decoding the most likely state sequence: the Viterbi algorithm.
3. Estimating the most likely parameters: the EM (Forward-Backward) algorithm.
Solutions to these problems are tractable due to the two key HMM assumptions: (1) conditional independence of observations given the current state; (2) the Markov assumption on the states.
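The standard device for those log-domain sums is log-sum-exp: shift by the maximum before exponentiating so that nothing underflows. A minimal sketch, equivalent in spirit to the scipy.special.logsumexp used in the earlier sketches:

```python
import numpy as np

def log_add(log_ps):
    """Compute log(sum(exp(log_ps))) without underflow.

    Subtracting the max before exponentiating keeps every exp() argument <= 0,
    so even very small probabilities stay representable.
    """
    m = np.max(log_ps)
    if np.isneginf(m):  # all probabilities are zero
        return -np.inf
    return m + np.log(np.sum(np.exp(log_ps - m)))

# Example: adding probabilities around 1e-300 without underflowing to zero.
print(log_add(np.log([1e-300, 1e-300])))  # ~ log(2e-300)
```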

References: HMMs
Gales and Young (2007). "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 195-304.
Jurafsky and Martin (2008). Speech and Language Processing (2nd ed.). (Errata at SLP-PIEV-Errata.html)
Rabiner and Juang (1986). "An introduction to hidden Markov models", IEEE ASSP Magazine, 3 (1), 4-16.
Renals and Hain (2010). "Speech Recognition", in Computational Linguistics and Natural Language Processing Handbook, Clark, Fox and Lappin (eds.), Blackwells.
