Overview. Hidden Markov Models and Gaussian Mixture Models. Acoustic Modelling. Fundamental Equation of Statistical Speech Recognition
Overview
Hidden Markov Models and Gaussian Mixture Models
Steve Renals and Peter Bell
Automatic Speech Recognition, ASR Lectures 4&5
28/31 January 2013

HMMs and GMMs
Key models and algorithms for HMM acoustic models:
- Gaussians
- GMMs: Gaussian mixture models
- HMMs: Hidden Markov models
- HMM algorithms:
  1. Likelihood computation (forward algorithm)
  2. Most probable state sequence (Viterbi algorithm)
  3. Estimating the parameters (EM algorithm)

Fundamental Equation of Statistical Speech Recognition
If X is the sequence of acoustic feature vectors (observations) and W denotes a word sequence, the most likely word sequence W* is given by

  W* = arg max_W P(W | X)

Applying Bayes' Theorem:

  P(W | X) = p(X | W) P(W) / p(X)
           ∝ p(X | W) P(W)

  W* = arg max_W p(X | W) P(W)
                 with p(X | W) the acoustic model and P(W) the language model

Acoustic Modelling
(Figure: architecture diagram. Recorded Speech feeds Signal Analysis; the search space combines an Acoustic Model (a Hidden Markov Model, estimated from Training Data), a Lexicon and a Language Model, and produces the Decoded Text (Transcription).)
Hierarchical modelling of speech
(Figure: generative model for the utterance "No right". Utterance level: NO RIGHT; word level; subword level: n, oh, r, ai, t; HMM states; acoustic observations x.)

Acoustic Model: Continuous Density HMM
A probabilistic finite state automaton: entry state s_I, emitting states s_1, s_2, s_3 and exit state s_E, with self-loop transitions P(s_1 | s_1), P(s_2 | s_2), P(s_3 | s_3), forward transitions P(s_1 | s_I), P(s_2 | s_1), P(s_3 | s_2), P(s_E | s_3), and output densities p(x | s_1), p(x | s_2), p(x | s_3) generating the observation sequence x_1, x_2, ..., x_6.

Parameters λ:
- Transition probabilities: a_ij = P(s_j | s_i)
- Output probability density function: b_j(x) = p(x | s_j)

HMM Assumptions
- Observation independence: an acoustic observation x is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.
HMM Assumptions
(Figure: graphical model. State sequence s(t-1) -> s(t) -> s(t+1), each state emitting the corresponding observation x(t-1), x(t), x(t+1).)
- Observation independence: an acoustic observation x is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.

HMM Output Distribution
Single multivariate Gaussian with mean μ_j and covariance matrix Σ_j:

  b_j(x) = p(x | s_j) = N(x; μ_j, Σ_j)

M-component Gaussian mixture model:

  b_j(x) = p(x | s_j) = Σ_{m=1}^M c_{jm} N(x; μ_{jm}, Σ_{jm})

Background: cdf
Consider a real-valued random variable X. The cumulative distribution function (cdf) F(x) for X is:

  F(x) = P(X ≤ x)

To obtain the probability of falling in an interval we can do the following:

  P(a < X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) - F(a)
Background: pdf
The rate of change of the cdf gives us the probability density function (pdf), p(x):

  p(x) = d/dx F(x) = F'(x)
  F(x) = ∫_{-∞}^{x} p(u) du

p(x) is not the probability that X has value x. But the pdf is proportional to the probability that X lies in a small interval centred on x.
Notation: p for pdf, P for probability.

The Gaussian distribution (univariate)
The Gaussian (or Normal) distribution is the most common (and easily analysed) continuous distribution. It is also a reasonable model in many situations (the famous bell curve). If a (scalar) variable has a Gaussian distribution, then it has a probability density function of this form:

  p(x | μ, σ²) = N(x; μ, σ²) = (1 / √(2πσ²)) exp(-(x - μ)² / (2σ²))

The Gaussian is described by two parameters:
- the mean μ (location)
- the variance σ² (dispersion)

Plot of Gaussian distribution
(Figure: pdf of a one-dimensional Gaussian with zero mean and unit variance, μ = 0, σ² = 1.)

Properties of the Gaussian distribution
Gaussians have the same shape, with the location controlled by the mean and the spread controlled by the variance.
(Figure: pdfs of Gaussian distributions with various means and variances.)
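The univariate density above is straightforward to evaluate directly. A minimal sketch in Python (the function name and the test point are illustrative, not from the slides):

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var) as defined above."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# The standard normal (mu = 0, var = 1) peaks at 1/sqrt(2*pi) ~ 0.3989
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))
```

Note that the returned value is a density, not a probability: it can exceed 1 when the variance is small.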
Parameter estimation
Estimate the mean and variance parameters of a Gaussian from data x_1, x_2, ..., x_n. Use the sample mean and sample variance estimates:

  μ = (1/n) Σ_{i=1}^n x_i   (sample mean)
  σ² = (1/n) Σ_{i=1}^n (x_i - μ)²   (sample variance)

Exercise
Consider the log likelihood of a set of N data points {x_1, ..., x_N} being generated by a Gaussian with mean μ and variance σ²:

  L = ln p({x_1, ..., x_N} | μ, σ²)
    = -(1/(2σ²)) Σ_{n=1}^N (x_n - μ)² - (N/2) ln σ² - (N/2) ln(2π)

By maximising the log likelihood function with respect to μ, show that the maximum likelihood estimate for the mean is indeed the sample mean:

  μ_ML = (1/N) Σ_{n=1}^N x_n

The multidimensional Gaussian distribution
The d-dimensional vector x is multivariate Gaussian if it has a probability density function of the following form:

  p(x | μ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp(-(1/2) (x - μ)ᵀ Σ⁻¹ (x - μ))

The pdf is parameterized by the mean vector μ and the covariance matrix Σ. The 1-dimensional Gaussian is a special case of this pdf. The argument to the exponential, -0.5 (x - μ)ᵀ Σ⁻¹ (x - μ), is referred to as a quadratic form.

Covariance matrix
The mean vector μ is the expectation of x:

  μ = E[x]

The covariance matrix Σ is the expectation of the deviation of x from the mean:

  Σ = E[(x - μ)(x - μ)ᵀ]

Σ is a d × d symmetric matrix:

  Σ_ij = E[(x_i - μ_i)(x_j - μ_j)] = E[(x_j - μ_j)(x_i - μ_i)] = Σ_ji

The sign of the covariance helps to determine the relationship between two components:
- If x_j is large when x_i is large, then (x_i - μ_i)(x_j - μ_j) will tend to be positive.
- If x_j is small when x_i is large, then (x_i - μ_i)(x_j - μ_j) will tend to be negative.
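The sample mean and sample variance estimates above are the maximum likelihood estimates asked for in the exercise. A small sketch (function name and data are made up for illustration):

```python
def ml_gaussian_fit(data):
    """Maximum-likelihood (sample) mean and variance of a univariate Gaussian."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

mu, var = ml_gaussian_fit([1.0, 2.0, 3.0, 4.0])
print(mu, var)  # 2.5 1.25
```

Note this is the biased 1/n estimator that maximises the likelihood, not the unbiased 1/(n-1) sample variance.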
Spherical Gaussian
(Figure: surface and contour plots of p(x_1, x_2) for a spherical covariance matrix; correlation ρ = 0, circular contours.)

Diagonal Covariance Gaussian
(Figure: surface and contour plots for a diagonal covariance matrix with unequal variances; ρ = 0, axis-aligned elliptical contours.)

Full covariance Gaussian
(Figure: surface and contour plots for a full covariance matrix; ρ ≠ 0, rotated elliptical contours.)

Parameter estimation
It is possible to show that the mean vector μ̂ and covariance matrix Σ̂ that maximize the likelihood of the training data are given by:

  μ̂ = (1/N) Σ_{n=1}^N x_n
  Σ̂ = (1/N) Σ_{n=1}^N (x_n - μ̂)(x_n - μ̂)ᵀ

The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
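The sample mean vector and sample covariance matrix can be computed directly from these formulas. A minimal sketch in plain Python (the function name and the two-point data set are illustrative):

```python
def sample_mean_cov(data):
    """Maximum-likelihood estimates for a multivariate Gaussian:
    the sample mean vector and the (biased, 1/N) sample covariance matrix."""
    n = len(data)
    d = len(data[0])
    mu = [sum(x[k] for x in data) / n for k in range(d)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in data) / n
            for j in range(d)] for i in range(d)]
    return mu, cov

# Perfectly correlated 2-D data: the off-diagonal term equals the variances.
mu, cov = sample_mean_cov([(0.0, 0.0), (2.0, 2.0)])
print(mu, cov)
```

The returned covariance is symmetric by construction, matching Σ_ij = Σ_ji above.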
Example data
(Figure: scatter plot of two-dimensional example data, and the maximum likelihood fit of a single Gaussian to it.)

Data in clusters (example)
(Figure: data generated from two clusters with different mean vectors and spherical covariance matrices.)

Example fit by a Gaussian
(Figure: a single Gaussian fitted by maximum likelihood to the two-cluster data; one Gaussian cannot represent the two clusters separately.)
k-means clustering
- k-means is an automatic procedure for clustering unlabelled data
- Requires a prespecified number of clusters
- The clustering algorithm chooses a set of clusters with the minimum within-cluster variance
- Guaranteed to converge (eventually)
- The clustering solution is dependent on the initialisation

k-means example: data set
(Figure: scatter plot of the example two-dimensional data set used in the following k-means iterations.)
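The procedure described above, alternating assignment and centre re-estimation, can be sketched in a few lines. This is a generic illustration with made-up data points, not the coordinates from the slide's example:

```python
def kmeans(points, centres, iters=10):
    """Plain k-means: repeatedly assign each point to its nearest centre,
    then move each centre to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            clusters[d2.index(min(d2))].append(p)
        # Recompute centres; keep the old centre if a cluster goes empty.
        centres = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centres)
        ]
    return centres

# Two well-separated blobs (hypothetical data) and one centre near each.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(kmeans(pts, [(0, 0), (10, 10)])))
```

With this initialisation the assignments stop changing after one pass, illustrating the convergence (and initialisation-dependence) noted above.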
k-means example: initialization
(Figure: the data points with the initial cluster centres marked.)

k-means example: iteration 1
(Figures: each point is assigned to its nearest centre; the centres are then recomputed as the means of their assigned points.)

k-means example: iteration 2
(Figures: points are reassigned to the updated centres and the centres are recomputed again.)

k-means example: iteration 3
(Figure: reassigning the points produces no changes, so the algorithm has converged.)
Mixture model
A more flexible form of density estimation is made up of a linear combination of component densities:

  p(x) = Σ_{j=1}^M p(x | j) P(j)

This is called a mixture model or a mixture density.
- p(x | j): component densities
- P(j): mixing parameters

Generative model:
1. Choose a mixture component based on P(j)
2. Generate a data point x from the chosen component using p(x | j)

Component occupation probability
We can apply Bayes' theorem:

  P(j | x) = p(x | j) P(j) / p(x) = p(x | j) P(j) / Σ_{j'=1}^M p(x | j') P(j')

The posterior probabilities P(j | x) give the probability that component j was responsible for generating data point x. The P(j | x) are called the component occupation probabilities (or sometimes the responsibilities). Since they are posterior probabilities:

  Σ_{j=1}^M P(j | x) = 1

Parameter estimation
If we knew which mixture component was responsible for a data point:
- we would be able to assign each point unambiguously to a mixture component
- and we could estimate the mean for each component Gaussian as the sample mean (just like k-means clustering)
- and we could estimate the covariance as the sample covariance
But we don't know which mixture component a data point comes from... Maybe we could use the component occupation probabilities P(j | x)?

Gaussian mixture model
The most important mixture model is the Gaussian Mixture Model (GMM), where the component densities are Gaussians. Consider a GMM where each component Gaussian N(x; μ_j, σ_j²) has mean μ_j and a spherical covariance Σ = σ²I:

  p(x) = Σ_{j=1}^M P(j) p(x | j) = Σ_{j=1}^M P(j) N(x; μ_j, σ_j²)

(Figure: network diagram. Inputs x_1, ..., x_d feed M Gaussian components; their outputs, weighted by P(1), ..., P(M), sum to p(x).)
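The component occupation probabilities follow directly from Bayes' theorem above. A small sketch for a one-dimensional two-component mixture (all function names and numbers are illustrative):

```python
import math

def gauss(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def occupation_probs(x, priors, means, variances):
    """Posterior P(j | x) for each mixture component j via Bayes' theorem."""
    joint = [p * gauss(x, m, v) for p, m, v in zip(priors, means, variances)]
    total = sum(joint)  # p(x), the mixture density
    return [j / total for j in joint]

# A point at 0.9 is closer to the second component's mean, so its
# responsibility is higher; the posteriors always sum to one.
post = occupation_probs(0.9, priors=[0.5, 0.5], means=[0.0, 1.0],
                        variances=[1.0, 1.0])
print(post)
```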
GMM parameter estimation when we know which component generated the data
Define the indicator variable z_jn = 1 if component j generated data point x_n (and 0 otherwise). If z_jn weren't hidden then we could count the number of observed data points generated by component j:

  N_j = Σ_{n=1}^N z_jn

and estimate the mean, variance and mixing parameters as:

  μ̂_j = Σ_n z_jn x_n / N_j
  σ̂_j² = Σ_n z_jn (x_n - μ̂_j)² / N_j
  P̂(j) = (1/N) Σ_n z_jn = N_j / N

Problem! Recall that:

  P(j | x) = p(x | j) P(j) / p(x)

We need to know p(x | j) and P(j) to estimate the parameters of P(j | x), and to estimate P(j)...

EM algorithm
Solution: an iterative algorithm where each iteration has two parts:
- Compute the component occupation probabilities P(j | x) using the current estimates of the GMM parameters (means, variances, mixing parameters) (E-step)
- Compute the GMM parameters using the current estimates of the component occupation probabilities (M-step)
Starting from some initialization (e.g. using k-means for the means), these steps are alternated until convergence. This is called the EM Algorithm and can be shown to maximize the likelihood.

Soft assignment
Estimate "soft counts" based on the component occupation probabilities P(j | x_n):

  N_j* = Σ_{n=1}^N P(j | x_n)

We can imagine assigning data points to component j weighted by the component occupation probability P(j | x_n). So we could imagine estimating the mean, variance and prior probabilities as:

  μ̂_j = Σ_n P(j | x_n) x_n / Σ_n P(j | x_n) = Σ_n P(j | x_n) x_n / N_j*
  σ̂_j² = Σ_n P(j | x_n) (x_n - μ̂_j)² / Σ_n P(j | x_n) = Σ_n P(j | x_n) (x_n - μ̂_j)² / N_j*
  P̂(j) = (1/N) Σ_n P(j | x_n) = N_j* / N

Maximum likelihood parameter estimation
The likelihood of a data set X = {x_1, x_2, ..., x_N} is given by:

  L = Π_{n=1}^N p(x_n) = Π_{n=1}^N Σ_{j=1}^M p(x_n | j) P(j)

We can regard the negative log likelihood as an error function:

  E = -ln L = -Σ_{n=1}^N ln p(x_n) = -Σ_{n=1}^N ln Σ_{j=1}^M p(x_n | j) P(j)

Considering the derivatives of E with respect to the parameters gives expressions like those on the previous slide.
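The E-step and M-step described above can be sketched for a one-dimensional GMM. This is a minimal illustration with made-up data and initial parameters, not the slides' example:

```python
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, priors, means, variances):
    """One EM iteration for a 1-D GMM: the E-step computes the component
    occupation probabilities P(j | x_n); the M-step re-estimates the
    parameters from the resulting soft counts."""
    M = len(priors)
    # E-step: responsibilities for every data point
    resp = []
    for x in data:
        joint = [priors[j] * gauss(x, means[j], variances[j]) for j in range(M)]
        s = sum(joint)
        resp.append([g / s for g in joint])
    # M-step: soft counts N_j* and weighted re-estimates
    N = [sum(r[j] for r in resp) for j in range(M)]
    means = [sum(r[j] * x for r, x in zip(resp, data)) / N[j] for j in range(M)]
    variances = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / N[j]
                 for j in range(M)]
    priors = [N[j] / len(data) for j in range(M)]
    return priors, means, variances

# Two tight clusters around -1 and +1; symmetric initialisation.
data = [-1.2, -1.0, -0.8, 0.8, 1.0, 1.2]
params = ([0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])
for _ in range(50):
    params = em_step(data, *params)
print(params)
```

After a few iterations the component means settle near -1 and +1, each component taking responsibility for one cluster.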
Example fit using a GMM
Peakily distributed data (example)
(Figures: data drawn from a sharply peaked, non-Gaussian distribution; the fit by a single Gaussian; and the fit by a two-component GMM trained with EM, in which the two components share the same mean but one covariance is narrow and the other broad. The GMM captures the sharp central peak that the single Gaussian smooths over.)
Example 2: the component Gaussians
(Figure: the individual component Gaussians of the fitted mixture.)

Comments on GMMs
- GMMs trained using the EM algorithm are able to self-organize to fit a data set
- Individual components take responsibility for parts of the data set (probabilistically)
- Soft assignment to components, not hard assignment: "soft clustering"
- GMMs scale very well, e.g.: large speech recognition systems can have 30,000 GMMs, each with 32 components: sometimes 1 million Gaussian components!! And the parameters are all estimated from (a lot of) data by EM

Back to HMMs...
(Figure: the continuous-density HMM again, with states s_I, s_1, s_2, s_3, s_E, transition probabilities and output densities p(x | s_j).)
Output distribution:
- Single multivariate Gaussian with mean μ_j, covariance matrix Σ_j:
  b_j(x) = p(x | s_j) = N(x; μ_j, Σ_j)
- M-component Gaussian mixture model:
  b_j(x) = p(x | s_j) = Σ_{m=1}^M c_{jm} N(x; μ_{jm}, Σ_{jm})

The three problems of HMMs
Working with HMMs requires the solution of three problems:
1. Likelihood: determine the overall likelihood of an observation sequence X = (x_1, ..., x_t, ..., x_T) being generated by an HMM
2. Decoding: given an observation sequence and an HMM, determine the most probable hidden state sequence
3. Training: given an observation sequence and an HMM, learn the best HMM parameters λ = {{a_ij}, {b_j()}}
1. Likelihood: The Forward algorithm
- Goal: determine p(X | λ)
- Sum over all possible state sequences s_1 s_2 ... s_T that could result in the observation sequence X
- Rather than enumerating each sequence, compute the probabilities recursively (exploiting the Markov assumption)

Recursive algorithms on HMMs
(Figure: visualize the problem as a state-time trellis, states on one axis against times t-1, t, t+1 on the other.)

1. Likelihood: The Forward recursion
Forward probability, α_t(s_j): the probability of observing the observation sequence x_1 ... x_t and being in state s_j at time t:

  α_t(s_j) = p(x_1, ..., x_t, S(t) = s_j | λ)

Initialization:
  α_0(s_I) = 1
  α_0(s_j) = 0 if s_j ≠ s_I
Recursion:
  α_t(s_j) = Σ_{i=1}^N α_{t-1}(s_i) a_ij b_j(x_t)
Termination:
  p(X | λ) = α_T(s_E) = Σ_{i=1}^N α_T(s_i) a_iE
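The initialization, recursion and termination above translate directly into code. A sketch with explicit entry and exit transitions as in the slides; the two-state model and all its numbers are made up for illustration:

```python
def forward_likelihood(obs_probs, a_init, a, a_final):
    """Forward algorithm for p(X | lambda).

    obs_probs[t][j] = b_j(x_t); a_init[j] = P(s_j | s_I);
    a[i][j] = a_ij between emitting states; a_final[i] = P(s_E | s_i)."""
    N = len(a_init)
    # t = 1: alpha_1(s_j) = P(s_j | s_I) b_j(x_1)
    alpha = [a_init[j] * obs_probs[0][j] for j in range(N)]
    # recursion: alpha_t(s_j) = sum_i alpha_{t-1}(s_i) a_ij b_j(x_t)
    for t in range(1, len(obs_probs)):
        alpha = [sum(alpha[i] * a[i][j] for i in range(N)) * obs_probs[t][j]
                 for j in range(N)]
    # termination: p(X | lambda) = sum_i alpha_T(s_i) a_iE
    return sum(alpha[i] * a_final[i] for i in range(N))

# Tiny 2-state left-to-right example; b_j(x_t) values for t = 1..3.
b = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
p = forward_likelihood(b, a_init=[1.0, 0.0],
                       a=[[0.6, 0.4], [0.0, 0.9]], a_final=[0.0, 0.1])
print(p)
```

Each row of transition probabilities, including the exit transition, sums to one, as the probabilistic finite state automaton requires.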
1. Likelihood: Forward recursion (trellis view)
(Figure: α_t(s_j) is computed from α_{t-1}(s_i) for every predecessor state s_i, weighted by a_ij and multiplied by b_j(x_t).)

Viterbi approximation
- Instead of summing over all possible state sequences, just consider the most likely
- Achieve this by changing the summation to a maximisation in the recursion:

  V_t(s_j) = max_i V_{t-1}(s_i) a_ij b_j(x_t)

- Changing the recursion in this way gives the likelihood of the most probable path
- We need to keep track of the states that make up this path by keeping a sequence of backpointers to enable a Viterbi backtrace: the backpointer for each state at each time indicates the previous state on the most probable path

Viterbi Recursion
(Figures: the likelihood of the most probable path takes the maximum over predecessor states, V_t(s_j) = max_i V_{t-1}(s_i) a_ij b_j(x_t); backpointers bt_t(s_j) = s_i record the previous state on that path.)
2. Decoding: The Viterbi algorithm
Initialization:
  V_0(s_I) = 1
  V_0(s_j) = 0 if s_j ≠ s_I
  bt_0(s_j) = 0
Recursion:
  V_t(s_j) = max_{i=1..N} V_{t-1}(s_i) a_ij b_j(x_t)
  bt_t(s_j) = arg max_{i=1..N} V_{t-1}(s_i) a_ij b_j(x_t)
Termination:
  P* = V_T(s_E) = max_{i=1..N} V_T(s_i) a_iE
  s_T* = bt_T(q_E) = arg max_{i=1..N} V_T(s_i) a_iE

Viterbi Backtrace
Backtrace to find the state sequence of the most probable path.
(Figure: follow the backpointers bt_{t+1}(s_j) backwards through the trellis from the final state.)

3. Training: Forward-Backward algorithm
Goal: efficiently estimate the parameters of an HMM λ from an observation sequence. Assume a single Gaussian output probability distribution:

  b_j(x) = p(x | s_j) = N(x; μ_j, Σ_j)

Parameters λ:
- Transition probabilities a_ij
- Gaussian parameters for state s_j: mean vector μ_j, covariance matrix Σ_j

Viterbi Training
- If we knew the state-time alignment, then each observation feature vector could be assigned to a specific state
- A state-time alignment can be obtained using the most probable path obtained by Viterbi decoding
- Maximum likelihood estimate of a_ij, if C(s_i → s_j) is the count of transitions from s_i to s_j:

  â_ij = C(s_i → s_j) / Σ_k C(s_i → s_k)

- Likewise, if Z_j is the set of observed acoustic feature vectors assigned to state j, we can use the standard maximum likelihood estimates for the mean and the covariance:

  μ̂_j = (1 / |Z_j|) Σ_{x ∈ Z_j} x
  Σ̂_j = (1 / |Z_j|) Σ_{x ∈ Z_j} (x - μ̂_j)(x - μ̂_j)ᵀ
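The Viterbi recursion with backpointers can be sketched in the same style as the forward algorithm; the example model and its numbers are made up, not from the slides:

```python
def viterbi(obs_probs, a_init, a, a_final):
    """Viterbi decoding: replace the forward algorithm's sums with maxima
    and keep backpointers so the best state sequence can be read back."""
    N = len(a_init)
    V = [a_init[j] * obs_probs[0][j] for j in range(N)]
    backptr = []
    for t in range(1, len(obs_probs)):
        V_new, bt = [], []
        for j in range(N):
            scores = [V[i] * a[i][j] for i in range(N)]
            best = max(range(N), key=lambda i: scores[i])
            bt.append(best)
            V_new.append(scores[best] * obs_probs[t][j])
        V, backptr = V_new, backptr + [bt]
    # termination: best exit transition, then backtrace
    last = max(range(N), key=lambda i: V[i] * a_final[i])
    path = [last]
    for bt in reversed(backptr):
        path.append(bt[path[-1]])
    path.reverse()
    return V[last] * a_final[last], path

b = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
p_best, path = viterbi(b, [1.0, 0.0], [[0.6, 0.4], [0.0, 0.9]], [0.0, 0.1])
print(p_best, path)
```

The returned likelihood is that of the single most probable path, so it is always at most the forward likelihood, which sums over all paths.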
EM Algorithm
- Viterbi training is an approximation: we would like to consider all possible paths
- In this case, rather than having a hard state-time alignment, we estimate a probability
- State occupation probability: the probability γ_t(s_j) of occupying state s_j at time t given the sequence of observations. Compare with the component occupation probability in a GMM
- We can use this for an iterative algorithm for HMM training, the EM algorithm. Each iteration has two steps:
  E-step: estimate the state occupation probabilities (Expectation)
  M-step: re-estimate the HMM parameters based on the estimated state occupation probabilities (Maximisation)

Backward probabilities
To estimate the state occupation probabilities it is useful to define (recursively) another set of probabilities, the backward probabilities:

  β_t(s_i) = p(x_{t+1}, x_{t+2}, ..., x_T | S(t) = s_i, λ)

the probability of the future observations given that the HMM is in state s_i at time t. These can be recursively computed (going backwards in time).
Initialisation:
  β_T(s_i) = a_iE
Recursion:
  β_t(s_i) = Σ_{j=1}^N a_ij b_j(x_{t+1}) β_{t+1}(s_j)
Termination:
  p(X | λ) = β_0(s_I) = Σ_{j=1}^N a_Ij b_j(x_1) β_1(s_j) = α_T(s_E)

Backward Recursion
(Figure: in the trellis, β_t(s_i) combines a_ij, b_j(x_{t+1}) and β_{t+1}(s_j) over all successor states s_j.)

State Occupation Probability
The state occupation probability γ_t(s_j) is the probability of occupying state s_j at time t given the sequence of observations. Express it in terms of the forward and backward probabilities:

  γ_t(s_j) = P(S(t) = s_j | X, λ) = (1 / α_T(s_E)) α_t(s_j) β_t(s_j)

recalling that p(X | λ) = α_T(s_E). Since

  α_t(s_j) β_t(s_j) = p(x_1, ..., x_t, S(t) = s_j | λ) p(x_{t+1}, x_{t+2}, ..., x_T | S(t) = s_j, λ)
                    = p(x_1, ..., x_t, x_{t+1}, x_{t+2}, ..., x_T, S(t) = s_j | λ)
                    = p(X, S(t) = s_j | λ)

we have

  P(S(t) = s_j | X, λ) = p(X, S(t) = s_j | λ) / p(X | λ)
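Combining the forward and backward recursions gives the state occupation probabilities. A self-contained sketch (the example model and its numbers are made up; a useful sanity check is that γ_t sums to one over states at every t):

```python
def forward_backward_gammas(obs_probs, a_init, a, a_final):
    """Compute alpha_t, beta_t and the state occupation probabilities
    gamma_t(s_j) = alpha_t(s_j) beta_t(s_j) / p(X | lambda)."""
    N, T = len(a_init), len(obs_probs)
    # forward pass
    alpha = [[a_init[j] * obs_probs[0][j] for j in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(N))
                      * obs_probs[t][j] for j in range(N)])
    p_X = sum(alpha[-1][i] * a_final[i] for i in range(N))
    # backward pass, initialised with the exit transitions
    beta = [[0.0] * N for _ in range(T)]
    beta[-1] = list(a_final)
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(a[i][j] * obs_probs[t + 1][j] * beta[t + 1][j]
                       for j in range(N)) for i in range(N)]
    gamma = [[alpha[t][j] * beta[t][j] / p_X for j in range(N)]
             for t in range(T)]
    return p_X, gamma

b = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
p_X, gamma = forward_backward_gammas(b, [1.0, 0.0],
                                     [[0.6, 0.4], [0.0, 0.9]], [0.0, 0.1])
print(p_X)
```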
Re-estimation of Gaussian parameters
- The sum of state occupation probabilities through time for a state may be regarded as a "soft" count
- We can use this soft alignment to re-estimate the HMM parameters:

  μ̂_j = Σ_{t=1}^T γ_t(s_j) x_t / Σ_{t=1}^T γ_t(s_j)
  Σ̂_j = Σ_{t=1}^T γ_t(s_j) (x_t - μ̂_j)(x_t - μ̂_j)ᵀ / Σ_{t=1}^T γ_t(s_j)

Re-estimation of transition probabilities
Similarly to the state occupation probability, we can estimate ξ_t(s_i, s_j), the probability of being in s_i at time t and s_j at t+1, given the observations:

  ξ_t(s_i, s_j) = P(S(t) = s_i, S(t+1) = s_j | X, λ)
               = P(S(t) = s_i, S(t+1) = s_j, X | λ) / p(X | λ)
               = α_t(s_i) a_ij b_j(x_{t+1}) β_{t+1}(s_j) / α_T(s_E)

We can use this to re-estimate the transition probabilities:

  â_ij = Σ_{t=1}^T ξ_t(s_i, s_j) / Σ_{k=1}^N Σ_{t=1}^T ξ_t(s_i, s_k)

Pulling it all together
Iterative estimation of HMM parameters using the EM algorithm. At each iteration:
E-step: for all time-state pairs,
1. Recursively compute the forward probabilities α_t(s_j) and backward probabilities β_t(s_j)
2. Compute the state occupation probabilities γ_t(s_j) and ξ_t(s_i, s_j)
M-step: based on the estimated state occupation probabilities, re-estimate the HMM parameters: mean vectors μ_j, covariance matrices Σ_j and transition probabilities a_ij
The application of the EM algorithm to HMM training is sometimes called the Forward-Backward algorithm.

Extension to a corpus of utterances
We usually train from a large corpus of R utterances. If x_t^r is the t-th frame of the r-th utterance X^r, then we can compute the probabilities α_t^r(j), β_t^r(j), γ_t^r(s_j) and ξ_t^r(s_i, s_j) as before. The re-estimates are as before, except we must sum over the R utterances, e.g.:

  μ̂_j = Σ_{r=1}^R Σ_{t=1}^T γ_t^r(s_j) x_t^r / Σ_{r=1}^R Σ_{t=1}^T γ_t^r(s_j)
Extension to Gaussian mixture models (GMMs)
- The assumption of a Gaussian distribution at each state is very strong; in practice the acoustic feature vectors associated with a state may be strongly non-Gaussian
- In this case an M-component Gaussian mixture model is an appropriate density function:

  b_j(x) = p(x | s_j) = Σ_{m=1}^M c_{jm} N(x; μ_{jm}, Σ_{jm})

- Given enough components, this family of functions can model any distribution
- Train using the EM algorithm, in which the component occupation probabilities are estimated in the E-step

EM training of HMM/GMM
- Rather than estimating the state-time alignment, we estimate the component/state-time alignment and the component-state occupation probabilities γ_t(s_j, m): the probability of occupying mixture component m of state s_j at time t
- We can thus re-estimate the mean of mixture component m of state s_j as follows:

  μ̂_{jm} = Σ_{t=1}^T γ_t(s_j, m) x_t / Σ_{t=1}^T γ_t(s_j, m)

- And likewise for the covariance matrices (mixture models often use diagonal covariance matrices)
- The mixture coefficients are re-estimated in a similar way to transition probabilities:

  ĉ_{jm} = Σ_{t=1}^T γ_t(s_j, m) / Σ_{l=1}^M Σ_{t=1}^T γ_t(s_j, l)

Doing the computation
- The forward, backward and Viterbi recursions result in a long sequence of probabilities being multiplied
- This can cause floating point underflow problems
- In practice, computations are performed in the log domain (in which multiplies become adds)
- Working in the log domain also avoids needing to perform the exponentiation when computing Gaussians

Summary: HMMs
- HMMs provide a generative model for statistical speech recognition
- Three key problems:
  1. Computing the overall likelihood: the Forward algorithm
  2. Decoding the most likely state sequence: the Viterbi algorithm
  3. Estimating the most likely parameters: the EM (Forward-Backward) algorithm
- Solutions to these problems are tractable due to the two key HMM assumptions:
  1. Conditional independence of observations given the current state
  2. Markov assumption on the states
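The log-domain trick mentioned under "Doing the computation" turns products into sums; the one remaining difficulty is adding probabilities that are stored as logs, which is handled by the standard log-sum-exp identity. A small sketch (the function name and test values are illustrative):

```python
import math

def logsumexp(log_vals):
    """Compute log(sum_i exp(l_i)) for log-domain probabilities, shifting
    by the maximum so the exponentials cannot underflow."""
    m = max(log_vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(l - m) for l in log_vals))

# Two probabilities far below floating-point range: a direct exp() of either
# would round to 0, but their log-domain sum is still computed exactly.
tiny = [math.log(0.4) - 2000.0, math.log(0.6) - 2000.0]
print(logsumexp(tiny))
```

In a log-domain forward recursion, each α update becomes a logsumexp over predecessors plus the log transition and log output terms.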
References: HMMs
- Gales and Young (2007). "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 195-304: section 2.2.
- Jurafsky and Martin (2008). Speech and Language Processing (2nd ed.): sections 9.2; 9.4. (Errata at SLP-PIEV-Errata.html)
- Rabiner and Juang (1986). "An introduction to hidden Markov models", IEEE ASSP Magazine, 3 (1), 4-16.
- Renals and Hain (2010). "Speech Recognition", Computational Linguistics and Natural Language Processing Handbook, Clark, Fox and Lappin (eds.), Blackwells.
More informationAPPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14
APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationBézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0
Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationCS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before
More informationChapter 7 Channel Capacity and Coding
Chapter 7 Channel Capacty and Codng Contents 7. Channel models and channel capacty 7.. Channel models Bnary symmetrc channel Dscrete memoryless channels Dscrete-nput, contnuous-output channel Waveform
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)
Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of
More information18.1 Introduction and Recap
CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng
More informationxp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ
CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationClustering with Gaussian Mixtures
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationSee Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)
Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes
More informationHere is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)
Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More information6. Stochastic processes (2)
Contents Markov processes Brth-death processes Lect6.ppt S-38.45 - Introducton to Teletraffc Theory Sprng 5 Markov process Consder a contnuous-tme and dscrete-state stochastc process X(t) wth state space
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More information6. Stochastic processes (2)
6. Stochastc processes () Lect6.ppt S-38.45 - Introducton to Teletraffc Theory Sprng 5 6. Stochastc processes () Contents Markov processes Brth-death processes 6. Stochastc processes () Markov process
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More informationLECTURE 9 CANONICAL CORRELATION ANALYSIS
LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of
More information4DVAR, according to the name, is a four-dimensional variational method.
4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationHopfield Training Rules 1 N
Hopfeld Tranng Rules To memorse a sngle pattern Suppose e set the eghts thus - = p p here, s the eght beteen nodes & s the number of nodes n the netor p s the value requred for the -th node What ll the
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationPrimer on High-Order Moment Estimators
Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationThe Basic Idea of EM
The Basc Idea of EM Janxn Wu LAMDA Group Natonal Key Lab for Novel Software Technology Nanjng Unversty, Chna wujx2001@gmal.com June 7, 2017 Contents 1 Introducton 1 2 GMM: A workng example 2 2.1 Gaussan
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More informationCourse 395: Machine Learning - Lectures
Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng
More informationNote on EM-training of IBM-model 1
Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationCHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD
CHALMERS, GÖTEBORGS UNIVERSITET SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS COURSE CODES: FFR 35, FIM 72 GU, PhD Tme: Place: Teachers: Allowed materal: Not allowed: January 2, 28, at 8 3 2 3 SB
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours
UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationChapter 20 Duration Analysis
Chapter 20 Duraton Analyss Duraton: tme elapsed untl a certan event occurs (weeks unemployed, months spent on welfare). Survval analyss: duraton of nterest s survval tme of a subject, begn n an ntal state
More informationSupplementary material: Margin based PU Learning. Matrix Concentration Inequalities
Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationSIO 224. m(r) =(ρ(r),k s (r),µ(r))
SIO 224 1. A bref look at resoluton analyss Here s some background for the Masters and Gubbns resoluton paper. Global Earth models are usually found teratvely by assumng a startng model and fndng small
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationGoodness of fit and Wilks theorem
DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),
More informationLinear Regression Analysis: Terminology and Notation
ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented
More informationXII.3 The EM (Expectation-Maximization) Algorithm
XII.3 The EM (Expectaton-Maxzaton) Algorth Toshnor Munaata 3/7/06 The EM algorth s a technque to deal wth varous types of ncoplete data or hdden varables. It can be appled to a wde range of learnng probles
More informationHidden Markov Model Cheat Sheet
Hdden Markov Model Cheat Sheet (GIT ID: dc2f391536d67ed5847290d5250d4baae103487e) Ths document s a cheat sheet on Hdden Markov Models (HMMs). It resembles lecture notes, excet that t cuts to the chase
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationMachine learning: Density estimation
CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of
More informationMatrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD
Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo
More informationEcon Statistical Properties of the OLS estimator. Sanjaya DeSilva
Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationChapter 7 Generalized and Weighted Least Squares Estimation. In this method, the deviation between the observed and expected values of
Chapter 7 Generalzed and Weghted Least Squares Estmaton The usual lnear regresson model assumes that all the random error components are dentcally and ndependently dstrbuted wth constant varance. When
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationAssignment 2. Tyler Shendruk February 19, 2010
Assgnment yler Shendruk February 9, 00 Kadar Ch. Problem 8 We have an N N symmetrc matrx, M. he symmetry means M M and we ll say the elements of the matrx are m j. he elements are pulled from a probablty
More informationρ some λ THE INVERSE POWER METHOD (or INVERSE ITERATION) , for , or (more usually) to
THE INVERSE POWER METHOD (or INVERSE ITERATION) -- applcaton of the Power method to A some fxed constant ρ (whch s called a shft), x λ ρ If the egenpars of A are { ( λ, x ) } ( ), or (more usually) to,
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationIntroduction to Regression
Introducton to Regresson Dr Tom Ilvento Department of Food and Resource Economcs Overvew The last part of the course wll focus on Regresson Analyss Ths s one of the more powerful statstcal technques Provdes
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationStatistical analysis using matlab. HY 439 Presented by: George Fortetsanakis
Statstcal analyss usng matlab HY 439 Presented by: George Fortetsanaks Roadmap Probablty dstrbutons Statstcal estmaton Fttng data to probablty dstrbutons Contnuous dstrbutons Contnuous random varable X
More information