9 : Learning Partially Observed GM : EM Algorithm
10-708: Probabilistic Graphical Models, Spring
9 : Learning Partially Observed GM : EM Algorithm
Lecturer: Eric P. Xing        Scribes: Rohan Ramanath, Rahul Goutam

1 Generalized Iterative Scaling

In this section, we summarize from the previous lecture the use of Generalized Iterative Scaling (GIS) to find the Maximum Likelihood Estimate (MLE) of a feature-based Undirected Graphical Model (UGM). Each feature $f_i$ has a weight $\theta_i$ which represents the numerical strength of the feature and whether it increases or decreases the probability of the clique. To represent these features as part of the probabilistic model, we can multiply them into the clique potentials to get:

$$p(x) = \frac{\exp\{\sum_i \theta_i f_i(x_{c_i})\}}{Z(\theta)}$$

At the maximum likelihood setting of the parameters, for each clique, the model marginals must be equal to the observed marginals, i.e. $\tilde{p}(x) = \frac{m(x)}{N} = p_{MLE}(x)$, where $m(x)$ is the number of times the configuration $x$ is found out of a possible $N$. The scaled likelihood function can then be represented as:

$$\tilde{l}(\theta; D) = \frac{l(\theta; D)}{N} = \frac{1}{N}\sum_n \log p(x_n \mid \theta) = \sum_x \tilde{p}(x) \log p(x \mid \theta) = \sum_x \tilde{p}(x) \sum_i \theta_i f_i(x) - \log Z(\theta)$$

The $\log Z(\theta)$ term is very complicated to handle directly. The tangent to the log function forms an upper bound on it ($\log Z(\theta) \le \mu Z(\theta) - \log \mu - 1, \ \forall \mu$). This fact is used to get a lower bound on the value of $\tilde{l}(\theta; D)$ by using $\mu = Z^{-1}(\theta^{(t)})$. Substituting $\Delta\theta_i^{(t)} = \theta_i - \theta_i^{(t)}$, we get:

$$\begin{aligned}
\tilde{l}(\theta; D) &\ge \sum_x \tilde{p}(x) \sum_i \theta_i f_i(x) - \frac{Z(\theta)}{Z(\theta^{(t)})} - \log Z(\theta^{(t)}) + 1 \\
&= \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \frac{1}{Z(\theta^{(t)})} \sum_x \exp\Big\{\sum_i \theta_i f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1 \\
&= \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \frac{1}{Z(\theta^{(t)})} \sum_x \exp\Big\{\sum_i \theta_i^{(t)} f_i(x)\Big\} \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1 \\
&= \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1
\end{aligned}$$
We can further relax the constraints by assuming $\sum_i f_i(x) = 1$ and $f_i(x) \ge 0$. Since the exponential function is convex, $\exp(\sum_i \pi_i x_i) \le \sum_i \pi_i \exp(x_i)$, so

$$\tilde{l}(\theta; D) \ge \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) \sum_i f_i(x) \exp\{\Delta\theta_i^{(t)}\} - \log Z(\theta^{(t)}) + 1 \triangleq \Lambda(\theta)$$

Setting the derivative of $\Lambda$ with respect to $\theta_i$ to zero gives

$$\frac{\partial \Lambda(\theta)}{\partial \theta_i} = \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) f_i(x) \exp\{\Delta\theta_i^{(t)}\} = 0
\quad\Rightarrow\quad
\exp\{\Delta\theta_i^{(t)}\} = \frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}$$

where $p^{(t)}(x)$ is the unnormalized version of $p(x \mid \theta^{(t)})$. Substituting this result into the update equations, we get:

$$\theta_i^{(t+1)} = \theta_i^{(t)} + \Delta\theta_i^{(t)}, \qquad
p^{(t+1)}(x) = p^{(t)}(x) \prod_i e^{\Delta\theta_i^{(t)} f_i(x)} = p^{(t)}(x) \prod_i \left(\frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}\right)^{f_i(x)}$$

$$\theta_i^{(t+1)} = \theta_i^{(t)} + \log\left(\frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}\right)$$

To summarize, GIS can be used for iterative scaling on general UGMs with feature-based potentials. IPF is a special case of GIS in which the clique potential is built on features defined as indicator functions of clique configurations.

2 Mixture Models

Let us consider a dataset $X$. We wish to model the data by specifying a joint distribution $p(X_i, Z_i) = p(X_i \mid Z_i) p(Z_i)$. Here, $Z_i \sim \text{Multinomial}(\pi)$ (where $\pi_j \ge 0$, $\sum_j \pi_j = 1$, and the parameter $\pi_j$ gives $p(Z_i = j)$), and $X_i \mid Z_i = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. We let $K$ denote the number of values that the $Z_i$'s can take on. Thus, our model posits that each $X_i$ was generated by randomly choosing $Z_i$ from $\{1, \dots, K\}$, and then $X_i$ was drawn from one of $K$ Gaussians depending on $Z_i$. This is called the mixture of Gaussians model. Also, note that the $z^{(i)}$'s are latent random variables, meaning that they are hidden/unobserved. This makes the problem tractable, as each conditional distribution $P(X \mid Z)$ can now be assumed to be unimodal. In the specific case of a mixture of Gaussians, the marginal density factorizes as:

$$P(X \mid \mu, \Sigma) = \sum_k \pi_k \, \mathcal{N}(X \mid \mu_k, \Sigma_k)$$

where $\pi_k$ is called the mixture proportion and $\mathcal{N}(X \mid \mu_k, \Sigma_k)$ is the Gaussian distribution of the $k$-th mixture component. The task is to learn the values of $\pi_k$, $\mu_k$ and $\Sigma_k$. In the fully observed case, this is fairly straightforward: we would simply decompose the log likelihood function and then find the maximum likelihood estimates for all the parameters. The log likelihood can be expressed as:
Figure 1: A mixture of 2 Gaussians

$$\begin{aligned}
l(\theta; D) &= \sum_n \log P(x_n, z_n) = \sum_n \log P(z_n \mid \pi)\, P(x_n \mid z_n, \mu, \Sigma) \\
&= \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_n^k} \\
&= \sum_n \sum_k z_n^k \log \pi_k \;-\; \frac{1}{2} \sum_n \sum_k z_n^k (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + C
\end{aligned}$$

Since the model is fully observed, the log likelihood completely decomposes. However, if we do not observe the values of $z$, as in the mixture model setting, all the parameters become coupled together, as shown below:

$$l_c(\theta; D) = \sum_n \log \sum_z P(x_n, z \mid \theta) = \sum_n \log \sum_z P(z \mid \theta_z)\, P(x_n \mid z, \theta_x)$$

The summation is now inside the log, and the log likelihood no longer decomposes nicely. Hence, it is hard to find a closed-form solution for the maximum likelihood estimates. This provides the motivation for the Expectation Maximization (EM) algorithm.

3 K-Means

Given a set of observations $(x_1, x_2, \dots, x_n)$, where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $k$ sets ($k \le n$), given by assignments $z = \{z_1, z_2, \dots, z_n\}$, so as to minimize the within-cluster sum of squares (WCSS). We start with initial random guesses for the class centroids and iterate between the following two steps until convergence.

Expectation Step: Use some distance metric (Euclidean distance, cosine similarity, etc.) and the
current guess of the centroids to assign every data point to the nearest centroid:

$$z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma_k^{-1(t)} (x_n - \mu_k^{(t)})$$

Maximization Step: Use the new class assignments to recompute the centroid values:

$$\mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k)\, x_n}{\sum_n \delta(z_n^{(t)}, k)}$$

To understand the EM viewpoint, we can think of each of the clusters as being associated with some distribution, i.e. $P(x \mid z = k) \sim \mathcal{N}(x \mid \mu_k, \Sigma_k)$, and we would like to learn the parameters ($\mu_k$ and $\Sigma_k$) of each distribution.

4 EM Algorithm

The EM algorithm is an efficient iterative procedure for computing the Maximum Likelihood (ML) estimate in the presence of latent variables. In ML estimation, we wish to estimate the model parameter(s) for which the observed data are most likely. As seen in Section 2, the maximum likelihood estimator cannot be used directly in the partially observed setting. Finding a maximum likelihood solution requires taking derivatives of the likelihood function with respect to both the parameters and the latent variables and simultaneously solving the resulting equations. In statistical models with latent variables, this usually leads to intractable solutions. The EM algorithm works around this by considering the expected complete log likelihood. Each iteration of the EM algorithm consists of two processes: the E-step and the M-step. In the expectation, or E-step, the latent variables are estimated given the observed data and the current estimate of the model parameters. This is achieved using the conditional expectation, explaining the choice of terminology. In the M-step, the likelihood function is maximized under the assumption that the latent variables are known; the estimates of the latent variables from the E-step are used in lieu of the actual latent variables. The log likelihood from Section 2 can be written in the form of an expected log likelihood as:

$$\langle l_c(\theta; x, z) \rangle = \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \log \pi_k \;-\; \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \left( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| \right) + C$$

where $\langle z_n^k \rangle_{P(Z \mid X; \theta)}$ is the expected value of $z_n^k$ with respect to $P(Z \mid X; \theta)$. The EM algorithm maximizes $\langle l_c(\theta; D) \rangle_{P(Z \mid X; \theta)}$ by iterating between two steps (called the E and M steps).
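The two K-means steps described in Section 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture: it uses plain Euclidean distance (i.e. $\Sigma_k$ fixed to the identity), and the function name and `init` argument are our own.

```python
import numpy as np

def kmeans(X, K, n_iters=100, init=None, seed=0):
    """K-means as hard-assignment EM: alternate nearest-centroid
    assignment (E-step) and centroid re-estimation (M-step)."""
    rng = np.random.default_rng(seed)
    # Initial guesses for the class centroids: given, or random data points.
    mu = np.asarray(init, float) if init is not None else \
        X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # E-step: squared Euclidean distance of every point to every centroid,
        # then assign each point to its nearest centroid.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points
        # (keeping the old centroid if a cluster happens to be empty).
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # assignments stopped changing the centroids
            break
        mu = new_mu
    return mu, z
```

On two well-separated point clouds, the returned centroids converge to the within-cluster means after a couple of iterations.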
In the E-step, we compute the sufficient statistics of the hidden variables (i.e., $Z$) using the current estimate of the parameters (i.e., $\mu_k$ and $\Sigma_k$). The M-step then maximizes over the parameters using the expected values of the hidden variables computed in the preceding E-step. Specifically, for the GMM, we compute $\tau_n^{k(t)} = \langle z_n^k \rangle_{P(Z \mid X; \theta^{(t)})}$ in the E-step and maximize the expected complete log likelihood with respect to the model parameters $\mu_k$, $\Sigma_k$ and $\pi_k$ in the M-step to obtain:

$$\pi_k^{(t+1)} = \frac{\langle n_k \rangle}{N}, \qquad
\mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}, \qquad
\Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}$$

where $\tau_n^{k(t)} = \langle z_n^k \rangle_{P(Z \mid X; \theta^{(t)})}$ and $\langle n_k \rangle = \sum_n \tau_n^{k(t)}$.
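These E- and M-step updates for a mixture of Gaussians can be sketched directly in NumPy. This is an illustrative sketch, not code from the lecture: the names `em_gmm` and `init_mu` are our own, the Gaussian density is hand-rolled via a Cholesky factorization, and a small ridge is added to each covariance for numerical stability.

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Log-density of N(mu, Sigma) at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, (X - mu).T)        # L^{-1} (x - mu)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * ((sol ** 2).sum(axis=0) + d * np.log(2 * np.pi) + logdet)

def em_gmm(X, K, n_iters=50, init_mu=None, seed=0):
    """EM for a mixture of Gaussians: responsibilities tau in the E-step,
    the three closed-form weighted-average updates in the M-step."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = np.asarray(init_mu, float) if init_mu is not None else \
        X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for _ in range(n_iters):
        # E-step: tau[n, k] = p(z_n^k = 1 | x_n, theta^(t)).
        tau = np.stack([pi[k] * np.exp(log_gauss(X, mu[k], Sigma[k]))
                        for k in range(K)], axis=1)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: expected counts <n_k>, then pi, mu, Sigma updates.
        nk = tau.sum(axis=0)
        pi = nk / N
        mu = (tau.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (tau[:, k, None] * diff).T @ diff / nk[k] \
                + 1e-6 * np.eye(d)
    return pi, mu, Sigma, tau
```

Replacing the soft responsibilities $\tau$ with hard 0/1 assignments recovers exactly the K-means updates of Section 3.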
5 Comparison between K-means and EM

The EM algorithm for the mixture of Gaussians is a soft version of K-means. In K-means, the E-step assigns each point to a single cluster, and the M-step recomputes the cluster centers assuming each point belongs to exactly one cluster. In the EM version, the E-step assigns points to clusters in a probabilistic manner, and the M-step recomputes the cluster centroids under these probabilistic assignments, where the weight of each point is given by $\tau_n^{k(t)}$.

6 Theory underlying the EM Algorithm

The previous sections tell us how the EM algorithm can be used to compute the Maximum Likelihood (ML) estimate in the presence of missing or hidden data. Let $X$ represent the set of observable variables and $Z$ the set of all latent variables. The probability model is $p(X, Z \mid \theta)$. If $Z$ were observed, we would define the complete log likelihood as

$$l_c(\theta; X, Z) = \log p(X, Z \mid \theta)$$

ML estimation would then amount to maximizing the complete log likelihood. If the probability $p(X, Z \mid \theta)$ factors such that separate components of $\theta$ occur in separate factors, then the log separates them into different terms which can be maximized independently (decoupling). However, since $Z$ is not observed, it has to be marginalized out, and the incomplete log likelihood function takes the following form:

$$l(\theta; X) = \log p(X \mid \theta) = \log \sum_Z p(X, Z \mid \theta)$$

The incomplete log likelihood cannot be decoupled, because the summation over $p(X, Z \mid \theta)$ lies inside the logarithm. To handle this, we average over $z$ to remove the randomness, using an averaging distribution $q(Z \mid X)$. The expected complete log likelihood can then be defined as:

$$\langle l_c(\theta; x, Z) \rangle_q = \sum_Z q(Z \mid x, \theta) \log p(x, Z \mid \theta)$$

The expected complete log likelihood is a deterministic function of $\theta$. It is also linear in the complete log likelihood function and hence inherits its factorizability. The expected complete log likelihood also provides a lower bound on the (incomplete) log likelihood, as shown below:

$$l(\theta; x) = \log p(x \mid \theta) = \log \sum_Z p(x, Z \mid \theta) = \log \sum_Z q(Z \mid x) \frac{p(x, Z \mid \theta)}{q(Z \mid x)}$$
$$\ge \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} \qquad \text{(Jensen's inequality)}$$

$$l(\theta; x) \ge \langle l_c(\theta; x, Z) \rangle_q + H_q$$

where $H_q$ is the entropy of the distribution $q$ and does not depend on $\theta$; it is a constant for a particular distribution $q$. Hence, to maximize this lower bound on the log likelihood with respect to $\theta$, it is sufficient to maximize the expected complete log likelihood.

The EM algorithm can be seen as a coordinate ascent algorithm. For fixed data $X$, define a functional $F$, called the free energy, as:

$$F(q, \theta) = \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} \le l(\theta; x)$$

The EM algorithm is coordinate ascent on $F$. The E-step and the M-step can then be defined as

E-step: $q^{t+1} = \arg\max_q F(q, \theta^t)$
M-step: $\theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)$

At the $(t+1)$-th step, we first maximize the free energy $F(q, \theta^t)$ with respect to $q$. For this optimal choice $q^{t+1}$, we then maximize $F(q^{t+1}, \theta)$ with respect to $\theta$ to get the updated value $\theta^{t+1}$.

Let us look at the E-step. If we set $q^{t+1}(Z \mid x)$ to the posterior distribution of the latent variables given the data and the parameters, $p(Z \mid x, \theta^t)$, we maximize $F(q, \theta^t)$. The proof of this claim is given below:

$$F(p(Z \mid x, \theta^t), \theta^t) = \sum_Z p(Z \mid x, \theta^t) \log \frac{p(x, Z \mid \theta^t)}{p(Z \mid x, \theta^t)} = \sum_Z p(Z \mid x, \theta^t) \log p(x \mid \theta^t) = \log p(x \mid \theta^t) = l(\theta^t; x)$$

Since $l(\theta^t; x)$ is an upper bound on $F(q, \theta^t)$, this shows that $F(q, \theta^t)$ is maximized by setting $q(Z \mid x)$ to the posterior probability distribution $p(Z \mid x, \theta^t)$. The claim can also be proved using variational calculus, or from the identity

$$l(\theta; x) - F(q, \theta) = KL\big(q \,\|\, p(Z \mid x, \theta)\big)$$

Without loss of generality, we can assume $p(x, Z \mid \theta)$ to be a generalized exponential family distribution:

$$p(x, Z \mid \theta) = \frac{1}{Z(\theta)} h(x, Z) \exp\Big\{\sum_i \theta_i f_i(x, Z)\Big\}$$

A special case is when the $p(x \mid Z)$ are GLIMs. Then $f_i(x, Z) = \eta_i^T(z)\, \xi_i(x)$.
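The two facts just shown, that $F(q, \theta) \le l(\theta; x)$ for any $q$, with equality exactly when $q$ is the posterior $p(Z \mid x, \theta)$, can be checked numerically on a toy discrete model. All the numbers below are made up for illustration: one observation $x$, a binary latent $Z$, and fixed parameters $\theta$ summarized by a prior and a likelihood vector.

```python
import numpy as np

# Toy latent-variable model (illustrative numbers only).
pz = np.array([0.3, 0.7])            # p(Z)
px_given_z = np.array([0.8, 0.1])    # p(x | Z) at the observed x
p_joint = pz * px_given_z            # p(x, Z | theta)
log_lik = np.log(p_joint.sum())      # l(theta; x) = log p(x | theta)
posterior = p_joint / p_joint.sum()  # p(Z | x, theta)

def free_energy(q):
    """F(q, theta) = sum_Z q(Z|x) log[ p(x, Z | theta) / q(Z|x) ]."""
    q = np.asarray(q, float)
    return float(np.sum(q * (np.log(p_joint) - np.log(q))))

# Any distribution q gives a lower bound on the log likelihood...
for q in ([0.5, 0.5], [0.9, 0.1], [0.2, 0.8]):
    assert free_energy(q) <= log_lik
# ...and the posterior attains it exactly (the E-step's optimal choice).
assert np.isclose(free_energy(posterior), log_lik)
```

The gap between the two quantities at any other $q$ is precisely the KL divergence $KL(q \,\|\, p(Z \mid x, \theta))$ from the identity above.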
The expected complete log likelihood under $q^{t+1} = p(Z \mid x, \theta^t)$ is given below:

$$\begin{aligned}
\langle l_c(\theta; x, Z) \rangle_{q^{t+1}} &= \sum_Z q(Z \mid x, \theta^t) \sum_i \theta_i f_i(x, Z) - A(\theta) \\
&= \sum_i \theta_i \langle f_i(x, Z) \rangle_{q(Z \mid x, \theta^t)} - A(\theta) \\
&\overset{\text{GLIM}}{=} \sum_i \theta_i \langle \eta_i(Z) \rangle_{q(Z \mid x, \theta^t)}\, \xi_i(x) - A(\theta)
\end{aligned}$$

Now let us analyze the M-step of the EM algorithm. The M-step can be viewed as maximization of the expected complete log likelihood. This is shown below:

$$F(q, \theta) = \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} = \sum_Z q(Z \mid x) \log p(x, Z \mid \theta) - \sum_Z q(Z \mid x) \log q(Z \mid x) = \langle l_c(\theta; x, Z) \rangle_q + H_q$$

The free energy hence breaks into two terms. The first term is the expected complete log likelihood, and the second term, which is independent of $\theta$, is the entropy. Thus, maximizing the free energy with respect to $\theta$ is equivalent to maximizing the expected complete log likelihood:

$$\theta^{t+1} = \arg\max_\theta \langle l_c(\theta; x, Z) \rangle_{q^{t+1}} = \arg\max_\theta \sum_Z q(Z \mid x, \theta^t) \log p(x, Z \mid \theta)$$

Under the optimal $q^{t+1}$, this is equivalent to solving a standard MLE problem for the fully observed model $p(x, Z \mid \theta)$, with the sufficient statistics involving $Z$ replaced by their expectations w.r.t. $p(Z \mid x, \theta)$.

7 Example: Hidden Markov Model

Let us look at the learning problem of Hidden Markov Models (HMMs) in the framework of the EM algorithm. The learning problem in HMMs can take two forms:

Supervised learning: We have annotated data with the correct answer known. Examples include a genomic region where we have good annotations of the CpG islands, or a casino player who allows us to observe him as he changes dice while producing 10,000 rolls. Since all variables are fully observed, we can learn the parameters $\theta$ of the model by applying MLE.

Unsupervised learning: We do not have annotated data. We observe only some of the variables, and the remaining (latent) variables are unknown. Examples include the porcupine genome, where we do not know how frequent the CpG islands are, or 10,000 rolls of dice from a casino player without knowing when he changes dice.
Figure 2: A hidden Markov model

7.1 Baum-Welch Algorithm

The Baum-Welch algorithm is used to learn the parameters $\theta$ of an HMM. The complete log likelihood of an HMM can be written as

$$l_c(\theta; x, y) = \log p(x, y) = \log \prod_n \left( p(y_{n,1}) \prod_{t=2}^T p(y_{n,t} \mid y_{n,t-1}) \prod_{t=1}^T p(x_{n,t} \mid y_{n,t}) \right)$$

The expected complete log likelihood can be written as

$$\langle l_c(\theta; x, y) \rangle = \sum_n \langle y_{n,1}^i \rangle_{p(y_{n,1} \mid x_n)} \log \pi_i \;+\; \sum_n \sum_{t=2}^T \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t} \mid x_n)} \log a_{i,j} \;+\; \sum_n \sum_{t=1}^T x_{n,t}^k \langle y_{n,t}^i \rangle_{p(y_{n,t} \mid x_n)} \log b_{i,k}$$

where $A = \{a_{i,j}\} = P(Y_t^j = 1 \mid Y_{t-1}^i = 1)$ is the transition matrix and $B = \{b_{i,k}\} = P(x_t^k = 1 \mid Y_t^i = 1)$ is the emission matrix. The E-step of the EM algorithm is then

$$\gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 \mid x_n), \qquad
\xi_{n,t}^{i,j} = \langle y_{n,t-1}^i y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1, y_{n,t}^j = 1 \mid x_n)$$

The M-step of the algorithm is

$$\pi_i^{ML} = \frac{\sum_n \gamma_{n,1}^i}{N}, \qquad
a_{i,j}^{ML} = \frac{\sum_n \sum_{t=2}^T \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i}, \qquad
b_{i,k}^{ML} = \frac{\sum_n \sum_{t=1}^T \gamma_{n,t}^i\, x_{n,t}^k}{\sum_n \sum_{t=1}^T \gamma_{n,t}^i}$$

In the unsupervised learning case, the Baum-Welch algorithm proceeds as follows:

1. Start with a best guess of the parameters $\theta$ for the model.
2. Estimate the counts $A_{ij}$, $B_{ik}$ on the training data:
   $$A_{ij} = \sum_{n,t} \langle y_{n,t-1}^i y_{n,t}^j \rangle, \qquad B_{ik} = \sum_{n,t} \langle y_{n,t}^i \rangle\, x_{n,t}^k$$
3. Update $\theta$ according to $A_{ij}$, $B_{ik}$. The problem now becomes one of supervised learning.
4. Repeat steps 2 and 3 until convergence.

It can be proven that we get a more (or equally) likely parameter set $\theta$ in each iteration.

8 EM for general BNs

Algorithm 1 shows the EM algorithm for a general Bayesian Network.

    while not converged do
        % E-step
        for each node i do
            ESS_i = 0
        end
        for each data sample n do
            do inference with x_{n,H}
            for each node i do
                ESS_i += < SS_i(x_{n,i}, x_{n,pi_i}) >_{p(x_{n,H} | x_{n,-H})}
            end
        end
        % M-step
        for each node i do
            theta_i = MLE(ESS_i)
        end
    end

    Algorithm 1: EM algorithm for a general BN

9 Summary of the EM algorithm

In summary, the EM algorithm is a way of maximizing the likelihood of latent variable models. It computes the ML estimate of the parameters by breaking the original problem into two parts:

1. Estimating the unobserved data from the observed data and the current parameters.
2. Using this completed data, with all variables observed, to find the ML estimates of the parameters.

The algorithm alternates between filling in the latent variables using its best guess and updating the parameters based on this guess.

10 EM for Conditional Mixture Models

The model for the conditional mixture model is defined as
$$P(Y \mid X) = \sum_k p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k, \sigma_k)$$

The objective function for EM is defined as

$$\langle l_c(\theta; x, y, z) \rangle = \sum_n \langle \log p(z_n \mid x_n, \xi) \rangle_{p(z \mid x, y)} + \sum_n \langle \log p(y_n \mid x_n, z_n, \theta, \sigma) \rangle_{p(z \mid x, y)}$$

The E-step of the EM algorithm is

$$\tau_n^{k(t)} = P(z_n^k = 1 \mid x_n, y_n, \theta) = \frac{P(z^k = 1 \mid x_n)\, p_k(y_n \mid x_n, \theta_k, \sigma_k^2)}{\sum_j P(z^j = 1 \mid x_n)\, p_j(y_n \mid x_n, \theta_j, \sigma_j^2)}$$

The M-step of the EM algorithm uses the normal equation for linear regression, $\theta = (X^T X)^{-1} X^T Y$, but with the data re-weighted by $\tau$, or uses the weighted IRLS algorithm to update $\xi$, $\theta_k$, $\sigma_k$ based on the data points $(x_n, y_n)$ with weights $\tau_n^{k(t)}$.

11 EM Variants

There are some variants of the EM algorithm.

Sparse EM: The sparse EM algorithm does not recompute, at every iteration, the posterior probability of each data point under the models for which it is very close to 0. Instead, it keeps an active list which it updates every once in a while.

Generalized (Incomplete) EM: Sometimes it is hard to compute the maximum likelihood estimate of the parameters in the M-step, even with the complete data. However, we can still make progress by doing an M-step that merely increases the likelihood a bit (like a gradient step).

12 EM: Report Card

EM is one of the most popular methods for obtaining the Maximum Likelihood Estimate or the Maximum a Posteriori Estimate of the parameters in a model with latent, unobserved variables. EM does not require any learning rate parameter and is very fast in low dimensions. It also ensures convergence, as each iteration is guaranteed to improve the likelihood. The cons of the EM algorithm are that it can give us a local optimum instead of the global optimum, it can be slower than conjugate gradient (especially near convergence), it requires an expensive inference step, and it is a maximum likelihood/MAP method.

Disclaimer: Some portions of the content have been taken directly from the lecture slides [1] and last year's scribe notes [2]. The authors do not claim originality of these notes.

[1] epng/class/10708/lectures/lecture9-em.pdf
[2] epng/class/ /lecture/scrbe9.pdf
More informationXII.3 The EM (Expectation-Maximization) Algorithm
XII.3 The EM (Expectaton-Maxzaton) Algorth Toshnor Munaata 3/7/06 The EM algorth s a technque to deal wth varous types of ncoplete data or hdden varables. It can be appled to a wde range of learnng probles
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationClustering & Unsupervised Learning
Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationGaussian process classification: a message-passing viewpoint
Gaussan process classfcaton: a message-passng vewpont Flpe Rodrgues fmpr@de.uc.pt November 014 Abstract The goal of ths short paper s to provde a message-passng vewpont of the Expectaton Propagaton EP
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationOverview. Hidden Markov Models and Gaussian Mixture Models. Acoustic Modelling. Fundamental Equation of Statistical Speech Recognition
Overvew Hdden Marov Models and Gaussan Mxture Models Steve Renals and Peter Bell Automatc Speech Recognton ASR Lectures &5 8/3 January 3 HMMs and GMMs Key models and algorthms for HMM acoustc models Gaussans
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationPattern Classification
Pattern Classfcaton All materals n these sldes ere taken from Pattern Classfcaton (nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wley & Sons, 000 th the permsson of the authors and the publsher
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationClustering & (Ken Kreutz-Delgado) UCSD
Clusterng & Unsupervsed Learnng Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y ), fnd an approxmatng
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationECE 534: Elements of Information Theory. Solutions to Midterm Exam (Spring 2006)
ECE 534: Elements of Informaton Theory Solutons to Mdterm Eam (Sprng 6) Problem [ pts.] A dscrete memoryless source has an alphabet of three letters,, =,, 3, wth probabltes.4,.4, and., respectvely. (a)
More informationChapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the
More informationMARKOV CHAIN AND HIDDEN MARKOV MODEL
MARKOV CHAIN AND HIDDEN MARKOV MODEL JIAN ZHANG JIANZHAN@STAT.PURDUE.EDU Markov chan and hdden Markov mode are probaby the smpest modes whch can be used to mode sequenta data,.e. data sampes whch are not
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informatione i is a random error
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown
More information6 Supplementary Materials
6 Supplementar Materals 61 Proof of Theorem 31 Proof Let m Xt z 1:T : l m Xt X,z 1:t Wethenhave mxt z1:t ˆm HX Xt z 1:T mxt z1:t m HX Xt z 1:T + mxt z 1:T HX We consder each of the two terms n equaton
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationp 1 c 2 + p 2 c 2 + p 3 c p m c 2
Where to put a faclty? Gven locatons p 1,..., p m n R n of m houses, want to choose a locaton c n R n for the fre staton. Want c to be as close as possble to all the house. We know how to measure dstance
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationLecture 21: Numerical methods for pricing American type derivatives
Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More information