MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
April 20, 2016
Recall

We are given independent observations (x^(i), z^(i)) with missing values z^(i). Let θ^(0) be an initial guess for θ.

Given the current estimate θ^(t) of θ, compute

    Q(θ | θ^(t)) := E_{z|x; θ^(t)}[log p(x, z; θ)] = ∑_i E_{z^(i)|x^(i); θ^(t)}[log p(x^(i), z^(i); θ)]    (E step)

(In other words, we average over the missing values according to their distribution given the observed values.)

We then optimize Q(θ | θ^(t)) with respect to θ:

    θ^(t+1) := argmax_θ Q(θ | θ^(t))    (M step)

Theorem: The sequence θ^(t) constructed by the EM algorithm satisfies l(θ^(t+1)) ≥ l(θ^(t)).
Convergence of the EM algorithm - Jensen's inequality

Recall: if φ : R → R is convex and X is a random variable, then φ(E(X)) ≤ E(φ(X)).

In other words, if µ is a probability measure on Ω, g : Ω → R, and φ : R → R is convex, then

    φ( ∫_Ω g dµ ) ≤ ∫_Ω (φ ∘ g) dµ.

Note:
1. The inequality is reversed if φ is concave instead of convex.
2. Equality holds iff g is constant or φ(x) = ax + b.

Previously, to deal with missing values, our goal was to maximize

    ∑_i log p(x^(i); θ) = ∑_i log ∑_{z^(i)} p(x^(i), z^(i); θ).

Let Q_i(z) be any probability distribution for z^(i), i.e.,
1. Q_i(z) ≥ 0,
2. ∑_z Q_i(z) = 1.
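As a quick numerical sanity check (not part of the lecture; a minimal NumPy sketch), we can see both directions of Jensen's inequality on a simulated sample: a convex φ (here x ↦ x²) satisfies φ(E[X]) ≤ E[φ(X)], while the concave log reverses the inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any non-degenerate sample

# Convex phi (x -> x^2): phi(E[X]) <= E[phi(X)]
lhs_convex, rhs_convex = x.mean() ** 2, (x ** 2).mean()

# Concave phi (log): the inequality is reversed, phi(E[X]) >= E[phi(X)]
lhs_concave, rhs_concave = np.log(x.mean()), np.log(x).mean()

print(lhs_convex <= rhs_convex, lhs_concave >= rhs_concave)  # True True
```

Here the convex case reduces to the fact that a sample variance is nonnegative, and the concave case is the AM-GM inequality for the empirical distribution.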
Convergence of the EM algorithm (cont.)

Then, using Jensen's inequality (log is concave, so the inequality is reversed):

    l(θ) = ∑_i log p(x^(i); θ)
         = ∑_i log ∑_{z^(i)} p(x^(i), z^(i); θ)
         = ∑_i log ∑_{z^(i)} Q_i(z^(i)) · p(x^(i), z^(i); θ) / Q_i(z^(i))
         ≥ ∑_i ∑_{z^(i)} Q_i(z^(i)) log [ p(x^(i), z^(i); θ) / Q_i(z^(i)) ].

Thinking of the inner sum as an expectation with respect to the distribution Q_i, we have shown:

    log p(x^(i); θ) ≥ E_{z^(i) ~ Q_i} log [ p(x^(i), z^(i); θ) / Q_i(z^(i)) ].

How can we choose Q_i to get the best possible lower bound?
Convergence of the EM algorithm (cont.)

At every iteration of the EM algorithm, we choose Q_i to make the inequality

    log p(x^(i); θ) ≥ E_{z^(i) ~ Q_i} log [ p(x^(i), z^(i); θ) / Q_i(z^(i)) ]

tight at our current estimate θ = θ^(t).

By the equality case in Jensen's inequality,

    log p(x^(i); θ) = E_{z^(i) ~ Q_i} log [ p(x^(i), z^(i); θ) / Q_i(z^(i)) ]

iff p(x^(i), z^(i); θ) / Q_i(z^(i)) = c for all z^(i). In other words: Q_i(z^(i)) ∝ p(x^(i), z^(i); θ).

Now, for Q_i to be a probability distribution, we need to choose:

    Q_i(z^(i)) = p(x^(i), z^(i); θ) / ∑_{z^(i)} p(x^(i), z^(i); θ) = p(z^(i) | x^(i); θ).
Convergence of the EM algorithm (cont.)

The previous calculation motivates the E step E_{z|x; θ^(t)}[log p(x, z; θ)] in the EM algorithm. We will now show that l(θ^(t+1)) ≥ l(θ^(t)).

With our choice of Q_i^(t)(z^(i)) = p(z^(i) | x^(i); θ^(t)) at step t, we have:

    l(θ^(t)) = ∑_i log p(x^(i); θ^(t))
             = ∑_i log ∑_{z^(i)} p(x^(i), z^(i); θ^(t))
             = ∑_i log ∑_{z^(i)} Q_i^(t)(z^(i)) · p(x^(i), z^(i); θ^(t)) / Q_i^(t)(z^(i))
             = ∑_i ∑_{z^(i)} Q_i^(t)(z^(i)) log [ p(x^(i), z^(i); θ^(t)) / Q_i^(t)(z^(i)) ].
Convergence of the EM algorithm (cont.)

Now,

    l(θ^(t+1)) = ∑_i ∑_{z^(i)} Q_i^(t+1)(z^(i)) log [ p(x^(i), z^(i); θ^(t+1)) / Q_i^(t+1)(z^(i)) ]
               ≥ ∑_i ∑_{z^(i)} Q_i^(t)(z^(i)) log [ p(x^(i), z^(i); θ^(t+1)) / Q_i^(t)(z^(i)) ]
               ≥ ∑_i ∑_{z^(i)} Q_i^(t)(z^(i)) log [ p(x^(i), z^(i); θ^(t)) / Q_i^(t)(z^(i)) ]
               = l(θ^(t)).

The first inequality holds by Jensen's inequality (our choice of Q_i^(t+1) gives equality in Jensen, but the inequality holds for any probability distribution, in particular Q_i^(t)). The second inequality holds by the definition of θ^(t+1):

    θ^(t+1) := argmax_θ ∑_i ∑_{z^(i)} Q_i^(t)(z^(i)) log [ p(x^(i), z^(i); θ) / Q_i^(t)(z^(i)) ].

The final equality is the tightness at θ^(t) established on the previous slide.
Example - Univariate Gaussian

We consider a simple example to illustrate the EM algorithm. Suppose W_i ~ N(µ, σ²) independently, with µ ∈ R and σ > 0. Suppose w_i was observed for i = 1, ..., m and w_i is missing for i = m+1, ..., n. Let

    f(x) = (1 / (√(2π) σ)) e^{−(x−µ)² / (2σ²)}

be the Gaussian density.

The likelihood function for θ = (µ, σ²) is given by

    L(θ) = ∏_{i=1}^n f(w_i)
         = ∏_{i=1}^m (1 / (√(2π) σ)) e^{−(w_i−µ)² / (2σ²)} · ∏_{i=m+1}^n (1 / (√(2π) σ)) e^{−(w_i−µ)² / (2σ²)}.

Marginalizing over the unobserved values, we get:

    ∫ ... ∫ L(θ) dw_{m+1} ... dw_n = ∏_{i=1}^m (1 / (√(2π) σ)) e^{−(w_i−µ)² / (2σ²)}.
Example - Univariate Gaussian (cont.)

Conclusion: The MLE for (µ, σ²) is the usual MLE for the observed values:

    µ̂ = (1/m) ∑_{i=1}^m w_i,    σ̂² = (1/m) ∑_{i=1}^m w_i² − µ̂².

We will now re-derive the same result using the EM algorithm. The log-likelihood function is:

    l(θ) = ∑_{i=1}^n [ −(1/2) log σ² − (1/(2σ²))(w_i − µ)² − (1/2) log 2π ]
         = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) [ nµ² + ∑_{i=1}^n w_i² − 2µ ∑_{i=1}^n w_i ].

Remark: The log-likelihood is linear in ∑_{i=1}^n w_i and ∑_{i=1}^n w_i².
Example - Univariate Gaussian (cont.)

The E step of the EM algorithm at step t calculates:

    E( ∑_{i=1}^n w_i | w_obs; θ^(t) ) = ∑_{i=1}^m w_i + (n − m) µ^(t),
    E( ∑_{i=1}^n w_i² | w_obs; θ^(t) ) = ∑_{i=1}^m w_i² + (n − m) [ (µ^(t))² + (σ²)^(t) ].

Note: Replacing ∑_{i=1}^n w_i and ∑_{i=1}^n w_i² in l(θ) by the above expressions, the resulting function has the same functional form as the usual log-likelihood. We conclude that

    µ^(t+1) = (1/n) ∑_{i=1}^m w_i + ((n − m)/n) µ^(t),
    (σ̂²)^(t+1) = (1/n) ∑_{i=1}^m w_i² + ((n − m)/n) [ (µ^(t))² + (σ²)^(t) ] − (µ^(t+1))².
Example - Univariate Gaussian (cont.)

Usually, one would iterate the following system until convergence:

    µ^(t+1) = (1/n) ∑_{i=1}^m w_i + ((n − m)/n) µ^(t),
    (σ̂²)^(t+1) = (1/n) ∑_{i=1}^m w_i² + ((n − m)/n) [ (µ^(t))² + (σ²)^(t) ] − (µ^(t+1))².

In this simple case, we can directly compute the limit by letting t → ∞ and solving:

    µ̂ = (1/n) ∑_{i=1}^m w_i + ((n − m)/n) µ̂   ⟹   µ̂ = (1/m) ∑_{i=1}^m w_i,

and

    σ̂² = (1/n) ∑_{i=1}^m w_i² + ((n − m)/n)(µ̂² + σ̂²) − µ̂²   ⟹   σ̂² = (1/m) ∑_{i=1}^m w_i² − µ̂².

We obtain the same result as in the direct approach.
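The two-equation iteration above is easy to run. Below is a small sketch (not from the slides; the data and initial values are made up) that iterates the updates, checks that the observed-data log-likelihood never decreases, as the theorem guarantees, and checks that the fixed point matches the observed-data MLE.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 60                      # n total draws, first m observed
w_obs = rng.normal(5.0, 2.0, size=m)

s1, s2 = w_obs.sum(), np.square(w_obs).sum()   # observed sufficient statistics

def obs_loglik(mu, sigma2):
    # Log-likelihood of the observed values (the missing ones marginalize out)
    return -0.5 * m * np.log(2 * np.pi * sigma2) - ((w_obs - mu) ** 2).sum() / (2 * sigma2)

mu, sigma2 = 0.0, 1.0               # initial guess theta^(0)
prev_ll = obs_loglik(mu, sigma2)
for _ in range(200):
    # E step: conditional expectations of sum w_i and sum w_i^2
    e_s1 = s1 + (n - m) * mu
    e_s2 = s2 + (n - m) * (mu ** 2 + sigma2)
    # M step: plug the expected sufficient statistics into the complete-data MLE
    mu_next = e_s1 / n
    sigma2 = e_s2 / n - mu_next ** 2
    mu = mu_next
    ll = obs_loglik(mu, sigma2)
    assert ll >= prev_ll - 1e-9     # the EM theorem: l(theta) never decreases
    prev_ll = ll

mu_hat = w_obs.mean()                           # direct observed-data MLE
sigma2_hat = np.square(w_obs).mean() - mu_hat ** 2
print(np.isclose(mu, mu_hat), np.isclose(sigma2, sigma2_hat))  # True True
```

The µ update is a contraction with factor (n − m)/n, so the iteration converges geometrically to the fixed point computed above.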
Example - summary

Of course, one would not use the EM algorithm in the univariate Gaussian case. The important points here are that:
1. The E step was equivalent to computing the conditional expectation of the sufficient statistics.
2. The M step was equivalent to an MLE problem with complete data (often available in closed form).

The same phenomenon occurs when working with exponential family distributions:

    f(y | θ) = b(y) exp( θᵀ T(y) − a(θ) ),

where
- θ is a vector of parameters;
- T(y) is a vector of sufficient statistics;
- a(θ) is a normalization constant (the log-partition function).

This family includes the Gaussian, Bernoulli, binomial, multinomial, geometric, exponential, Poisson, Dirichlet, gamma, and chi-square distributions, among others.
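For instance (not worked out on the slides), the univariate Gaussian fits this template with natural parameters θ = (µ/σ², −1/(2σ²)), sufficient statistics T(y) = (y, y²), base measure b(y) = 1/√(2π), and log-partition a(θ) = µ²/(2σ²) + (1/2) log σ². A quick NumPy check, with arbitrarily chosen parameter values:

```python
import numpy as np

mu, sigma2 = 1.5, 0.7               # arbitrary Gaussian parameters
y = np.linspace(-3.0, 5.0, 9)

# Direct Gaussian density
direct = np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Exponential-family form f(y) = b(y) exp(theta^T T(y) - a(theta))
theta1, theta2 = mu / sigma2, -1.0 / (2 * sigma2)   # natural parameters
b = 1.0 / np.sqrt(2 * np.pi)                        # base measure b(y)
a = mu ** 2 / (2 * sigma2) + 0.5 * np.log(sigma2)   # log-partition a(theta)
expfam = b * np.exp(theta1 * y + theta2 * y ** 2 - a)

print(np.allclose(direct, expfam))  # True
```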
Example - fitting mixture models

The EM algorithm is also useful for fitting models where there is no missing data, but where some hidden parameters make the estimation difficult.

A mixture model is a probability model with density

    f(x) = ∑_{i=1}^K p_i f_i(x),

where p_i ≥ 0, ∑_{i=1}^K p_i = 1, and each f_i is a pdf.

To sample from such a model:
1. Choose a category C at random according to the distribution {p_i}_{i=1}^K.
2. Given C = j, draw X from f_j.

The f_i are often taken from the same parametric family (e.g. Gaussian), but they don't have to be.
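The two-step sampling procedure above can be sketched as follows (a toy Gaussian mixture; the component parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
p = np.array([0.2, 0.5, 0.3])       # mixing probabilities p_i
mus = np.array([-4.0, 0.0, 4.0])    # component parameters (made up)
sigmas = np.array([1.0, 0.5, 1.5])

n = 50_000
# Step 1: draw the category C for each sample according to {p_i}
c = rng.choice(K, size=n, p=p)
# Step 2: given C = j, draw X from f_j (here f_j is N(mus[j], sigmas[j]^2))
x = rng.normal(mus[c], sigmas[c])

# Empirical category frequencies match the mixing probabilities
print(np.allclose(np.bincount(c, minlength=K) / n, p, atol=0.01))  # True
```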
Example - mixture of Gaussians

Consider a mixture of p-dimensional Gaussian distributions with parameters (µ_i, Σ_i)_{i=1}^K and mixing probabilities (p_i)_{i=1}^K ⊂ [0, 1], ∑_{i=1}^K p_i = 1.

Consider a sample (x_i)_{i=1}^n ⊂ R^p from this model. The category from which each sample was obtained is unobserved. The parameters of the model are θ := {µ_i, Σ_i, p_i : i = 1, ..., K}. The density for that model is

    f(x) = ∑_{i=1}^K p_i φ(x; µ_i, Σ_i),

where φ(x; µ, Σ) denotes the Gaussian density with parameters (µ, Σ).

The log-likelihood function is

    l(θ) = ∑_{i=1}^n log ∑_{j=1}^K p_j φ(x_i; µ_j, Σ_j).

Numerically optimizing l(θ) directly is known to be slow and unstable.
Example - mixture of Gaussians (cont.)

The EM algorithm approach is simpler and faster. Suppose our observations are (x_i, c_i), where c_i is the (unobserved) category from which x_i was drawn. The complete-data log-likelihood function can be written as

    l(θ) = ∑_{i=1}^n log ∑_{j=1}^K 1_{c_i = j} p_j φ(x_i; µ_j, Σ_j).

Using Bayes' rule:

    π_ij := P(C_i = j | X_i = x_i)
          = P(X_i = x_i | C_i = j) P(C_i = j) / ∑_{k=1}^K P(X_i = x_i | C_i = k) P(C_i = k)
          = p_j φ(x_i; µ_j, Σ_j) / ∑_{k=1}^K p_k φ(x_i; µ_k, Σ_k).
Example - mixture of Gaussians (cont.)

The EM algorithm for a mixture of Gaussians:

E step: Compute the membership probabilities (or "responsibilities") using the current estimate of the parameters:

    π_ij^(t) = p_j^(t) φ(x_i; µ_j^(t), Σ_j^(t)) / ∑_{k=1}^K p_k^(t) φ(x_i; µ_k^(t), Σ_k^(t)).

M step: Update the parameters:

    µ_j^(t+1) = (1/N_j) ∑_{i=1}^n π_ij^(t) x_i,
    Σ_j^(t+1) = (1/N_j) ∑_{i=1}^n π_ij^(t) (x_i − µ_j^(t+1))(x_i − µ_j^(t+1))ᵀ,
    p_j^(t+1) = N_j / n,

where N_j = ∑_{i=1}^n π_ij^(t).
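These updates are straightforward to implement. The sketch below is not the lecture's code: it uses made-up synthetic 1-D data with two well-separated components, and alternates the E and M steps until the parameters settle near the generating values.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic 1-D data: two well-separated Gaussian components (weights 0.3 / 0.7)
x = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])
n, K = x.size, 2

# Initial guesses for the parameters theta = {mu_j, var_j, p_j}
p = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
var = np.ones(K)

def phi(x, mu, var):
    # Univariate Gaussian density
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E step: responsibilities pi_ij = P(C_i = j | X_i = x_i), shape (n, K)
    dens = p * phi(x[:, None], mu, var)
    pi = dens / dens.sum(axis=1, keepdims=True)
    # M step: responsibility-weighted MLE updates
    N = pi.sum(axis=0)
    mu = (pi * x[:, None]).sum(axis=0) / N
    var = (pi * (x[:, None] - mu) ** 2).sum(axis=0) / N
    p = N / n

print(np.allclose(np.sort(mu), [-5.0, 5.0], atol=0.3),
      np.allclose(np.sort(p), [0.3, 0.7], atol=0.05))  # True True
```

With well-separated components the responsibilities become nearly hard assignments after the first E step, so the iteration converges quickly; in general, EM for mixtures can be sensitive to initialization and may converge to a local maximum.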
APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce
More informationDifferentiating Gaussian Processes
Dfferentatng Gaussan Processes Andrew McHutchon Aprl 17, 013 1 Frst Order Dervatve of the Posteror Mean The posteror mean of a GP s gven by, f = x, X KX, X 1 y x, X α 1 Only the x, X term depends on the
More informationStatistics and Probability Theory in Civil, Surveying and Environmental Engineering
Statstcs and Probablty Theory n Cvl, Surveyng and Envronmental Engneerng Pro. Dr. Mchael Havbro Faber ETH Zurch, Swtzerland Contents o Todays Lecture Overvew o Uncertanty Modelng Random Varables - propertes
More information1 Motivation and Introduction
Instructor: Dr. Volkan Cevher EXPECTATION PROPAGATION September 30, 2008 Rce Unversty STAT 63 / ELEC 633: Graphcal Models Scrbes: Ahmad Beram Andrew Waters Matthew Nokleby Index terms: Approxmate nference,
More informationBézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0
Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of
More informationComments on Detecting Outliers in Gamma Distribution by M. Jabbari Nooghabi et al. (2010)
Comments on Detectng Outlers n Gamma Dstrbuton by M. Jabbar Nooghab et al. (21) M. Magdalena Lucn Alejandro C. Frery September 17, 215 arxv:159.55v1 [stat.co] 16 Sep 215 Ths note shows that the results
More informationp(z) = 1 a e z/a 1(z 0) yi a i x (1/a) exp y i a i x a i=1 n i=1 (y i a i x) inf 1 (y Ax) inf Ax y (1 ν) y if A (1 ν) = 0 otherwise
Dustn Lennon Math 582 Convex Optmzaton Problems from Boy, Chapter 7 Problem 7.1 Solve the MLE problem when the nose s exponentally strbute wth ensty p(z = 1 a e z/a 1(z 0 The MLE s gven by the followng:
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationProbability Theory (revisited)
Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted
More informationMatrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD
Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More informationApplied Stochastic Processes
STAT455/855 Fall 23 Appled Stochastc Processes Fnal Exam, Bref Solutons 1. (15 marks) (a) (7 marks) The dstrbuton of Y s gven by ( ) ( ) y 2 1 5 P (Y y) for y 2, 3,... The above follows because each of
More informationAppendix B. Criterion of Riemann-Stieltjes Integrability
Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationSTAT 3008 Applied Regression Analysis
STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,
More informationRandom Partitions of Samples
Random Parttons of Samples Klaus Th. Hess Insttut für Mathematsche Stochastk Technsche Unverstät Dresden Abstract In the present paper we construct a decomposton of a sample nto a fnte number of subsamples
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationLecture 4: September 12
36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been
More informationSupplementary material: Margin based PU Learning. Matrix Concentration Inequalities
Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we
More informationBeyond Zudilin s Conjectured q-analog of Schmidt s problem
Beyond Zudln s Conectured q-analog of Schmdt s problem Thotsaporn Ae Thanatpanonda thotsaporn@gmalcom Mathematcs Subect Classfcaton: 11B65 33B99 Abstract Usng the methodology of (rgorous expermental mathematcs
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationMulti-Conditional Learning for Joint Probability Models with Latent Variables
Mult-Condtonal Learnng for Jont Probablty Models wth Latent Varables Chrs Pal, Xueru Wang, Mchael Kelm and Andrew McCallum Department of Computer Scence Unversty of Massachusetts Amherst Amherst, MA USA
More informationChapter 12. Ordinary Differential Equation Boundary Value (BV) Problems
Chapter. Ordnar Dfferental Equaton Boundar Value (BV) Problems In ths chapter we wll learn how to solve ODE boundar value problem. BV ODE s usuall gven wth x beng the ndependent space varable. p( x) q(
More informationTHE ARIMOTO-BLAHUT ALGORITHM FOR COMPUTATION OF CHANNEL CAPACITY. William A. Pearlman. References: S. Arimoto - IEEE Trans. Inform. Thy., Jan.
THE ARIMOTO-BLAHUT ALGORITHM FOR COMPUTATION OF CHANNEL CAPACITY Wllam A. Pearlman 2002 References: S. Armoto - IEEE Trans. Inform. Thy., Jan. 1972 R. Blahut - IEEE Trans. Inform. Thy., July 1972 Recall
More informationExplaining the Stein Paradox
Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten
More informationA Hybrid Variational Iteration Method for Blasius Equation
Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method
More informationA new Approach for Solving Linear Ordinary Differential Equations
, ISSN 974-57X (Onlne), ISSN 974-5718 (Prnt), Vol. ; Issue No. 1; Year 14, Copyrght 13-14 by CESER PUBLICATIONS A new Approach for Solvng Lnear Ordnary Dfferental Equatons Fawz Abdelwahd Department of
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and
More information