Optimization Methods II. EM algorithms.
Aula 7. Anatoli Iambartsev, IME-USP.
[RC] Missing-data models. Demarginalization.

The term EM algorithm has been around for a long time, going back to [DLR]. Consider the case where the density of the observations can be expressed as a marginal,
$$g_\theta(x) = \int f_\theta(x, z)\, dz,$$
so that $g_\theta(x)$ is the marginal density of the complete-data model $f_\theta(x, z)$. This representation occurs in many statistical settings, including censoring models, mixtures, and latent variable models (tobit, probit, ARCH, stochastic volatility, etc.).
[RC] Example 5.12.

The mixture model of Example 5.2 (see previous lecture),
$$0.25\, N(\mu_1, 1) + 0.75\, N(\mu_2, 1),$$
can be expressed as a missing-data model even though the (observed) likelihood can be computed in a manageable time. Indeed, if we introduce a vector $z = (z_1, \ldots, z_n) \in \{1, 2\}^n$ in addition to the sample $x = (x_1, \ldots, x_n)$ such that
$$P_\theta(Z_i = 1) = 1 - P_\theta(Z_i = 2) = 0.25, \qquad X_i \mid Z_i = z \sim N(\mu_z, 1),$$
we recover the mixture model of Example 5.2 as the marginal distribution of $X_i$. The (observed) likelihood is then obtained as $E(H(x, Z))$ for
$$H(x, z) \propto \prod_{i:\, z_i = 1} \frac{1}{4} \exp\left\{-\frac{(x_i - \mu_1)^2}{2}\right\} \prod_{i:\, z_i = 2} \frac{3}{4} \exp\left\{-\frac{(x_i - \mu_2)^2}{2}\right\}.$$
[RC] Example 5.12 (continued).

The (observed) likelihood is then obtained as $E(H(x, Z))$ for the $H(x, z)$ above. Indeed,
$$g_{\mu_1, \mu_2}(x) = \sum_{z \in \{1,2\}^n} f_{\mu_1, \mu_2}(x, z) = \frac{1}{(\sqrt{2\pi})^n} \sum_{z \in \{1,2\}^n} H(x, z) = \prod_{i=1}^n \left( \frac{1}{4}\, \frac{e^{-(x_i - \mu_1)^2/2}}{\sqrt{2\pi}} + \frac{3}{4}\, \frac{e^{-(x_i - \mu_2)^2/2}}{\sqrt{2\pi}} \right).$$
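The latent indicators $Z_i$ are exactly what the EM scheme defined later in this lecture exploits. As an illustrative sketch (not part of the original slides): a minimal Python implementation of EM for this mixture, assuming the weights $(0.25, 0.75)$ and unit variances are known, so only $\mu_1$ and $\mu_2$ are estimated. The E-step computes the posterior probabilities $k_\theta(z|x)$ and the M-step takes weighted means.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate data from the mixture 0.25*N(mu1,1) + 0.75*N(mu2,1)
mu1_true, mu2_true, n = 0.0, 2.5, 500
z = rng.choice([1, 2], size=n, p=[0.25, 0.75])
x = rng.normal(np.where(z == 1, mu1_true, mu2_true), 1.0)

mu1, mu2 = -1.0, 1.0  # arbitrary starting values
for t in range(200):
    # E-step: posterior probability that Z_i = 1, i.e. k_theta(z|x)
    w1 = 0.25 * norm.pdf(x, mu1, 1.0)
    w2 = 0.75 * norm.pdf(x, mu2, 1.0)
    gamma = w1 / (w1 + w2)
    # M-step: weighted means maximize Q(theta | theta_t, x)
    mu1_new = np.sum(gamma * x) / np.sum(gamma)
    mu2_new = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
    if max(abs(mu1_new - mu1), abs(mu2_new - mu2)) < 1e-8:
        break
    mu1, mu2 = mu1_new, mu2_new

print(mu1, mu2)  # should be close to (0.0, 2.5)
```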
[RC] Example 5.13.

Censored data may come from experiments where some potential observations are replaced with a lower bound because they take too long to observe. Suppose that we observe $Y_1, \ldots, Y_m$, iid from $f(y|\theta)$, and that the $(n - m)$ remaining values $(Y_{m+1}, \ldots, Y_n)$ are censored at the threshold $a$. The corresponding likelihood function is then
$$L(\theta | y) = \big(1 - F(a|\theta)\big)^{n-m} \prod_{i=1}^m f(y_i | \theta),$$
where $F$ is the cdf associated with $f$ and $y = (y_1, \ldots, y_m)$.
[RC] Example 5.13 (continued).

$$L(\theta | y) = \big(1 - F(a|\theta)\big)^{n-m} \prod_{i=1}^m f(y_i | \theta).$$
If we had observed the last $n - m$ values, say $z = (z_{m+1}, \ldots, z_n)$ with $z_i \geq a$ $(i = m+1, \ldots, n)$, we could have constructed the (complete-data) likelihood
$$L^c(\theta | y, z) = \prod_{i=1}^m f(y_i | \theta) \prod_{i=m+1}^n f(z_i | \theta).$$
Note that
$$L(\theta | y) = E\big(L^c(\theta | y, Z)\big) = \int L^c(\theta | y, z)\, f(z | y, \theta)\, dz,$$
where $f(z | y, \theta)$ is the density of the missing data conditional on the observed data, namely the product of the $f(z_i|\theta)/(1 - F(a|\theta))$'s; i.e., $f(z|\theta)$ restricted to $(a, +\infty)$.
Main Idea of EM algorithms.

Demarginalization: $g_\theta(x)$ is the marginal of $f_\theta(x, z)$,
$$g_\theta(x) = \int f_\theta(x, z)\, dz.$$
Values of $z$ can be generated from the conditional distribution
$$k_\theta(z | x) = \frac{f_\theta(x, z)}{g_\theta(x)}.$$
Taking logarithms,
$$\log g_\theta(x) = \log f_\theta(x, z) - \log k_\theta(z | x),$$
or, in likelihood notation,
$$\log L(\theta | x) = \log L^c(\theta | x, z) - \log k_\theta(z | x),$$
where $L^c$ stands for the complete-data likelihood function.
Main Idea of EM algorithms.

$$\log L(\theta | x) = \log L^c(\theta | x, z) - \log k_\theta(z | x).$$
Let us fix a value $\theta_0$ and take expectations with respect to the distribution $k_{\theta_0}(z | x)$:
$$\log L(\theta | x) = E_{k,\theta_0}\big[\log L^c(\theta | x, Z)\big] - E_{k,\theta_0}\big[\log k_\theta(Z | x)\big] =: Q(\theta | \theta_0, x) - H(\theta | \theta_0, x).$$

Theorem. Let $\theta_1$ maximize $Q$, i.e.,
$$Q(\theta_1 | \theta_0, x) = \max_\theta Q(\theta | \theta_0, x).$$
Then $\log L(\theta_1 | x) \geq \log L(\theta_0 | x)$.
Main Idea of EM algorithms. Proof.

By the decomposition above,
$$\log L(\theta_1 | x) - \log L(\theta_0 | x) = \big(Q(\theta_1 | \theta_0, x) - Q(\theta_0 | \theta_0, x)\big) - \big(H(\theta_1 | \theta_0, x) - H(\theta_0 | \theta_0, x)\big).$$
Note that, by the definition of $\theta_1$,
$$Q(\theta_1 | \theta_0, x) - Q(\theta_0 | \theta_0, x) \geq 0.$$
Main Idea of EM algorithms. Proof (continued).

For the $H$ term,
$$H(\theta_1 | \theta_0, x) - H(\theta_0 | \theta_0, x) = E_{k,\theta_0}\big[\log k_{\theta_1}(Z | x)\big] - E_{k,\theta_0}\big[\log k_{\theta_0}(Z | x)\big] = E_{k,\theta_0}\left[\log \frac{k_{\theta_1}(Z | x)}{k_{\theta_0}(Z | x)}\right] \leq \log E_{k,\theta_0}\left[\frac{k_{\theta_1}(Z | x)}{k_{\theta_0}(Z | x)}\right] = \log 1 = 0,$$
where Jensen's inequality was used: $E \log \xi \leq \log E \xi$.
Main Idea of EM algorithms. Proof (continued).

We thus have
$$Q(\theta_1 | \theta_0, x) - Q(\theta_0 | \theta_0, x) \geq 0, \qquad H(\theta_1 | \theta_0, x) - H(\theta_0 | \theta_0, x) \leq 0,$$
and therefore
$$\log L(\theta_1 | x) - \log L(\theta_0 | x) \geq 0.$$
This completes the proof of the theorem.
Main Idea of EM algorithms.

Each iteration of the EM algorithm maximizes a function $Q$. Let $(\theta_t)$ be the sequence obtained recursively by
$$Q(\theta_{t+1} | \theta_t, x) = \max_\theta Q(\theta | \theta_t, x).$$
This recurrent scheme consists of two steps, expectation and maximization, which give the scheme its name: the EM algorithm.
EM algorithm.

Choose the initial parameter $\theta_0$ and repeat:

E-step. Calculate the expectation
$$Q(\theta | \theta_t, x) = E_{k,\theta_t}\big[\log L^c(\theta | x, Z)\big]$$
with respect to the distribution $k_{\theta_t}(z | x)$.

M-step. Maximize $Q(\theta | \theta_t, x)$ over $\theta$ and determine the next value
$$\theta_{t+1} = \arg\max_\theta Q(\theta | \theta_t, x);$$
set $t = t + 1$ and return to the E-step.
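This two-step scheme translates directly into a generic driver. A minimal sketch (not from the slides; the user-supplied e_step and m_step callables are hypothetical, and convergence is checked on a scalar parameter for simplicity):

```python
from typing import Any, Callable

def em(theta0: float,
       e_step: Callable[[float], Any],    # theta_t -> representation of Q(. | theta_t, x)
       m_step: Callable[[Any], float],    # Q-representation -> argmax_theta Q
       tol: float = 1e-8,
       max_iter: int = 1000) -> float:
    """Generic EM driver: alternate E- and M-steps until theta stabilizes."""
    theta = theta0
    for _ in range(max_iter):
        Q = e_step(theta)         # E-step: expectation under k_{theta_t}(z|x)
        theta_new = m_step(Q)     # M-step: maximize Q over theta
        if abs(theta_new - theta) < tol:  # scalar theta assumed
            return theta_new
        theta = theta_new
    return theta
```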
[RC] Example 5.14 (continuation of Example 5.13).

Let again $Y_1, \ldots, Y_m$ be iid with density $f(y|\theta)$, while $Y_{m+1}, \ldots, Y_n$ are censored at the level $a$. The likelihood function is
$$L(\theta | y) = \big(1 - F(a|\theta)\big)^{n-m} \prod_{i=1}^m f(y_i | \theta),$$
where $F(a|\theta) = P(Y_i \leq a)$. If we had observed the last $n - m$ values, say $z = (z_{m+1}, \ldots, z_n)$ with $z_i \geq a$ $(i = m+1, \ldots, n)$, we could have constructed the (complete-data) likelihood
$$L^c(\theta | y, z) = \prod_{i=1}^m f(y_i | \theta) \prod_{i=m+1}^n f(z_i | \theta),$$
and
$$k_\theta(z | y) = \prod_{i=m+1}^n \frac{f(z_i | \theta)}{1 - F(a|\theta)}.$$
[RC] Example 5.14 (continued).

Suppose that $f(y|\theta)$ corresponds to the $N(\theta, 1)$ distribution. The complete-data likelihood is
$$L^c(\theta | y, z) \propto \prod_{i=1}^m e^{-(y_i - \theta)^2/2} \prod_{i=m+1}^n e^{-(z_i - \theta)^2/2},$$
resulting in the expected complete-data log-likelihood
$$Q(\theta | \theta_0, y) = -\frac{1}{2} \sum_{i=1}^m (y_i - \theta)^2 - \frac{1}{2} \sum_{i=m+1}^n E_{k,\theta_0}\big[(Z_i - \theta)^2\big],$$
where the missing observations $Z_i$ are distributed from a normal $N(\theta_0, 1)$ distribution truncated at $a$. Doing the M-step (i.e., differentiating the function $Q(\theta | \theta_0, y)$ in $\theta$ and setting the derivative equal to 0) then leads to the EM update
$$\hat\theta = \frac{m \bar y + (n - m)\, E_{k,\theta_0}(Z_1)}{n}.$$
[RC] Example 5.14 (continued).

Doing the M-step gives the EM update
$$\hat\theta = \frac{m \bar y + (n - m)\, E_{k,\theta_0}(Z_1)}{n}.$$
Since
$$E_{k,\theta_0}(Z_1) = \theta_0 + \frac{\varphi(a - \theta_0)}{1 - \Phi(a - \theta_0)},$$
where $\varphi$ and $\Phi$ are the standard normal pdf and cdf, respectively, the EM sequence is
$$\theta_{t+1} = \frac{m}{n}\, \bar y + \frac{n - m}{n} \left( \theta_t + \frac{\varphi(a - \theta_t)}{1 - \Phi(a - \theta_t)} \right).$$
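This recursion is a one-liner per iteration. A sketch (not part of the slides; the function name em_censored_normal and the simulation setup are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def em_censored_normal(y, n, a, theta0=0.0, tol=1e-10, max_iter=10_000):
    """EM for N(theta,1) data with n - m observations right-censored at a.

    y : the m fully observed values; n : total sample size.
    Iterates theta_{t+1} = (m/n)*ybar
                           + ((n-m)/n)*(theta_t + phi(a-theta_t)/(1-Phi(a-theta_t))).
    """
    m, ybar = len(y), np.mean(y)
    theta = theta0
    for _ in range(max_iter):
        # E-step: mean of N(theta_t, 1) truncated to (a, +inf); norm.sf = 1 - Phi
        ez = theta + norm.pdf(a - theta) / norm.sf(a - theta)
        # M-step: closed-form update
        theta_new = (m * ybar + (n - m) * ez) / n
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Usage on simulated data: values above a are only known to exceed a.
rng = np.random.default_rng(1)
theta_true, a, n = 1.0, 1.5, 200
full = rng.normal(theta_true, 1.0, n)
y = full[full <= a]          # observed part of the sample
print(em_censored_normal(y, n, a))   # close to theta_true
```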
Principle of missing information (informally).

Differentiating
$$\log L(\theta | x) = \log L^c(\theta | x, z) - \log k_\theta(z | x)$$
twice gives
$$-\frac{\partial^2}{\partial \theta^2} \log L(\theta | x) = -\frac{\partial^2}{\partial \theta^2} \log L^c(\theta | x, z) + \frac{\partial^2}{\partial \theta^2} \log k_\theta(z | x),$$
that is,

Observed information = Complete information − Missing information.

Informally, regarding the convergence rate of EM algorithms: the larger the proportion of missing information, the slower the convergence of the algorithm.
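An informal numerical illustration of this principle, reusing the censored-normal recursion above (a sketch under assumed settings, not from the slides): lowering the threshold a censors a larger fraction of the sample, i.e., increases the proportion of missing information, and the number of EM iterations needed to converge grows accordingly.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
theta_true, n = 1.0, 2000
full = rng.normal(theta_true, 1.0, n)

for a in [3.0, 1.5, 0.5, -0.5]:   # lower a => more censoring => more missing information
    y = full[full <= a]
    m, ybar, theta = len(y), np.mean(y), 0.0
    for it in range(1, 100_000):
        ez = theta + norm.pdf(a - theta) / norm.sf(a - theta)   # E-step
        theta_new = (m * ybar + (n - m) * ez) / n               # M-step
        if abs(theta_new - theta) < 1e-10:
            break
        theta = theta_new
    print(f"a={a:5.1f}  censored={n - m:4d}/{n}  iterations={it}")
```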
EM algorithms for exponential family models.

It is easier to implement the algorithm when the complete data $(z, x)$ has an exponential family distribution in canonical form:
$$p(z, x | \theta) = \frac{b(z, x) \exp(\theta^T s(z, x))}{a(\theta)}.$$
Let $y = (z, x)$ be the vector of complete data. We have
$$\log p(y | \theta) = \log b(y) + \theta^T s(y) - \log a(\theta),$$
$$\frac{\partial}{\partial \theta} \log p(y | \theta) = s(y) - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}.$$
Remember that
$$a(\theta) = \int b(y) \exp(\theta^T s(y))\, dy.$$
EM algorithms for exponential family models.

From
$$\frac{\partial}{\partial \theta} \log p(y | \theta) = s(y) - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}, \qquad a(\theta) = \int b(y) \exp(\theta^T s(y))\, dy,$$
we have
$$\frac{\partial \log a(\theta)}{\partial \theta} = \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta} = \frac{1}{a(\theta)} \int b(y)\, s(y) \exp(\theta^T s(y))\, dy = \int s(y)\, \frac{b(y) \exp(\theta^T s(y))}{a(\theta)}\, dy = E(s(y) | \theta).$$
Thus,
$$\frac{\partial}{\partial \theta} \log p(y | \theta) = 0 \iff s(y) = E(s(y) | \theta).$$
EM algorithms for exponential family models.

Implementation of the EM algorithm. E-step:
$$Q(\theta | \theta_t) = \int \log p(z, x | \theta)\, p(z | \theta_t, x)\, dz = \int \log b(z, x)\, p(z | \theta_t, x)\, dz + \theta^T \int s(z, x)\, p(z | \theta_t, x)\, dz - \log a(\theta).$$
Note that the first term does not participate in the M-step. M-step: we obtain the extreme point from
$$\frac{\partial Q(\theta | \theta_t)}{\partial \theta} = \int s(z, x)\, p(z | \theta_t, x)\, dz - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta} = E(s(z, x) | \theta_t, x) - E(s(z, x) | \theta).$$
Remember that
$$E(s(z, x) | \theta) = \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}.$$
EM algorithms for exponential family models.

Thus the maximization of $Q(\theta | \theta_t, x)$ in the M-step is equivalent to solving the equation
$$E(s(z, x) | \theta_t, x) = E(s(z, x) | \theta).$$
If a solution exists, then it is unique.
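For concreteness, a hedged worked example not taken from the slides: exponential data with rate $\theta$, right-censored at $a$. The complete-data sufficient statistic is $s(z, x) = \sum_i y_i + \sum_i z_i$ with $E(s|\theta) = n/\theta$, and by memorylessness $E(Z_i \mid Z_i > a, \theta_t) = a + 1/\theta_t$, so the moment equation above gives a closed-form update whose fixed point is the known censored-exponential MLE $m/(\sum y_i + (n-m)a)$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, a, n = 2.0, 0.8, 1000
full = rng.exponential(1.0 / theta_true, n)   # rate theta => scale 1/theta
y = full[full <= a]                           # observed; the rest censored at a
m = len(y)

theta = 1.0
for _ in range(500):
    # E-step: E(s | theta_t, x); memorylessness gives E(Z_i | Z_i > a) = a + 1/theta_t
    s_hat = np.sum(y) + (n - m) * (a + 1.0 / theta)
    # M-step: solve E(s | theta) = n / theta = s_hat
    theta_new = n / s_hat
    if abs(theta_new - theta) < 1e-12:
        break
    theta = theta_new

print(theta)                                  # close to theta_true = 2.0
print(m / (np.sum(y) + (n - m) * a))          # the closed-form MLE, for comparison
```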
Monte Carlo for the E-step.

Given $\theta_t$ we need to calculate
$$Q(\theta | \theta_t, x) = E_{k,\theta_t}\big[\log L^c(\theta | Z, x)\big].$$
When it is difficult to calculate explicitly, we can approximate it by Monte Carlo:

1. generate $z_1, \ldots, z_m$ according to $k_{\theta_t}(z | x)$;
2. calculate $\hat Q(\theta | \theta_t) = \frac{1}{m} \sum_{i=1}^m \log L^c(\theta | z_i, x)$.

In the M-step one maximizes $\hat Q$ over $\theta$ in order to obtain $\theta_{t+1}$.
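A sketch of this Monte Carlo E-step for the censored-normal Example 5.14 (illustrative, not from the slides): since $\log L^c$ is quadratic in $\theta$, the M-step keeps its closed form, and the Monte Carlo average of draws from $k_{\theta_t}(z|x)$ simply replaces the exact truncated-normal mean in the update.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)

def mcem_censored_normal(y, n, a, theta0=0.0, n_mc=10_000, n_iter=50):
    """Monte Carlo EM for the censored N(theta,1) model of Example 5.14.

    The exact E-step needs E_{k,theta_t}(Z); here it is replaced by the
    average of n_mc draws from N(theta_t, 1) truncated to (a, +inf).
    """
    m, ybar = len(y), np.mean(y)
    theta = theta0
    for _ in range(n_iter):
        # Monte Carlo E-step: draws from k_{theta_t}(z|x);
        # truncnorm bounds are standardized, so lower bound is a - theta_t
        z = truncnorm.rvs(a - theta, np.inf, loc=theta, scale=1.0,
                          size=n_mc, random_state=rng)
        # M-step: same closed form, with E(Z) replaced by its MC estimate
        theta = (m * ybar + (n - m) * z.mean()) / n
    return theta

# Usage on simulated data, as in the earlier exact-EM sketch:
theta_true, a, n = 1.0, 1.5, 200
full = rng.normal(theta_true, 1.0, n)
y = full[full <= a]
print(mcem_censored_normal(y, n, a))   # close to the exact EM answer
```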
EM algorithm.

[H] Hunter, D.R., On the Geometry of EM algorithms: this paper demonstrates how the geometry of EM algorithms can help explain how their rate of convergence is related to the proportion of missing data. In a footnote, [H] wrote: "In a footnote, [DLR] refer to the comment of a referee, who noted that the use of the word algorithm may be criticized since EM is not, strictly speaking, an algorithm. However, EM is a recipe for creating algorithms, and thus we consider the set of EM algorithms to consist of all algorithms baked according to the EM recipe."
References.

[DLR] Dempster, A.P., Laird, N.M., and Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39, 1-38, 1977.

[H] Hunter, David R. On the Geometry of EM algorithms. Technical Report 0303, Department of Statistics, Penn State University, February 2003.

[RC] Robert, Christian P. and Casella, George. Introducing Monte Carlo Methods with R. Use R! Series, Springer, 2010.