EM for ML Estimation


1 Overview

EM for ML Estimation: an algorithm for Maximum Likelihood (ML) estimation from incomplete data (Dempster, Laird, and Rubin, 1977).

1. Formulate the complete data so that complete-data ML estimation is easy.
2. Estimate the missing data: Expectation (E) step.
3. Analyze the imputed complete data: Maximization (M) step.

Simple to implement, and stable: the likelihood increases monotonically.

Extensions:
1. Simplicity. M-step: Expectation/Conditional Maximization (ECM); E-step: Monte Carlo EM.
2. Efficiency. Numerical techniques: Newton steps, Aitken's acceleration (DLR, 1977), conjugate-gradient EM, ...; statistical considerations: ECME (conditionally maximize either Q or L), Parameter-eXpanded EM (PX-EM).

2 The EM Algorithm: ML Estimation

The problem: given the observed data $X_{obs}$ with observed-data model $f(x_{obs} \mid \theta)$, $\theta \in \Theta$, find
$$\hat\theta = \arg\max_{\theta \in \Theta} f(x_{obs} \mid \theta) = \arg\max_{\theta \in \Theta} \ln f(x_{obs} \mid \theta).$$

The idea: formulate complete data $X_{com}$ by creating missing data $X_{mis}$ so that there exists a one-to-one mapping $(X_{obs}, X_{mis}) \leftrightarrow X_{com}$. The complete-data model satisfies
$$f(x_{obs} \mid \theta) = \int f(x_{obs}, x_{mis} \mid \theta)\, dx_{mis}.$$
The mapping $X_{com} \to X_{obs}$ is many-to-one. The complete data are chosen so that it is easy to compute $E[\ln f(X_{com} \mid \theta') \mid X_{obs}, \theta]$ and easy to obtain $\arg\max_{\theta} \ln f(x_{com} \mid \theta)$.

3 The EM Algorithm: ML Estimation

The algorithm: set a starting value $\theta^{(0)}$; for $t = 1, 2, \ldots$ iterate between the following E and M steps.

E-step. Compute $Q(\theta \mid \theta^{(t-1)}) = E[\ln f(X_{com} \mid \theta) \mid X_{obs}, \theta^{(t-1)}]$.
M-step. Obtain $\theta^{(t)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t-1)})$.

An important special case: for the exponential family
$$f(x_{com} \mid \theta) = h(x_{com})\, c(\theta) \exp\{\theta\, s^T(x_{com})\},$$
let $\hat\theta(s(X_{com})) \equiv \arg\max_{\theta} \big(\ln c(\theta) + \theta\, s^T(X_{com})\big)$. Then:

E-step. Estimate the complete-data sufficient statistics: $s^{(t)} = E[s(X_{com}) \mid X_{obs}, \theta^{(t-1)}]$.
M-step. Obtain $\theta^{(t)} = \hat\theta(s^{(t)})$.
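The E/M alternation translates directly into a small driver loop. The following Python sketch is purely illustrative (the function names e_step, m_step, and log_lik are placeholders for model-specific pieces, not anything defined in the lecture):

```python
import numpy as np

def em(theta0, e_step, m_step, log_lik, max_iter=500, tol=1e-10):
    """Generic EM driver: alternate E and M steps until the observed-data
    log-likelihood stops increasing by more than `tol`.

    e_step(theta)  -> expected complete-data sufficient statistics s
    m_step(s)      -> theta maximizing the expected complete-data log-likelihood
    log_lik(theta) -> observed-data log-likelihood L(theta), used only for monitoring
    """
    theta = theta0
    ll_old = log_lik(theta)
    for t in range(1, max_iter + 1):
        s = e_step(theta)          # E-step: s^(t) = E[s(X_com) | X_obs, theta^(t-1)]
        theta = m_step(s)          # M-step: theta^(t) = theta_hat(s^(t))
        ll_new = log_lik(theta)
        if ll_new - ll_old < tol:  # monotone increase is guaranteed by EM theory
            break
        ll_old = ll_new
    return theta, ll_new, t
```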

4 The EM Algorithm: ML Estimation

Example: grouped multinomial data. 197 animals are distributed into four categories with observed counts $X_{obs} = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$. A genetic model for the population specifies the cell probabilities
$$\left(\tfrac{1}{2} + \tfrac{\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{\theta}{4}\right).$$
Thus
$$f(x_{obs} \mid \theta) = \frac{(\sum_i y_i)!}{\prod_i y_i!} \left(\tfrac{1}{2} + \tfrac{\theta}{4}\right)^{y_1} \left(\tfrac{1-\theta}{4}\right)^{y_2} \left(\tfrac{1-\theta}{4}\right)^{y_3} \left(\tfrac{\theta}{4}\right)^{y_4}
= \frac{(\sum_i y_i)!}{\prod_i y_i!} \left(\tfrac{1}{2} + \tfrac{\theta}{4}\right)^{y_1} \left(\tfrac{1-\theta}{4}\right)^{y_2 + y_3} \left(\tfrac{\theta}{4}\right)^{y_4}.$$
Split the first category into two categories with cell probabilities $(1/2,\ \theta/4)$.

5 The EM Algorithm: ML Estimation

Complete data: $X_{com} = (x_0, x_1, x_2, x_3, x_4)$ with the mapping $y_1 = x_0 + x_1$, $y_2 = x_2$, $y_3 = x_3$, $y_4 = x_4$.

Complete-data model: $X_{com} \sim \text{Multinomial}\left(\tfrac{1}{2},\ \tfrac{\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{\theta}{4}\right)$.

Complete-data log-likelihood:
$$\ln f(x_{com} \mid \theta) = (x_1 + x_4) \ln\theta + (x_2 + x_3) \ln(1 - \theta) + h(x_{com}).$$

E-step: compute $x_1^{(t)} = y_1 \dfrac{\theta^{(t-1)}/4}{1/2 + \theta^{(t-1)}/4}$.

M-step: compute $\theta^{(t)} = \dfrac{x_1^{(t)} + y_4}{x_1^{(t)} + y_4 + y_2 + y_3}$.
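As a concrete check of these formulas, here is a short Python sketch for the observed counts (125, 18, 20, 34); it simply alternates the two updates above and converges to $\hat\theta \approx 0.6268$, the positive root of $197\theta^2 - 15\theta - 68 = 0$. The code is illustrative, not part of the original slides.

```python
y1, y2, y3, y4 = 125, 18, 20, 34   # observed counts, n = 197

def em_multinomial(theta0=0.5, max_iter=100, tol=1e-12):
    theta = theta0
    for _ in range(max_iter):
        # E-step: expected count in the theta/4 part of the first cell
        x1 = y1 * (theta / 4) / (1 / 2 + theta / 4)
        # M-step: complete-data MLE of theta
        theta_new = (x1 + y4) / (x1 + y4 + y2 + y3)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

print(em_multinomial())   # approx 0.6268
```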

6 The EM Algorithm: Convergence of Likelihood Sequences

Notation:
$$f(x_{mis} \mid X_{obs}, \theta') = \frac{f(x_{obs}, x_{mis} \mid \theta')}{f(x_{obs} \mid \theta')},$$
$$H(\theta' \mid \theta) = E[\ln f(X_{mis} \mid X_{obs}, \theta') \mid X_{obs}, \theta], \qquad
Q(\theta' \mid \theta) = E[\ln f(X_{obs}, X_{mis} \mid \theta') \mid X_{obs}, \theta],$$
$$L(\theta) = \ln f(x_{obs} \mid \theta).$$
Then $L(\theta') = Q(\theta' \mid \theta) - H(\theta' \mid \theta)$.

Lemma (Jensen's inequality): $H(\theta' \mid \theta) \le H(\theta \mid \theta)$.

GEM: a Generalized EM algorithm replaces $\theta^{(t)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t-1)})$ in the M-step with any $\theta^{(t)}$ such that $Q(\theta^{(t)} \mid \theta^{(t-1)}) \ge Q(\theta^{(t-1)} \mid \theta^{(t-1)})$.

Monotonicity: each iteration of GEM increases the actual likelihood, i.e., $L(\theta^{(t)}) \ge L(\theta^{(t-1)})$, $t = 1, 2, \ldots$

Q: Will the log-likelihood sequence $\{L(\theta^{(t)})\}$ converge?
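The monotonicity property is easy to verify numerically by tracking $L(\theta^{(t)})$ along the iterations of the multinomial example above; every increment is nonnegative. A minimal sketch (reusing the same counts; illustrative only):

```python
import numpy as np

y1, y2, y3, y4 = 125, 18, 20, 34

def log_lik(theta):
    # observed-data log-likelihood, up to an additive constant
    return (y1 * np.log(1/2 + theta/4) + (y2 + y3) * np.log((1 - theta)/4)
            + y4 * np.log(theta/4))

theta, lls = 0.1, []
for _ in range(20):
    x1 = y1 * (theta/4) / (1/2 + theta/4)        # E-step
    theta = (x1 + y4) / (x1 + y4 + y2 + y3)      # M-step
    lls.append(log_lik(theta))

# every difference should be >= 0: L(theta^(t)) is non-decreasing
print(np.diff(lls) >= -1e-12)
```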

7 The EM Algorithm: Convergence of EM Sequences

How about the convergence of the EM sequences $\{\theta^{(t)}\}$ themselves?

Convergence of EM sequences: Wu (1983).

Note: the proof of convergence of EM sequences (Theorem 2) in DLR contains an incorrect use of the triangle inequality, and the theorem itself is questionable; Boyles (1983) provided a counterexample (for GEM, not EM).

8 The EM Algorithm: Rate of Convergence

GEM mapping: $\theta^{(t)} = M(\theta^{(t-1)})$, a map from $\Theta$ to $\Theta$.

Convergence rate (DLR). Suppose that $\{\theta^{(t)}\}$ is a GEM sequence such that (1) $\theta^{(t)}$ converges to $\theta^*$ in the closure of $\Theta$, (2) $D^{10} Q(\theta^{(t)} \mid \theta^{(t-1)}) = 0$, and (3) $D^{20} Q(\theta^{(t)} \mid \theta^{(t-1)})$ is negative definite with eigenvalues bounded away from zero. Then $DL(\theta^*) = 0$, $D^{20} Q(\theta^* \mid \theta^*)$ is negative definite, and
$$DM(\theta^*) = D^{20} H(\theta^* \mid \theta^*)\, \big[D^{20} Q(\theta^* \mid \theta^*)\big]^{-1}.$$

Note that $-D^2 L(\theta^*)$ is a measure of the information in the observed data about $\theta$.
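Near convergence $\theta^{(t+1)} - \theta^* \approx DM(\theta^*)\,(\theta^{(t)} - \theta^*)$, so the rate can be estimated directly from successive iterates. A sketch for the scalar multinomial example (illustrative; in one dimension $DM(\theta^*)$ is simply the ratio $D^{20}H / D^{20}Q$, the fraction of missing information):

```python
y1, y2, y3, y4 = 125, 18, 20, 34

def em_step(theta):
    x1 = y1 * (theta/4) / (1/2 + theta/4)        # E-step
    return (x1 + y4) / (x1 + y4 + y2 + y3)       # M-step: one application of M

# run to (near) convergence to obtain theta*
theta_star = 0.5
for _ in range(200):
    theta_star = em_step(theta_star)

# ratios of successive errors estimate DM(theta*)
theta, rates = 0.2, []
for _ in range(10):
    theta_new = em_step(theta)
    rates.append((theta_new - theta_star) / (theta - theta_star))
    theta = theta_new

print(rates[-1])   # stabilizes at the linear rate, about 0.13 for these counts
```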

9 Applications of the EM Algorithm: Missing Data

Multinomial sampling: the numerical example discussed in class.

Normal linear model: create a complete, special design matrix.

Multivariate normal sampling: missing data.

Grouping, censoring, and truncation: consider the example of Liu and Sun (2000, Technometrics).

Finite mixtures: consider ML estimation from a random sample from $f(x \mid \theta) = \sum_{j=1}^{k} \alpha_j f_j(x \mid \theta)$, where $\alpha_j > 0$, $\sum_j \alpha_j = 1$, and $f_j(x \mid \theta)$ is the density function of $N(\mu_j, \sigma_j^2)$. Suppose $x_1, \ldots, x_n$ is a sample from $f(x \mid \theta)$. Introduce the missing data $z_1, \ldots, z_n$ and consider the hierarchical model: $z_i \mid \theta \sim \text{Multinomial}(\alpha_1, \ldots, \alpha_k)$ and $x_i \mid (\theta, z_i = j) \sim f_j(x \mid \theta)$ (see the sketch below).
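For the finite-mixture application, the E-step computes the posterior probabilities of the labels $z_i$ and the M-step is a weighted complete-data MLE. A minimal univariate sketch (assuming SciPy is available; illustrative, not the course's code):

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, k, n_iter=200, rng=np.random.default_rng(0)):
    """EM for a k-component univariate Gaussian mixture.
    Missing data: the component labels z_1, ..., z_n."""
    n = len(x)
    alpha = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, np.std(x))
    for _ in range(n_iter):
        # E-step: responsibilities w[i, j] = P(z_i = j | x_i, theta)
        dens = norm.pdf(x[:, None], loc=mu, scale=sigma) * alpha
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted complete-data MLEs
        nj = w.sum(axis=0)
        alpha = nj / n
        mu = (w * x[:, None]).sum(axis=0) / nj
        sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / nj)
    return alpha, mu, sigma

# example: two well-separated components
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 2, 700)])
print(em_gaussian_mixture(x, k=2))
```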

10 Applications of the EM Algorithm

Mixed-effects models (Laird and Ware, 1982, Biometrics):
$$y_i = X_i \beta + Z_i b_i + e_i,$$
where $y_i$ is an $(n_i \times 1)$ observed vector, $X_i$ and $Z_i$ are $(n_i \times p)$ and $(n_i \times q)$ design matrices, respectively, $b_i \sim N_q(0, \Psi)$, and $e_i \sim N_{n_i}(0, \sigma^2 I)$.

Factor analysis models:
$$y_i = \mu + \beta z_i + e_i,$$
where $y_i$ is a $(p \times 1)$ observed vector, $\beta$ is the factor-loading matrix, $z_i \sim N_q(0, I)$ with $q < p$, and $e_i \sim N_p(0, \sigma^2 I)$.
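For the factor analysis model with isotropic noise $\sigma^2 I$, the E-step only needs the conditional mean and second moment of $z_i$ given $y_i$, and the M-step has closed-form updates for the loadings and the noise variance. A compact sketch under those assumptions (variable names are illustrative, not from the slides):

```python
import numpy as np

def em_factor_isotropic(Y, q, n_iter=200, rng=np.random.default_rng(0)):
    """EM for y_i = mu + B z_i + e_i with z_i ~ N_q(0, I), e_i ~ N_p(0, sigma2*I).
    Y is an (n x p) data matrix; returns mu, B (p x q), sigma2."""
    n, p = Y.shape
    mu = Y.mean(axis=0)                      # MLE of mu is the sample mean
    Yc = Y - mu                              # centered data
    B = rng.normal(size=(p, q))
    sigma2 = Yc.var()
    for _ in range(n_iter):
        # E-step: conditional moments of the latent factors
        M = B.T @ B + sigma2 * np.eye(q)     # q x q
        Minv = np.linalg.inv(M)
        Ez = Yc @ B @ Minv                   # rows are E[z_i | y_i]
        Ezz = n * sigma2 * Minv + Ez.T @ Ez  # sum_i E[z_i z_i^T | y_i]
        # M-step: closed-form updates
        B = (Yc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Yc ** 2) - np.trace(Ezz @ B.T @ B)) / (n * p)
    return mu, B, sigma2
```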

11 Applications of the EM Algorithm

Positron Emission Tomography (Vardi, Shepp, and Kaufman, 1985, JASA, p. 8-19).

Network Tomography (Vardi, 1996, JASA).

ML estimation of discrete distributions with simplex constraints (Liu, 2000, JASA).

12 Applications of the EM Algorithm

Local multiple biopolymer sequence alignment (Lawrence and Reilly, 1990, Proteins: Structure, Function, and Genetics, 7, 41-51).

Linear biopolymer sequence data:

Sequence   Observed data
1          y_{1,1}  y_{1,2}  ...  y_{1,n_1}
2          y_{2,1}  y_{2,2}  ...  y_{2,n_2}
...
m          y_{m,1}  y_{m,2}  ...  y_{m,n_m}

where $y_{i,j} \in \{s_k : k = 1, \ldots, K\}$ for all $i, j$, with $K = 4$ for DNA sequences and $K = 20$ for proteins (sequences of amino acids).

13 Applications of the EM Algorithm

Assumptions:
1. Each observed sequence contains one complete motif (subsequence) of a common type.
2. The location of the motif in sequence $i$, denoted by $x_i$, is unknown for all $i = 1, \ldots, m$.
3. The motif length, $J$, is fixed.
4. The common type is specified as follows: $y_{i, x_i}, y_{i, x_i+1}, \ldots, y_{i, x_i+J-1}$ are independent; for $j = 1, \ldots, J$, $y_{i, x_i+(j-1)} \sim \text{multinomial}(\theta_{j,1}, \ldots, \theta_{j,K})$, i.e., $\text{Prob}(y_{i, x_i+(j-1)} = s_k) = \theta_{j,k}$; $\theta_j \equiv (\theta_{j,1}, \ldots, \theta_{j,K})$ is unknown.
5. All $y_{i,j}$ are independent. For $y_{i,j}$ in the background (not in the motif of sequence $i$), $y_{i,j} \sim \text{multinomial}(\theta_{0,1}, \ldots, \theta_{0,K})$, i.e., $\text{Prob}(y_{i,j} = s_k) = \theta_{0,k}$; $\theta_0 \equiv (\theta_{0,1}, \ldots, \theta_{0,K})$ is unknown.
6. For the unknown motif locations, $\text{Prob}(x_i = l) = \dfrac{1}{n_i - J + 1}$, $l = 1, \ldots, n_i - J + 1$.

14 Applications of the EM Algorithm

The complete-data likelihood is proportional to
$$\prod_{k=1}^{K} \prod_{i=1}^{m} \theta_{0,k}^{N_{i,k}(x_i)} \; \prod_{j=1}^{J} \prod_{k=1}^{K} \prod_{i=1}^{m} \theta_{j,k}^{M_{i,j,k}(x_i)},$$
where
$$N_{i,k}(x_i) = \#\{y_{i,t} : t = 1, \ldots, x_i - 1,\ x_i + J, \ldots, n_i;\ y_{i,t} = s_k\}, \qquad
M_{i,j,k}(x_i) = \#\{y_{i,t} : t = x_i + (j-1);\ y_{i,t} = s_k\}.$$

The complete-data log-likelihood function:
$$\sum_{k=1}^{K}\left[\sum_{i=1}^{m} N_{i,k}(x_i)\right] \ln \theta_{0,k} \;+\; \sum_{j=1}^{J}\sum_{k=1}^{K}\left[\sum_{i=1}^{m} M_{i,j,k}(x_i)\right] \ln \theta_{j,k}.$$

The complete-data ML estimate of $\theta$:
$$\hat\theta_{0,k} = \frac{\sum_{i=1}^{m} N_{i,k}(x_i)}{\sum_{i=1}^{m} (n_i - J)} \qquad \text{and} \qquad \hat\theta_{j,k} = \frac{\sum_{i=1}^{m} M_{i,j,k}(x_i)}{m},$$
for $j = 1, \ldots, J$ and $k = 1, \ldots, K$.

15 Applications of the EM Algorithm

E-step: fix $\theta$ at its current estimate and compute $\hat N_{i,k} \equiv E(N_{i,k}(x_i) \mid Y, \theta)$ and $\hat M_{i,j,k} \equiv E(M_{i,j,k}(x_i) \mid Y, \theta)$ by evaluating
$$\text{Prob}(x_i = l \mid Y, \theta) \;\propto\; \prod_{k=1}^{K} \theta_{0,k}^{N_{i,k}(l)} \prod_{j=1}^{J} \prod_{k=1}^{K} \theta_{j,k}^{M_{i,j,k}(l)}
\;\propto\; \prod_{j=1}^{J} \frac{\theta_{j,\, k(y_{i,l+j-1})}}{\theta_{0,\, k(y_{i,l+j-1})}}
\;=\; \prod_{j=1}^{J} \prod_{k=1}^{K} \left(\frac{\theta_{j,k}}{\theta_{0,k}}\right)^{M_{i,j,k}(l)},$$
normalized over $l = 1, \ldots, n_i - J + 1$, for $i = 1, \ldots, m$, where $k(y)$ is the index of $y$ in $\{s_1, s_2, \ldots, s_K\}$.

16 Applications of the EM Algorithm

M-step: update the estimate of $\theta$:
$$\hat\theta_{0,k} = \frac{\sum_{i=1}^{m} \hat N_{i,k}}{\sum_{i=1}^{m} (n_i - J)} \qquad \text{and} \qquad \hat\theta_{j,k} = \frac{\sum_{i=1}^{m} \hat M_{i,j,k}}{m},$$
for $j = 1, \ldots, J$ and $k = 1, \ldots, K$.
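Combining the E and M steps, a compact Python sketch of this motif-finding EM might look as follows; the integer encoding of the letters (values in {0, ..., K-1}) and the small pseudocount in the M-step are implementation choices, not part of the original formulation:

```python
import numpy as np

def motif_em(seqs, K, J, n_iter=100, rng=np.random.default_rng(0)):
    """EM for the one-motif-per-sequence model.
    seqs: list of 1-D integer arrays with entries in {0, ..., K-1}."""
    m = len(seqs)
    theta0 = np.full(K, 1.0 / K)                   # background frequencies theta_0
    theta = rng.dirichlet(np.ones(K), size=J)      # J x K motif frequencies theta_j
    for _ in range(n_iter):
        N_hat = np.zeros(K)                        # expected background counts
        M_hat = np.zeros((J, K))                   # expected motif counts
        for y in seqs:
            L = len(y) - J + 1                     # possible motif locations
            # log Prob(x_i = l | Y, theta), up to a constant:
            #   sum_j log( theta[j, y[l+j]] / theta0[y[l+j]] )
            logw = np.array([np.sum(np.log(theta[np.arange(J), y[l:l + J]])
                                    - np.log(theta0[y[l:l + J]]))
                             for l in range(L)])
            w = np.exp(logw - logw.max())
            w /= w.sum()
            tot = np.bincount(y, minlength=K)      # letter counts for the whole sequence
            for l, wl in enumerate(w):
                window = y[l:l + J]
                M_hat[np.arange(J), window] += wl
                N_hat += wl * (tot - np.bincount(window, minlength=K))
        # M-step (a tiny pseudocount keeps the probabilities strictly positive)
        theta0 = (N_hat + 1e-8) / (N_hat.sum() + K * 1e-8)
        theta = (M_hat + 1e-8) / (M_hat.sum(axis=1, keepdims=True) + K * 1e-8)
    return theta0, theta
```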

17 EM Supplements and Extensions: Observed Information Matrix

Louis (1982, JRSS-B):
$$I_{obs}(\theta) = E[I_{com}(\theta) \mid X_{obs}, \theta] - E\big[G(X_{com} \mid \theta)\, G^T(X_{com} \mid \theta) \mid X_{obs}, \theta\big] + G^*(X_{obs} \mid \theta)\, G^{*T}(X_{obs} \mid \theta),$$
where $G(X_{com} \mid \theta)$ and $G^*(X_{obs} \mid \theta)$ are the gradient vectors of the Q and L functions.

Meng and Rubin (1991, JASA):
$$I^{-1}(\theta \mid X_{obs}) = \big(E[I(\theta \mid X_{com}) \mid X_{obs}, \theta]\big)^{-1} + \big(E[I(\theta \mid X_{com}) \mid X_{obs}, \theta]\big)^{-1} DM\, (I - DM)^{-1},$$
where $\theta^{(t+1)} - \theta^* \approx (\theta^{(t)} - \theta^*)\, DM$, and $DM$ is approximated numerically using the computer code for the E and M steps.

Liu (1998, Biometrika): computing the observed information matrix from conditional information.
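Louis's identity can be checked on the grouped-multinomial example: given the observed data, only the split count $x_1$ is random, with $x_1 \mid y_1 \sim \text{Binomial}(y_1,\ (\theta/4)/(1/2 + \theta/4))$, and the observed-data score vanishes at the MLE, so the last term drops. The sketch below (illustrative, not from the slides) compares the result with minus the second derivative of $L(\theta)$ computed directly:

```python
y1, y2, y3, y4 = 125, 18, 20, 34

# run the EM iterations from the earlier example to obtain the MLE
theta = 0.5
for _ in range(200):
    x1 = y1 * (theta / 4) / (1 / 2 + theta / 4)
    theta = (x1 + y4) / (x1 + y4 + y2 + y3)        # theta -> about 0.6268

# conditional distribution of the split count: x1 | y1 ~ Binomial(y1, p)
p = (theta / 4) / (1 / 2 + theta / 4)
Ex1, Vx1 = y1 * p, y1 * p * (1 - p)

# Louis (1982): I_obs = E[I_com | X_obs] - Var[complete-data score | X_obs]
# (the observed-data score term is zero at the MLE);
# complete-data log-likelihood: (x1 + x4) ln(theta) + (x2 + x3) ln(1 - theta)
E_I_com = (Ex1 + y4) / theta**2 + (y2 + y3) / (1 - theta)**2
V_score = Vx1 / theta**2
I_louis = E_I_com - V_score

# direct calculation: minus the second derivative of L(theta)
I_direct = y1 / (2 + theta)**2 + (y2 + y3) / (1 - theta)**2 + y4 / theta**2

print(I_louis, I_direct)   # the two agree (about 377.5)
```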

18 EM Supplements and Extensions

ECM (Meng and Rubin, 1993, Biometrika)
ECME (Liu and Rubin, 1994, Biometrika)
PX-EM (Liu, Rubin, and Wu, 1998, Biometrika)
