EM for ML Estimation
1. Overview: EM for ML Estimation

An algorithm for maximum likelihood (ML) estimation from incomplete data (Dempster, Laird, and Rubin, 1977):
1. Formulate the complete data so that complete-data ML estimation is easy.
2. Estimate the missing data: the Expectation (E) step.
3. Analyze the imputed complete data: the Maximization (M) step.

Properties: simple to implement and stable (the likelihood converges monotonically).

Extensions:
1. Simplicity. M-step: Expectation/Conditional Maximization (ECM); E-step: Monte Carlo EM.
2. Efficiency. Numerical techniques: Newton steps, Aitken's acceleration (DLR, 1977), CG-EM, ...; statistical considerations: ECME (maximize either Q or L), Parameter-eXpanded EM (PX-EM).
2. The EM Algorithm: ML Estimation

The problem: Given the observed data $X_{obs}$ with observed-data model $f(x_{obs} \mid \theta)$, $\theta \in \Theta$, find
$$\hat\theta = \arg\max_{\theta \in \Theta} f(x_{obs} \mid \theta) = \arg\max_{\theta \in \Theta} \ln f(x_{obs} \mid \theta).$$

The idea: Formulate complete data $X_{com}$ by creating missing data so that there exists a one-to-one mapping $(X_{obs}, X_{mis}) \leftrightarrow X_{com}$. The complete-data model satisfies
$$f(x_{obs} \mid \theta) = \int f(x_{obs}, x_{mis} \mid \theta)\, dx_{mis}.$$
The mapping $X_{com} \to X_{obs}$ is many-to-one. It is easy to compute $E[\ln f(X_{com} \mid \theta') \mid X_{obs}, \theta]$, and it is easy to obtain $\arg\max_{\theta} \ln f(x_{com} \mid \theta)$.
3. The EM Algorithm: ML Estimation

The algorithm: Set a starting value $\theta^{(0)}$; for $t = 1, 2, \dots$ iterate between the following E and M steps.
E-step. Compute $Q(\theta \mid \theta^{(t-1)}) = E[\ln f(x_{com} \mid \theta) \mid X_{obs}, \theta^{(t-1)}]$.
M-step. Obtain $\theta^{(t)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t-1)})$.

An important special case: for the exponential family
$$f(x_{com} \mid \theta) = h(x_{com})\, c(\theta) \exp\{\theta\, s^{T}(x_{com})\},$$
let $\hat\theta(s(X_{com})) \equiv \arg\max_{\theta} \big( \ln c(\theta) + \theta\, s^{T}(X_{com}) \big)$. Then:
E-step. Estimate the complete-data sufficient statistics $s^{(t)} = E[s(X_{com}) \mid X_{obs}, \theta^{(t-1)}]$.
M-step. Obtain $\theta^{(t)} = \hat\theta(s^{(t)})$.
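In code, the exponential-family version of the iteration is just alternating these two maps. The following is a minimal generic sketch; the function names `e_step`, `m_step` and the scalar stopping rule are illustrative assumptions, not part of the original slides:

```python
def em(e_step, m_step, theta0, max_iter=500, tol=1e-8):
    """Generic EM skeleton for a complete-data exponential family (sketch).

    e_step(theta) should return s_hat = E[s(X_com) | X_obs, theta];
    m_step(s_hat) should return the complete-data ML estimate
    argmax_theta { ln c(theta) + theta s_hat^T }.
    """
    theta = theta0
    for _ in range(max_iter):
        s_hat = e_step(theta)         # E-step: expected complete-data sufficient statistics
        theta_new = m_step(s_hat)     # M-step: plug them into the complete-data MLE
        if abs(theta_new - theta) < tol:   # simple stopping rule for scalar theta (assumption)
            return theta_new
        theta = theta_new
    return theta
```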
4. The EM Algorithm: ML Estimation

Example: grouped multinomial data. 197 animals are distributed into four categories with the observed counts
$$X_{obs} = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34).$$
A genetic model for the population specifies cell probabilities
$$\left( \tfrac{1}{2} + \tfrac{\theta}{4},\; \tfrac{1-\theta}{4},\; \tfrac{1-\theta}{4},\; \tfrac{\theta}{4} \right).$$
Thus
$$f(x_{obs} \mid \theta) = \frac{(\sum_i y_i)!}{\prod_i y_i!} \left( \tfrac{1}{2} + \tfrac{\theta}{4} \right)^{y_1} \left( \tfrac{1-\theta}{4} \right)^{y_2} \left( \tfrac{1-\theta}{4} \right)^{y_3} \left( \tfrac{\theta}{4} \right)^{y_4} = \frac{(\sum_i y_i)!}{\prod_i y_i!} \left( \tfrac{1}{2} + \tfrac{\theta}{4} \right)^{y_1} \left( \tfrac{1-\theta}{4} \right)^{y_2 + y_3} \left( \tfrac{\theta}{4} \right)^{y_4}.$$
Split the first category into two categories with cell probabilities $(1/2,\ \theta/4)$.
5. The EM Algorithm: ML Estimation

Complete data: $X_{com} = (x_0, x_1, x_2, x_3, x_4)$ with the mapping $y_1 = x_0 + x_1$, $y_2 = x_2$, $y_3 = x_3$, $y_4 = x_4$.
Complete-data model: $X_{com} \sim \text{Multinomial}\left( \tfrac{1}{2},\; \tfrac{\theta}{4},\; \tfrac{1-\theta}{4},\; \tfrac{1-\theta}{4},\; \tfrac{\theta}{4} \right)$.
Complete-data log-likelihood:
$$\ln f(x_{com} \mid \theta) = (x_1 + x_4) \ln \theta + (x_2 + x_3) \ln(1 - \theta) + h(x_{com}).$$
E-step: compute
$$x_1^{(t)} = y_1 \, \frac{\theta^{(t-1)}/4}{1/2 + \theta^{(t-1)}/4}.$$
M-step: compute
$$\theta^{(t)} = \frac{x_1^{(t)} + y_4}{x_1^{(t)} + y_4 + y_2 + y_3}.$$
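As a concrete check of these two updates, here is a minimal sketch of the iteration for this example; the function name, starting value 0.5, and stopping rule are illustrative choices, not from the slides:

```python
def em_genetic_linkage(y=(125, 18, 20, 34), theta=0.5, tol=1e-10, max_iter=1000):
    """EM for the grouped multinomial (genetic linkage) example above."""
    y1, y2, y3, y4 = y
    for _ in range(max_iter):
        # E-step: expected count x1 of the latent theta/4 part of the first cell
        x1 = y1 * (theta / 4) / (0.5 + theta / 4)
        # M-step: complete-data ML estimate of theta given the imputed count
        theta_new = (x1 + y4) / (x1 + y4 + y2 + y3)
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta_new

print(em_genetic_linkage())  # converges quickly, to about 0.627 for these counts
```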
6. The EM Algorithm: Convergence of Likelihood Sequences

Notation:
$$f(x_{mis} \mid X_{obs}, \theta') = \frac{f(x_{obs}, x_{mis} \mid \theta')}{f(x_{obs} \mid \theta')},$$
$$H(\theta' \mid \theta) = E[\ln f(X_{mis} \mid X_{obs}, \theta') \mid X_{obs}, \theta], \qquad Q(\theta' \mid \theta) = E[\ln f(X_{obs}, X_{mis} \mid \theta') \mid X_{obs}, \theta], \qquad L(\theta) = \ln f(x_{obs} \mid \theta).$$
Then $L(\theta') = Q(\theta' \mid \theta) - H(\theta' \mid \theta)$.

Lemma (Jensen's inequality): $H(\theta' \mid \theta) \le H(\theta \mid \theta)$.

GEM: A generalized EM algorithm replaces $\theta^{(t)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t-1)})$ in the M-step with any $\theta^{(t)}$ such that $Q(\theta^{(t)} \mid \theta^{(t-1)}) \ge Q(\theta^{(t-1)} \mid \theta^{(t-1)})$.

Monotonicity: Each iteration of GEM increases (or at least does not decrease) the actual likelihood, i.e., $L(\theta^{(t)}) \ge L(\theta^{(t-1)})$ for $t = 1, 2, \dots$

Q: Will the log-likelihood sequence $\{L(\theta^{(t)})\}$ converge?
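The monotonicity claim follows in one line from the decomposition $L(\theta') = Q(\theta' \mid \theta) - H(\theta' \mid \theta)$ and the lemma (a standard derivation, stated here for completeness):
$$L(\theta^{(t)}) - L(\theta^{(t-1)}) = \big[ Q(\theta^{(t)} \mid \theta^{(t-1)}) - Q(\theta^{(t-1)} \mid \theta^{(t-1)}) \big] - \big[ H(\theta^{(t)} \mid \theta^{(t-1)}) - H(\theta^{(t-1)} \mid \theta^{(t-1)}) \big] \ge 0,$$
since the first bracket is $\ge 0$ by the (G)EM step and the second bracket is $\le 0$ by Jensen's inequality.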
7. The EM Algorithm: Convergence of EM Sequences

How about the convergence of EM sequences $\{\theta^{(t)}\}$? See the convergence results for EM sequences in Wu (1983).

Note: the proof of convergence of EM sequences (Theorem 2) in DLR contains an incorrect use of the triangle inequality, and the theorem itself is questionable; Boyles (1983) provided a (GEM, not EM) counterexample.
8. The EM Algorithm: Rate of Convergence

GEM mapping: $\theta^{(t)} = M(\theta^{(t-1)})$ from $\Theta$ to $\Theta$.

Convergence rate (DLR): Suppose that $\{\theta^{(t)}\}$ is a GEM sequence such that (1) $\theta^{(t)}$ converges to $\theta^*$ in the closure of $\Theta$, (2) $D^{10} Q(\theta^{(t)} \mid \theta^{(t-1)}) = 0$, and (3) $D^{20} Q(\theta^{(t)} \mid \theta^{(t-1)})$ is negative definite with eigenvalues bounded away from zero. Then $DL(\theta^*) = 0$, $D^{20} Q(\theta^* \mid \theta^*)$ is negative definite, and
$$DM(\theta^*) = D^{20} H(\theta^* \mid \theta^*) \left[ D^{20} Q(\theta^* \mid \theta^*) \right]^{-1}.$$
Note that $D^2 L(\theta^*)$ is a measure of the information in the observed data about $\theta$. ...
9. The EM Algorithm: Applications

Missing data:
- Multinomial sampling: the numerical example discussed in class.
- Normal linear model: create a complete, special design matrix.
- Multivariate normal sampling with missing data.

Grouping, censoring, and truncation: consider the example of Liu and Sun (2000, Technometrics).

Finite mixtures: consider ML estimation from a random sample from
$$f(x \mid \theta) = \sum_{j=1}^{k} \alpha_j f_j(x \mid \theta),$$
where $\alpha_j > 0$, $\sum_j \alpha_j = 1$, and $f_j(x \mid \theta)$ is the density function of $N(\mu_j, \sigma_j^2)$. Suppose $x_1, \dots, x_n$ is a sample from $f(x \mid \theta)$. Introducing the missing data $z_1, \dots, z_n$, consider the hierarchical model: $z_i \mid \theta \sim \text{Multinomial}(\alpha_1, \dots, \alpha_k)$ and $x_i \mid (\theta, z_i = j) \sim f_j(x \mid \theta)$.
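For the finite-mixture application above, the E-step computes the posterior membership probabilities $P(z_i = j \mid x_i, \theta)$ and the M-step performs weighted complete-data ML estimation. A minimal univariate sketch, with initialization and iteration count as arbitrary choices not taken from the slides:

```python
import numpy as np

def em_normal_mixture(x, k, n_iter=200, seed=0):
    """EM for a univariate k-component normal mixture (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    alpha = np.full(k, 1.0 / k)                  # mixing proportions alpha_j
    mu = rng.choice(x, size=k, replace=False)    # component means mu_j
    sigma2 = np.full(k, x.var())                 # component variances sigma_j^2
    for _ in range(n_iter):
        # E-step: w[i, j] = P(z_i = j | x_i, theta)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        w = alpha * dens
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted complete-data ML estimates
        nj = w.sum(axis=0)
        alpha = nj / n
        mu = (w * x[:, None]).sum(axis=0) / nj
        sigma2 = (w * (x[:, None] - mu) ** 2).sum(axis=0) / nj
    return alpha, mu, sigma2
```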
10. The EM Algorithm: Applications

Mixed-effects models (Laird and Ware, 1982, Biometrics):
$$y_i = X_i \beta + Z_i b_i + e_i,$$
where $y_i$ is the $(n_i \times 1)$ observed vector, $X_i$ and $Z_i$ are $(n_i \times p)$ and $(n_i \times q)$ design matrices, respectively, $b_i \sim N_q(0, \Psi)$, and $e_i \sim N_{n_i}(0, \sigma^2 I)$.

Factor analysis models:
$$y_i = \mu + \beta z_i + e_i,$$
where $y_i$ is the $(p \times 1)$ observed vector, $\beta$ is called the factor-loading matrix, $z_i \sim N_q(0, I)$ with $q < p$, and $e_i \sim N_p(0, \sigma^2 I)$.
11. The EM Algorithm: Applications

- Positron emission tomography (Vardi, Shepp, and Kaufman, 1985, JASA, pp. 8-19)
- Network tomography (Vardi, 1996, JASA)
- ML estimation of discrete distributions with simplex constraints (Liu, 2000, JASA)
12. The EM Algorithm: Applications

Local multiple biopolymer sequence alignment (Lawrence and Reilly, 1990, Proteins: Structure, Function, and Genetics, 7, 41-51).

Linear biopolymer sequence data:

Sequence | Observed data
1        | $y_{1,1}, y_{1,2}, \dots, y_{1,n_1}$
2        | $y_{2,1}, y_{2,2}, \dots, y_{2,n_2}$
...      | ...
m        | $y_{m,1}, y_{m,2}, \dots, y_{m,n_m}$

where $y_{i,j} \in \{s_k : k = 1, \dots, K\}$ for all $i, j$, with $K = 4$ for DNA sequences and $K = 20$ for proteins (sequences of amino acids).
13. The EM Algorithm: Applications

Assumptions:
1. Each observed sequence contains one complete motif (subsequence) of a common type.
2. The location of the motif in sequence $i$, denoted by $x_i$, is unknown for all $i = 1, \dots, m$.
3. The motif length, $J$, is fixed.
4. The common type is specified as follows: $y_{i,x_i}, y_{i,x_i+1}, \dots, y_{i,x_i+J-1}$ are independent, and for $j = 1, \dots, J$, $y_{i,x_i+(j-1)} \sim \text{Multinomial}(\theta_{j,1}, \dots, \theta_{j,K})$, i.e., $\Pr(y_{i,x_i+(j-1)} = s_k) = \theta_{j,k}$; $\theta_j \equiv (\theta_{j,1}, \dots, \theta_{j,K})$ is unknown.
5. All $y_{i,j}$ are independent. For $y_{i,j}$ in the background (not in the motif in sequence $i$), $y_{i,j} \sim \text{Multinomial}(\theta_{0,1}, \dots, \theta_{0,K})$, i.e., $\Pr(y_{i,j} = s_k) = \theta_{0,k}$; $\theta_0 \equiv (\theta_{0,1}, \dots, \theta_{0,K})$ is unknown.
6. For the unknown motif locations, $\Pr(x_i = l) = \dfrac{1}{n_i - J + 1}$.
14. The EM Algorithm: Applications

The complete-data likelihood is proportional to
$$\prod_{k=1}^{K} \theta_{0,k}^{\sum_{i=1}^{m} N_{i,k}(x_i)} \; \prod_{j=1}^{J} \prod_{k=1}^{K} \theta_{j,k}^{\sum_{i=1}^{m} M_{i,j,k}(x_i)},$$
where
$$N_{i,k}(x_i) = \#\{y_{i,t} : t = 1, \dots, x_i - 1,\, x_i + J, \dots, n_i;\; y_{i,t} = s_k\}, \qquad M_{i,j,k}(x_i) = \#\{y_{i,t} : t = x_i + (j-1);\; y_{i,t} = s_k\}.$$
The complete-data log-likelihood function:
$$\left[ \sum_{k=1}^{K} \Big( \sum_{i=1}^{m} N_{i,k}(x_i) \Big) \ln \theta_{0,k} \right] + \left[ \sum_{j=1}^{J} \sum_{k=1}^{K} \Big( \sum_{i=1}^{m} M_{i,j,k}(x_i) \Big) \ln \theta_{j,k} \right].$$
The complete-data ML estimate of $\theta$:
$$\hat\theta_{0,k} = \frac{\sum_{i=1}^{m} N_{i,k}(x_i)}{\sum_{i=1}^{m} (n_i - J)} \quad \text{and} \quad \hat\theta_{j,k} = \frac{\sum_{i=1}^{m} M_{i,j,k}(x_i)}{m}$$
for $j = 1, \dots, J$ and $k = 1, \dots, K$.
15. The EM Algorithm: Applications

E-step: Fix $\theta$ at its current estimate and compute $\hat N_{i,k} \equiv E(N_{i,k}(x_i) \mid Y, \theta)$ and $\hat M_{i,j,k} \equiv E(M_{i,j,k}(x_i) \mid Y, \theta)$ by evaluating, for $l = 1, \dots, n_i - J + 1$,
$$\Pr(x_i = l \mid Y, \theta) \;\propto\; \prod_{k=1}^{K} \theta_{0,k}^{N_{i,k}(l)} \prod_{j=1}^{J} \prod_{k=1}^{K} \theta_{j,k}^{M_{i,j,k}(l)} \;\propto\; \prod_{j=1}^{J} \prod_{k=1}^{K} \left( \frac{\theta_{j,k}}{\theta_{0,k}} \right)^{M_{i,j,k}(l)} = \prod_{j=1}^{J} \frac{\theta_{j,\,k(y_{i,l+j-1})}}{\theta_{0,\,k(y_{i,l+j-1})}},$$
normalized over $l$, for $i = 1, \dots, m$, where $k(y)$ is the index of $y$ in $\{s_1, s_2, \dots, s_K\}$.
16. The EM Algorithm: Applications

M-step: Update the estimate of $\theta$:
$$\hat\theta_{0,k} = \frac{\sum_{i=1}^{m} \hat N_{i,k}}{\sum_{i=1}^{m} (n_i - J)} \quad \text{and} \quad \hat\theta_{j,k} = \frac{\sum_{i=1}^{m} \hat M_{i,j,k}}{m}$$
for $j = 1, \dots, J$ and $k = 1, \dots, K$.
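Putting the E and M steps of slides 15-16 together, one full iteration can be sketched as follows; the function name and the log-space computation for numerical stability are my additions, and sequences are assumed to be lists of symbol indices $0, \dots, K-1$:

```python
import numpy as np

def motif_em_iteration(seqs, theta0, theta, J):
    """One E + M iteration of the motif-alignment EM (illustrative sketch).

    seqs   : list of sequences, each a list of symbol indices in 0..K-1
    theta0 : background frequencies, shape (K,)
    theta  : motif position frequencies, shape (J, K)
    """
    K = len(theta0)
    N_hat = np.zeros(K)          # expected background counts, by symbol
    M_hat = np.zeros((J, K))     # expected motif counts, by position and symbol
    for y in seqs:
        L = len(y) - J + 1
        # E-step: posterior of the motif start l (uniform prior), computed in log space
        logw = np.array([sum(np.log(theta[j, y[l + j]] / theta0[y[l + j]])
                             for j in range(J)) for l in range(L)])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # accumulate expected counts N_hat, M_hat under the posterior over l
        for l, p in enumerate(w):
            for t, s in enumerate(y):
                if l <= t < l + J:
                    M_hat[t - l, s] += p
                else:
                    N_hat[s] += p
    # M-step: closed-form updates, as on the slide
    return N_hat / N_hat.sum(), M_hat / M_hat.sum(axis=1, keepdims=True)
```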
17. EM Supplements and Extensions: Observed Information Matrix

Louis (1982, JRSS B):
$$I_{obs}(\theta) = E[I_{com}(\theta) \mid X_{obs}, \theta] - E\big[ G(X_{com} \mid \theta)\, G^{T}(X_{com} \mid \theta) \mid X_{obs}, \theta \big] + G^{*}(X_{obs} \mid \theta)\, G^{*T}(X_{obs} \mid \theta),$$
where $G(X_{com} \mid \theta)$ and $G^{*}(X_{obs} \mid \theta)$ are the gradient vectors of the Q and L functions.

Meng and Rubin (1991, JASA):
$$I^{-1}(\theta \mid X_{obs}) = \big( E[I(\theta \mid X_{com}) \mid X_{obs}, \theta] \big)^{-1} + \big( E[I(\theta \mid X_{com}) \mid X_{obs}, \theta] \big)^{-1} DM (I - DM)^{-1},$$
where $\theta^{(t+1)} - \theta^{*} \approx (\theta^{(t)} - \theta^{*})\, DM$ and $DM$ is approximated numerically using the computer code for the E and M steps.

Liu (1998, Biometrika): computing the observed information matrix from conditional information.
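As an illustration of how the Meng and Rubin identity can be applied in the scalar case, here is a minimal sketch; `em_map` (one full E+M iteration as a function of theta), `theta_star` (the converged MLE), `info_com` (the expected complete-data information at theta_star), and the forward-difference step are all hypothetical inputs introduced for illustration, not part of the slides:

```python
def sem_variance(em_map, theta_star, info_com, eps=1e-5):
    """Scalar sketch of the supplemented-EM variance formula.

    em_map(theta) applies one E-step and one M-step starting from theta.
    info_com is E[I(theta | X_com) | X_obs, theta_star].
    """
    dm = (em_map(theta_star + eps) - theta_star) / eps   # numerical derivative of the EM map at theta*
    v_com = 1.0 / info_com                               # complete-data variance term
    return v_com + v_com * dm / (1.0 - dm)               # scalar form of V_com + V_com DM (I - DM)^{-1}
```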
18. EM Supplements and Extensions: Extensions

- ECM (Meng and Rubin, 1993, Biometrika)
- ECME (Liu and Rubin, 1994, Biometrika)
- PX-EM (Liu, Rubin, and Wu, 1998, Biometrika)