Optimization Methods II. EM algorithms.


Aula 7. Optimization Methods II. EM algorithms. Anatoli Iambartsev, IME-USP.

[RC] Missing-data models. Demarginalization. The term EM algorithm goes back to [DLR]. Consider the case where the density of the observations can be expressed as a marginal,

$$g_\theta(x) = \int f_\theta(x, z)\, dz,$$

i.e., $g_\theta(x)$ is demarginalized into a joint density $f_\theta(x, z)$. This representation occurs in many statistical settings, including censoring models, mixtures, and latent variable models (tobit, probit, ARCH, stochastic volatility, etc.).

[RC] Example 5.12. The mixture model of Example 5.2 (see previous lecture),

$$0.25\,\mathcal{N}(\mu_1, 1) + 0.75\,\mathcal{N}(\mu_2, 1),$$

can be expressed as a missing-data model even though the (observed) likelihood can be computed in a manageable time. Indeed, if we introduce a vector $z = (z_1, \dots, z_n) \in \{1, 2\}^n$ in addition to the sample $x = (x_1, \dots, x_n)$ such that

$$P_\theta(Z_i = 1) = 1 - P_\theta(Z_i = 2) = 0.25, \qquad X_i \mid Z_i = z \sim \mathcal{N}(\mu_z, 1),$$

we recover the mixture model of Example 5.2 as the marginal distribution of $X_i$. The (observed) likelihood is then obtained as $\mathbb{E}(H(x, Z))$ for

$$H(x, z) \propto \prod_{i:\, z_i = 1} \frac{1}{4} \exp\left\{-\frac{(x_i - \mu_1)^2}{2}\right\} \prod_{i:\, z_i = 2} \frac{3}{4} \exp\left\{-\frac{(x_i - \mu_2)^2}{2}\right\}.$$

[RC] Example 5.12 (cont.). The (observed) likelihood is then obtained as $\mathbb{E}(H(x, Z))$ for

$$H(x, z) \propto \prod_{i:\, z_i = 1} \frac{1}{4} \exp\left\{-\frac{(x_i - \mu_1)^2}{2}\right\} \prod_{i:\, z_i = 2} \frac{3}{4} \exp\left\{-\frac{(x_i - \mu_2)^2}{2}\right\}.$$

Indeed,

$$g_{\mu_1,\mu_2}(x) = \int f_{\mu_1,\mu_2}(x, z)\, dz
= \sum_{z \in \{1,2\}^n} \prod_{i:\, z_i = 1} \frac{1}{4} \frac{e^{-(x_i - \mu_1)^2/2}}{\sqrt{2\pi}} \prod_{i:\, z_i = 2} \frac{3}{4} \frac{e^{-(x_i - \mu_2)^2/2}}{\sqrt{2\pi}}
= \sum_{z \in \{1,2\}^n} \frac{H(x, z)}{(\sqrt{2\pi})^n}
= \prod_{i=1}^n \left( \frac{1}{4} \frac{e^{-(x_i - \mu_1)^2/2}}{\sqrt{2\pi}} + \frac{3}{4} \frac{e^{-(x_i - \mu_2)^2/2}}{\sqrt{2\pi}} \right).$$
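As a quick numerical sanity check (not part of [RC]; the sample x and the values of μ1, μ2 below are arbitrary), the following sketch compares the product form of g_{μ1,μ2}(x) with the exhaustive sum of H(x, z)/(√(2π))^n over all z ∈ {1, 2}^n on a tiny sample:

```python
import numpy as np
from itertools import product
from scipy.stats import norm

# Demarginalization check for Example 5.12 on a tiny (arbitrary) sample:
# the product form of g_{mu1,mu2}(x) should equal the sum over all
# allocations z in {1,2}^n of H(x, z) / (sqrt(2*pi))^n.
mu1, mu2 = 0.0, 2.5
x = np.array([0.3, 2.1, -0.4, 2.8])   # tiny n so the 2^n sum is feasible
n = len(x)

# marginal (observed-data) likelihood, product form
g = np.prod(0.25 * norm.pdf(x, mu1, 1) + 0.75 * norm.pdf(x, mu2, 1))

# exhaustive sum over the 2^n allocations
total = 0.0
for z in product([1, 2], repeat=n):
    z = np.array(z)
    H = np.prod(np.where(z == 1,
                         0.25 * np.exp(-(x - mu1) ** 2 / 2),
                         0.75 * np.exp(-(x - mu2) ** 2 / 2)))
    total += H / (2 * np.pi) ** (n / 2)

print(g, total)   # the two values agree
```

The exhaustive sum over {1, 2}^n is only feasible for very small n, which is precisely why the EM strategy below works with expectations over the missing data instead.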

[RC] Example 5.13. Censored data may come from experiments where some potential observations are replaced with a lower bound because they take too long to observe. Suppose that we observe $Y_1, \dots, Y_m$, iid from $f(y \mid \theta)$, and that the remaining $n - m$ observations $(Y_{m+1}, \dots, Y_n)$ are censored at the threshold $a$. The corresponding likelihood function is then

$$L(\theta \mid y) = \bigl(1 - F(a \mid \theta)\bigr)^{n-m} \prod_{i=1}^m f(y_i \mid \theta),$$

where $F$ is the cdf associated with $f$ and $y = (y_1, \dots, y_m)$.

[RC] Example 5.13 (cont.).

$$L(\theta \mid y) = \bigl(1 - F(a \mid \theta)\bigr)^{n-m} \prod_{i=1}^m f(y_i \mid \theta).$$

If we had observed the last $n - m$ values, say $z = (z_{m+1}, \dots, z_n)$ with $z_i \ge a$ ($i = m+1, \dots, n$), we could have constructed the (complete-data) likelihood

$$L^c(\theta \mid y, z) = \prod_{i=1}^m f(y_i \mid \theta) \prod_{i=m+1}^n f(z_i \mid \theta).$$

Note that

$$L(\theta \mid y) = \mathbb{E}\bigl(L^c(\theta \mid y, Z)\bigr) = \int L^c(\theta \mid y, z)\, f(z \mid y, \theta)\, dz,$$

where $f(z \mid y, \theta)$ is the density of the missing data conditional on the observed data, namely the product of the $f(z_i \mid \theta)/(1 - F(a \mid \theta))$'s, i.e., $f(z \mid \theta)$ restricted to $(a, +\infty)$.
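For concreteness, here is a minimal sketch (not from [RC]) of this observed-data likelihood on the log scale, assuming $f(\cdot \mid \theta)$ is the N(θ, 1) density that Example 5.14 will use; the function name and the toy data are illustrative only:

```python
import numpy as np
from scipy.stats import norm

def censored_loglik(theta, y, n, a):
    """Observed-data log-likelihood log L(theta | y): y holds the m uncensored
    values, n - len(y) observations are censored at a, and f(y | theta) is
    assumed to be the N(theta, 1) density."""
    m = len(y)
    censored_part = (n - m) * norm.logsf(a, loc=theta)   # log of (1 - F(a | theta))^(n-m)
    observed_part = norm.logpdf(y, loc=theta).sum()      # sum_i log f(y_i | theta)
    return censored_part + observed_part

# toy usage: 7 observed values, 3 censored at a = 1.5
rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=7)
print(censored_loglik(0.8, y_obs, n=10, a=1.5))
```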

Main Idea of EM algorithms. Demarginalization: complete $g_\theta(x)$ into $f_\theta(x, z)$ with

$$g_\theta(x) = \int f_\theta(x, z)\, dz.$$

Values of $z$ can be generated from the conditional distribution

$$k_\theta(z \mid x) = \frac{f_\theta(x, z)}{g_\theta(x)}.$$

Taking logarithms,

$$\log g_\theta(x) = \log f_\theta(x, z) - \log k_\theta(z \mid x).$$

In likelihood notation,

$$\log L(\theta \mid x) = \log L^c(\theta \mid x, z) - \log k_\theta(z \mid x),$$

where $L^c$ stands for the complete-data likelihood function.

Main Idea of EM algorithms (cont.).

$$\log L(\theta \mid x) = \log L^c(\theta \mid x, z) - \log k_\theta(z \mid x).$$

Let us fix a value $\theta_0$ and take expectations with respect to the distribution $k_{\theta_0}(z \mid x)$:

$$\log L(\theta \mid x) = \mathbb{E}_{k,\theta_0} \log L^c(\theta \mid x, Z) - \mathbb{E}_{k,\theta_0} \log k_\theta(Z \mid x) =: Q(\theta \mid \theta_0, x) - H(\theta \mid \theta_0, x).$$

Theorem. Let $\theta_1$ maximize $Q$, i.e.,

$$Q(\theta_1 \mid \theta_0, x) = \max_\theta Q(\theta \mid \theta_0, x).$$

Then $\log L(\theta_1 \mid x) \ge \log L(\theta_0 \mid x)$.

Main Idea of EM algorithms. Proof. We show that $Q(\theta_1 \mid \theta_0, x) = \max_\theta Q(\theta \mid \theta_0, x)$ implies $\log L(\theta_1 \mid x) \ge \log L(\theta_0 \mid x)$. Write

$$\log L(\theta_1 \mid x) - \log L(\theta_0 \mid x) = \bigl(Q(\theta_1 \mid \theta_0, x) - Q(\theta_0 \mid \theta_0, x)\bigr) - \bigl(H(\theta_1 \mid \theta_0, x) - H(\theta_0 \mid \theta_0, x)\bigr).$$

Note that, by the definition of $\theta_1$,

$$Q(\theta_1 \mid \theta_0, x) - Q(\theta_0 \mid \theta_0, x) \ge 0.$$

Main Idea of EM algorithms. Proof (cont.). Recall

$$\log L(\theta_1 \mid x) - \log L(\theta_0 \mid x) = \bigl(Q(\theta_1 \mid \theta_0, x) - Q(\theta_0 \mid \theta_0, x)\bigr) - \bigl(H(\theta_1 \mid \theta_0, x) - H(\theta_0 \mid \theta_0, x)\bigr).$$

For the second term,

$$H(\theta_1 \mid \theta_0, x) - H(\theta_0 \mid \theta_0, x)
= \mathbb{E}_{k,\theta_0} \log k_{\theta_1}(Z \mid x) - \mathbb{E}_{k,\theta_0} \log k_{\theta_0}(Z \mid x)
= \mathbb{E}_{k,\theta_0} \log \frac{k_{\theta_1}(Z \mid x)}{k_{\theta_0}(Z \mid x)}
\le \log \mathbb{E}_{k,\theta_0} \frac{k_{\theta_1}(Z \mid x)}{k_{\theta_0}(Z \mid x)} = \log 1 = 0,$$

where Jensen's inequality $\mathbb{E} \log \xi \le \log \mathbb{E} \xi$ was used.

Main Idea of EM algorithms. Proof (cont.). Combining the two bounds with the decomposition above, we thus have

$$Q(\theta_1 \mid \theta_0, x) - Q(\theta_0 \mid \theta_0, x) \ge 0, \qquad H(\theta_1 \mid \theta_0, x) - H(\theta_0 \mid \theta_0, x) \le 0,$$

and therefore

$$\log L(\theta_1 \mid x) - \log L(\theta_0 \mid x) \ge 0.$$

This completes the proof of the theorem.

Main Idea of EM algorithms. Each iteration of the EM algorithm maximizes a function $Q$. Let $(\theta_t)$ be the sequence obtained recursively by

$$Q(\theta_{t+1} \mid \theta_t, x) = \max_\theta Q(\theta \mid \theta_t, x).$$

This recursive scheme consists of two steps, expectation and maximization, which give the scheme its name: the EM algorithm.

EM algorithm. Choose an initial parameter value $\theta_0$ and repeat:

E-step. Compute the expectation

$$Q(\theta \mid \theta_t, x) = \mathbb{E}_{k,\theta_t} \log L^c(\theta \mid x, Z)$$

with respect to the distribution $k_{\theta_t}(z \mid x)$.

M-step. Maximize $Q(\theta \mid \theta_t, x)$ over $\theta$ and determine the next value

$$\theta_{t+1} = \arg\max_\theta Q(\theta \mid \theta_t, x);$$

set $t = t + 1$ and return to the E-step.
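A minimal generic sketch of this loop (not from the slides): e_step and m_step are placeholders for the problem-specific construction of Q(·|θ_t, x) and its maximizer; Example 5.14 below gives a concrete instance:

```python
import numpy as np

def em(x, theta0, e_step, m_step, max_iter=100, tol=1e-8):
    """Generic EM loop.  e_step(theta_t, x) must return the function
    Q(. | theta_t, x); m_step(Q) must return argmax_theta Q(theta).
    Both are supplied by the user for the model at hand."""
    theta = theta0
    for _ in range(max_iter):
        Q = e_step(theta, x)        # E-step: build Q(. | theta_t, x)
        theta_new = m_step(Q)       # M-step: maximize Q over theta
        if np.all(np.abs(np.asarray(theta_new) - np.asarray(theta)) < tol):
            return theta_new
        theta = theta_new
    return theta
```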

[RC] Example 5.14 (cont. of Example 5.13). Let again $Y_1, \dots, Y_m$ be iid with density $f(y \mid \theta)$, while $Y_{m+1}, \dots, Y_n$ are censored at the level $a$. The likelihood function is

$$L(\theta \mid y) = \bigl(1 - F(a \mid \theta)\bigr)^{n-m} \prod_{i=1}^m f(y_i \mid \theta),$$

where $F(a \mid \theta) = P(Y_i \le a)$. If we had observed the last $n - m$ values, say $z = (z_{m+1}, \dots, z_n)$ with $z_i \ge a$ ($i = m+1, \dots, n$), we could have constructed the (complete-data) likelihood

$$L^c(\theta \mid y, z) = \prod_{i=1}^m f(y_i \mid \theta) \prod_{i=m+1}^n f(z_i \mid \theta)$$

and

$$k_\theta(z \mid y) = \prod_{i=m+1}^n \frac{f(z_i \mid \theta)}{1 - F(a \mid \theta)}.$$

[RC] Example 5.14 (cont. of Example 5.13). Suppose that $f(y \mid \theta)$ corresponds to the $\mathcal{N}(\theta, 1)$ distribution. The complete-data likelihood is

$$L^c(\theta \mid y, z) \propto \prod_{i=1}^m e^{-(y_i - \theta)^2/2} \prod_{i=m+1}^n e^{-(z_i - \theta)^2/2},$$

resulting in the expected complete-data log-likelihood

$$Q(\theta \mid \theta_0, y) = -\frac{1}{2} \sum_{i=1}^m (y_i - \theta)^2 - \frac{1}{2} \sum_{i=m+1}^n \mathbb{E}_{k,\theta_0}\bigl[(Z_i - \theta)^2\bigr],$$

where the missing observations $Z_i$ are distributed from a normal $\mathcal{N}(\theta_0, 1)$ distribution truncated at $a$. Doing the M-step (i.e., differentiating the function $Q(\theta \mid \theta_0, y)$ in $\theta$ and setting the derivative equal to 0) then leads to the EM update

$$\hat\theta = \frac{m \bar y + (n - m)\, \mathbb{E}_{k,\theta_0}(Z_1)}{n}.$$

[RC] Example 5.14 (cont. of Example 5.13). Doing the M-step gives the EM update

$$\hat\theta = \frac{m \bar y + (n - m)\, \mathbb{E}_{k,\theta_0}(Z_1)}{n}.$$

Since

$$\mathbb{E}_{k,\theta_0}(Z_1) = \theta_0 + \frac{\varphi(a - \theta_0)}{1 - \Phi(a - \theta_0)},$$

where $\varphi$ and $\Phi$ are the standard normal pdf and cdf, respectively, the EM sequence is

$$\theta_{t+1} = \frac{m}{n}\, \bar y + \frac{n - m}{n} \left( \theta_t + \frac{\varphi(a - \theta_t)}{1 - \Phi(a - \theta_t)} \right).$$
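A minimal sketch of this EM sequence for the censored N(θ, 1) model; the function name and the simulated data are illustrative, not from [RC]:

```python
import numpy as np
from scipy.stats import norm

def em_censored_normal(y, n, a, theta0=0.0, max_iter=200, tol=1e-10):
    """EM sequence theta_{t+1} = (m/n)*ybar
       + ((n-m)/n)*(theta_t + phi(a - theta_t) / (1 - Phi(a - theta_t)))
    for Y_i ~ N(theta, 1) with n - len(y) observations censored at a."""
    m, ybar = len(y), np.mean(y)
    theta = theta0
    for _ in range(max_iter):
        # E-step: mean of N(theta_t, 1) truncated to (a, +infinity)
        ez = theta + norm.pdf(a - theta) / norm.sf(a - theta)
        # M-step: closed-form maximizer of Q(theta | theta_t, y)
        theta_new = (m * ybar + (n - m) * ez) / n
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta_new

# toy usage: simulate a censored sample and run the EM sequence
rng = np.random.default_rng(1)
full = rng.normal(0.5, 1.0, size=100)
a = 1.0
y = full[full <= a]                    # the m uncensored values
print(em_censored_normal(y, n=len(full), a=a))
```

The fixed point of this recursion lies above $\bar y$, compensating for the downward bias that censoring induces in the mean of the uncensored observations.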

Principle of missing information (informally). From

$$\log L(\theta \mid x) = \log L^c(\theta \mid x, z) - \log k_\theta(z \mid x)$$

we get

$$-\frac{\partial^2 \log L(\theta \mid x)}{\partial \theta^2} = -\frac{\partial^2 \log L^c(\theta \mid x, z)}{\partial \theta^2} + \frac{\partial^2 \log k_\theta(z \mid x)}{\partial \theta^2},$$

that is,

Observed information = Complete information − Missing information.

Informally, about the convergence rate of EM algorithms: if the proportion of missing information increases with the iterations, then the rate of convergence of the algorithm decreases.

EM algorithms for exponential family models. It is easier to implement the algorithm when the complete data $(z, x)$ has a distribution of exponential family type in canonical form:

$$p(z, x \mid \theta) = \frac{b(z, x) \exp(\theta^T s(z, x))}{a(\theta)}.$$

Let $y = (z, x)$ be the vector of complete data. We have

$$\log p(y \mid \theta) = \log b(y) + \theta^T s(y) - \log a(\theta),$$

$$\frac{\partial}{\partial \theta} \log p(y \mid \theta) = s(y) - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}.$$

Remember that

$$a(\theta) = \int b(y) \exp(\theta^T s(y))\, dy.$$

EM algorithms for exponential family models (cont.). From

$$\frac{\partial}{\partial \theta} \log p(y \mid \theta) = s(y) - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}, \qquad a(\theta) = \int b(y) \exp(\theta^T s(y))\, dy,$$

we have

$$\frac{\partial \log a(\theta)}{\partial \theta} = \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}
= \frac{1}{a(\theta)} \int b(y)\, \frac{\partial \exp(\theta^T s(y))}{\partial \theta}\, dy
= \frac{1}{a(\theta)} \int b(y)\, s(y) \exp(\theta^T s(y))\, dy
= \int s(y)\, \frac{b(y) \exp(\theta^T s(y))}{a(\theta)}\, dy = \mathbb{E}(s(y) \mid \theta).$$

Thus,

$$\frac{\partial}{\partial \theta} \log p(y \mid \theta) = 0 \iff s(y) = \mathbb{E}(s(y) \mid \theta).$$

EM algorithms for exponential family models (cont.). Implementation of the EM algorithm. E-step:

$$Q(\theta \mid \theta_t) = \int \log p(z, x \mid \theta)\, p(z \mid \theta_t, x)\, dz
= \int \log b(z, x)\, p(z \mid \theta_t, x)\, dz + \theta^T \int s(z, x)\, p(z \mid \theta_t, x)\, dz - \log a(\theta).$$

Note that the first term does not participate in the M-step. M-step: we obtain the extreme point,

$$\frac{\partial Q(\theta \mid \theta_t)}{\partial \theta} = \int s(z, x)\, p(z \mid \theta_t, x)\, dz - \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}
= \mathbb{E}(s(z, x) \mid \theta_t, x) - \mathbb{E}(s(z, x) \mid \theta).$$

Remember that

$$\mathbb{E}(s(z, x) \mid \theta) = \frac{1}{a(\theta)} \frac{\partial a(\theta)}{\partial \theta}.$$

EM algorithms for exponential family models (cont.). Thus the maximization of $Q(\theta \mid \theta_t, x)$ in the M-step is equivalent to solving the equation

$$\mathbb{E}(s(z, x) \mid \theta_t, x) = \mathbb{E}(s(z, x) \mid \theta).$$

If a solution exists, then it is unique.
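For instance, in the censored N(θ, 1) model of Example 5.14 the complete-data sufficient statistic is the sum of all n observations, so the moment equation reads n·θ = m·ȳ + (n − m)·E_{k,θ_t}(Z_1). A small sketch (same kind of toy data as above; the generic root finder is only for illustration, since the equation is linear in θ) checks that solving it reproduces the closed-form EM update:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def m_step_moment_matching(theta_t, y, n, a):
    """M-step by moment matching for the censored N(theta, 1) model:
    solve E(s | theta) = E(s | theta_t, x), where s is the sum of all
    observations, so E(s | theta) = n*theta and
    E(s | theta_t, x) = m*ybar + (n-m)*E_{k,theta_t}(Z_1)."""
    m, ybar = len(y), np.mean(y)
    lhs = m * ybar + (n - m) * (theta_t + norm.pdf(a - theta_t) / norm.sf(a - theta_t))
    return brentq(lambda theta: n * theta - lhs, -100.0, 100.0)

rng = np.random.default_rng(1)
full = rng.normal(0.5, 1.0, size=100)
a = 1.0
y = full[full <= a]
print(m_step_moment_matching(0.0, y, n=len(full), a=a))
# agrees with (m*ybar + (n-m)*E(Z_1)) / n, the update of Example 5.14
```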

Monte Carlo for the E-step. Given $\theta_t$ we need to compute

$$Q(\theta \mid \theta_t, x) = \mathbb{E}_{k,\theta_t} \log L^c(\theta \mid Z, x).$$

When it is difficult to compute this explicitly, we can approximate it by Monte Carlo:

1. generate $z_1, \dots, z_m$ according to $k_{\theta_t}(z \mid x)$;
2. compute $\hat Q(\theta \mid \theta_t) = \frac{1}{m} \sum_{i=1}^m \log L^c(\theta \mid z_i, x)$;

during the M-step one maximizes $\hat Q$ over $\theta$ in order to obtain $\theta_{t+1}$.
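A minimal Monte Carlo E-step sketch for the same censored N(θ, 1) model (not from the slides): the exact truncated-normal mean used in Example 5.14 is replaced by an average of simulated draws; the function name and sample sizes are illustrative:

```python
import numpy as np
from scipy.stats import truncnorm

def mcem_update(theta_t, y, n, a, n_sim=1000, rng=None):
    """One Monte Carlo EM update for the censored N(theta, 1) model:
    E_{k,theta_t}(Z_1) is approximated by the average of draws from
    N(theta_t, 1) truncated to (a, +infinity)."""
    rng = np.random.default_rng() if rng is None else rng
    m, ybar = len(y), np.mean(y)
    # truncnorm takes standardized bounds: (a - loc) / scale and +infinity
    z = truncnorm.rvs(a - theta_t, np.inf, loc=theta_t, scale=1.0,
                      size=(n_sim, n - m), random_state=rng)
    # M-step: the maximizer of Q-hat has the same closed form as before
    return (m * ybar + (n - m) * z.mean()) / n

# toy usage: iterate the MCEM update on simulated censored data
rng = np.random.default_rng(2)
full = rng.normal(0.5, 1.0, size=100)
a = 1.0
y = full[full <= a]
theta = 0.0
for _ in range(50):
    theta = mcem_update(theta, y, n=len(full), a=a, rng=rng)
print(theta)
```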

EM algorithm. [H] Hunter, D.R., On the Geometry of EM algorithms: this paper demonstrates how the geometry of EM algorithms can help explain how their rate of convergence is related to the proportion of missing data. In a footnote, [H] writes: "In a footnote, [DLR] refer to the comment of a referee, who noted that the use of the word algorithm may be criticized since EM is not, strictly speaking, an algorithm. However, EM is a recipe for creating algorithms, and thus we consider the set of EM algorithms to consist of all algorithms baked according to the EM recipe."

References.

[DLR] Dempster, A.P., Laird, N.M., and Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39, 1-38, 1977.

[H] Hunter, D.R. On the Geometry of EM algorithms. Technical Report 0303, Department of Statistics, Penn State University, February 2003.

[RC] Robert, C.P. and Casella, G. Introducing Monte Carlo Methods with R. Use R! Series. Springer, 2010.
