Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study


Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Journal: Communications in Statistics - Simulation and Computation
Manuscript ID: LSSP-00-0.R
Manuscript Type: Original Paper
Authors: Faria, Susana (University of Minho, Department of Mathematics and Applications); Soromenho, Gilda (University of Lisbon)
Keywords: Maximum likelihood estimation, EM algorithm, Stochastic EM algorithm, Mixture Poisson regression models, Simulation study

Abstract: In this work, we propose to compare two algorithms to compute maximum likelihood estimates for the parameters of a mixture Poisson regression model: the EM algorithm and the Stochastic EM algorithm. The two procedures were compared through a simulation study of their performance on simulated and real data sets. Simulation results show that the choice of approach depends essentially on the overlap of the regression lines. In the real data case, we show that the Stochastic EM algorithm resulted in model estimates that best fit the regression model.

Note: The following file was submitted by the author for peer review but cannot be converted to PDF and must be viewed online: sfariasoromenho.zip


Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Susana Faria (Department of Mathematics and Applications, University of Minho, Guimarães, Portugal, sfaria@math.uminho.pt) and Gilda Soromenho (Institute of Education, University of Lisbon, Portugal, gspereira@ie.ul.pt)

Abstract: In this work, we propose to compare two algorithms to compute maximum likelihood estimates for the parameters of a mixture Poisson regression model: the EM algorithm and the Stochastic EM algorithm. The two procedures were compared through a simulation study of their performance on simulated and real data sets. Simulation results show that the choice of approach depends essentially on the overlap of the regression lines. In the real data case, we show that the Stochastic EM algorithm resulted in model estimates that best fit the regression model.

Keywords: Maximum likelihood estimation, EM algorithm, Stochastic EM algorithm, Mixture Poisson regression models, Simulation study

1 Introduction

Finite mixture models are a well-known method for modelling data that arise from a heterogeneous population (see e.g. McLachlan and Peel, 2000 and Fruhwirth-Schnatter, 2006 for a review).

The study of these models is a well-established and active area of statistical research, and mixtures of regressions have also been studied fairly extensively. In particular, Poisson mixture regression models are commonly used to analyze heterogeneous count data. Wedel et al. (1993) proposed a latent class Poisson regression model and described an EM algorithm for its estimation. Wang et al. (1996) studied mixed Poisson regression models in which maximum likelihood estimates of the parameters were obtained by combining EM and quasi-Newton algorithms.

In this work, we study the procedure for fitting Poisson mixture regression models by means of maximum likelihood (ML). We apply two maximization algorithms to obtain the maximum likelihood estimates: the Expectation Maximization (EM) algorithm (Dempster et al., 1977) and the Stochastic Expectation Maximization (SEM) algorithm (Celeux and Diebolt, 1985). The comparison of the EM and SEM approaches in mixtures of distributions is well studied. Celeux et al. (1996) investigated the practical behaviour of these algorithms through intensive Monte Carlo simulations and a real data study. Dias and Wedel (2004) compared the EM and SEM algorithms for estimating the parameters of Gaussian mixture models. Faria and Soromenho (2010) performed a simulation study to compare the performance of these two approaches on Gaussian mixtures of linear regressions.

This paper is organized as follows: Section 2 describes the model. Parameter estimation based on the EM algorithm and the Stochastic EM algorithm is discussed in Section 3. Section 4 provides a simulation study investigating the performance of these algorithms for fitting two and three component mixtures of Poisson regression models. We also study the performance of the algorithms on real data sets in Section 5. In Section 6 the conclusions of our study are drawn.

2 Poisson mixture regression models

Let the random variable Y_i denote the ith response variable and let (y_i, x_i), i = 1, ..., n, denote the observations, where y_i is the observed value of Y_i and x_i is a (p+1)-dimensional covariate vector. It is assumed that the marginal distribution of Y_i follows a mixture of Poisson distributions,

h(y_i | x_i, θ) = Σ_{j=1}^{J} π_j f_j(y_i | x_i),   (1)

with

f_j(y_i | x_i) = exp(-λ_ij) λ_ij^{y_i} / y_i!,  i = 1, ..., n, j = 1, ..., J,   (2)

and λ_ij = exp(β_j^T x_i), where β_j = (β_j0, β_j1, ..., β_jp)^T denotes the (p+1)-dimensional vector of regression coefficients for the jth component and θ = (π_1, ..., π_{J-1}, β_1, ..., β_J) denotes the vector of all parameters. The proportions π_j are the mixing probabilities (0 < π_j < 1, j = 1, ..., J, and Σ_{j=1}^{J} π_j = 1) and can be interpreted as the unconditional probabilities that an individual belongs to component j of the mixture.

To be able to reliably estimate the parameters of mixture models we require identifiability, that is, two distinct sets of parameters must not yield the same mixture distribution. Finite mixtures of Poisson distributions are identifiable (see Teicher, 1960 for details). Fruhwirth-Schnatter (2006) shows that if the covariate matrix is of full rank and the mixing proportions π_j are all different, then the Poisson mixture regression model is identifiable.
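To make the model concrete, the following R sketch evaluates the mixture density of equation (1) for a single observation; it is an illustration under assumed inputs (a coefficient matrix with one column per component and arbitrary example values), not code from the paper.

# Sketch: mixture-of-Poisson-regressions density of eq. (1) for one observation.
# 'prop' holds the J mixing proportions pi_j; 'beta' is a (p+1) x J coefficient
# matrix, so lambda_ij = exp(beta_j' (1, x_i)).
dpoismix <- function(y, x, prop, beta) {
  lambda <- exp(drop(crossprod(beta, c(1, x))))  # component means lambda_i1, ..., lambda_iJ
  sum(prop * dpois(y, lambda))                   # sum_j pi_j * Poisson(y; lambda_ij)
}

# Illustrative two-component example (arbitrary values):
beta <- cbind(c(0.5, 1.0), c(2.0, -0.5))
dpoismix(y = 3, x = 0.7, prop = c(0.4, 0.6), beta = beta)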

3 Parameter Estimation

Among the various estimation methods considered in the literature for finite mixture models, maximum likelihood (ML) has dominated the field. For a given number of components J, the task is to estimate the vector of parameters θ = (π_1, ..., π_{J-1}, β_1, ..., β_J) that maximizes the log-likelihood

L(θ | x_1, ..., x_n, y_1, ..., y_n) = Σ_{i=1}^{n} log( Σ_{j=1}^{J} π_j f_j(y_i | x_i) ).   (3)

The standard tool for finding the maximum likelihood solution is the Expectation Maximization (EM) algorithm. However, it suffers from slow convergence and may converge to local maxima or saddle points. The Stochastic Expectation Maximization (SEM) algorithm is a viable alternative for finding the ML estimates of the parameters of a mixture model. By using random draws at each iteration, the SEM algorithm avoids being trapped in local optima. It has some advantages over the EM algorithm: it does not get stuck; it often provides more information about the data (see Diebolt and Ip, 1996), for instance when parameters cannot be estimated; and under certain conditions it behaves better than the EM algorithm (see Celeux et al., 1996).

3.1 The EM algorithm

The EM algorithm is a broadly applicable approach to the iterative computation of maximum likelihood estimates when the observations can be viewed as incomplete data. The idea here is to think of the data as consisting of triples (y_i, x_i, z_i), i = 1, ..., n, where z_i = (z_i1, ..., z_iJ)^T is the unobserved indicator vector that specifies the mixture component from which the observation (y_i, x_i) is drawn, i.e., z_ij equals 1 if observation i comes from component j and 0 otherwise.

The log-likelihood for the complete data is

L_c(θ | x_1, ..., x_n, y_1, ..., y_n) = Σ_{i=1}^{n} Σ_{j=1}^{J} z_ij log(π_j) + Σ_{i=1}^{n} Σ_{j=1}^{J} z_ij log(f_j(y_i | x_i)).   (4)

The EM algorithm is easy to program and proceeds iteratively in two steps, E (for expectation) and M (for maximization). At the E-step, it replaces the missing data by their expectation conditional on the observed data. At the M-step, it finds the parameter estimates which maximize the expected log-likelihood for the complete data, conditional on the expected values of the missing data. This procedure can be stated as follows.

E-step: Given the current parameter estimates θ^(r) in the rth iteration, replace the missing data z_ij by the estimated probabilities w_ij^(r) that the ith observation belongs to the jth component of the mixture,

w_ij^(r) = π_j^(r) f_j(y_i | x_i, β_j^(r)) / Σ_{l=1}^{J} π_l^(r) f_l(y_i | x_i, β_l^(r)).   (5)

M-step: Given the estimates of the probabilities w_ij^(r) (which are functions of θ^(r)), obtain new estimates θ^(r+1) of the parameters by maximizing

Q(θ | θ^(r)) = Q_1 + Q_2   (6)

under the restriction on the component weights, where

Q_1 = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log(π_j)   (7)

and

Q_2 = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log f_j(y_i | x_i, β_j).   (8)

The maximization of Q_1 under the restriction Σ_{j=1}^{J} π_j = 1 on the component weights is equivalent to maximizing the function

Q_1* = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log(π_j) - μ (Σ_{j=1}^{J} π_j - 1),

where μ is the Lagrange multiplier. Setting the derivative of this function with respect to π_j equal to zero yields

π̂_j^(r+1) = (1/n) Σ_{i=1}^{n} w_ij^(r),  j = 1, ..., J,   (9)

and Q_2 is maximized separately for each j = 1, ..., J using weighted ML estimation of a generalized linear model (GLM).
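As an illustration of the E- and M-steps above, the following R sketch performs one EM iteration for a J-component Poisson regression mixture, with the M-step for the β_j carried out by weighted Poisson GLM fits. It is a minimal sketch under assumed data structures (a design matrix X whose first column is the intercept), not the authors' implementation.

# Sketch of one EM iteration (eqs. (5)-(9)); X is the n x (p+1) design matrix
# (first column of ones), 'prop' the mixing proportions, 'beta' a (p+1) x J matrix.
em_step <- function(y, X, prop, beta) {
  J <- length(prop)
  lambda <- exp(X %*% beta)                                # n x J component means
  dens <- sapply(1:J, function(j) dpois(y, lambda[, j]))   # f_j(y_i | x_i)
  w <- sweep(dens, 2, prop, "*")                           # pi_j * f_j(y_i | x_i)
  w <- w / rowSums(w)                                      # E-step: posterior w_ij, eq. (5)
  prop_new <- colMeans(w)                                  # M-step for pi_j, eq. (9)
  beta_new <- sapply(1:J, function(j)                      # weighted Poisson GLM per component
    glm.fit(X, y, weights = w[, j], family = poisson())$coefficients)
  list(prop = prop_new, beta = beta_new, w = w)
}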

3.2 The Stochastic EM algorithm

We also apply a procedure for fitting Poisson mixture regression models using a stochastic version of the EM algorithm, the so-called SEM algorithm. The SEM algorithm is a modification of the EM algorithm that incorporates a stochastic step (S-step) between the E- and M-steps of EM. Starting from an initial parameter θ^(0), an iteration of the SEM algorithm consists of three steps.

E-step: The estimated probabilities w_ij^(r), i = 1, ..., n, j = 1, ..., J, that the ith observation belongs to the jth component of the mixture are computed for the current value of θ, as in standard EM.

S-step: A partition P^(r+1) = (P_1^(r+1), ..., P_J^(r+1)) of (y_1, x_1), ..., (y_n, x_n) is designed by assigning each observation at random to one of the mixture components according to the multinomial distribution with parameters w_ij^(r), i = 1, ..., n, j = 1, ..., J, given by (5). If one of the P_j^(r+1) is empty or has only one observation, the mixture is taken to have J-1 components instead of J and the estimation process begins again with J-1 components. Note that, in this case, this introduces a bias towards uniform π_j parameters.

M-step: The ML estimate of θ is updated using the sub-samples P_j^(r+1). It follows that on the M-step of the (r+1)th iteration, the estimates of the mixing proportions are given by

π̂_j^(r+1) = n_j / n,  j = 1, ..., J,   (10)

where n_j is the total number of observations arising from component j, and the maximization of

Q_2* = Σ_{j=1}^{J} Σ_{i: z_ij^(r+1) = 1} log f_j(y_i | x_i, β_j),   (11)

where {i : z_ij^(r+1) = 1} is the set of observations arising from the jth mixture component, gives β_j^(r+1).
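The S- and M-steps can be sketched in R as follows; this is a schematic illustration (it does not handle the reduction to J-1 components when a sub-sample becomes too small), not the authors' code.

# Sketch of the S- and M-steps of SEM: draw one component label per observation from
# the multinomial with probabilities w_i1, ..., w_iJ, then re-estimate from the sub-samples.
sem_step <- function(y, X, w) {
  J <- ncol(w); n <- length(y)
  z <- apply(w, 1, function(p) sample.int(J, 1, prob = p))  # S-step: random assignment
  prop_new <- tabulate(z, nbins = J) / n                    # eq. (10): n_j / n
  beta_new <- sapply(1:J, function(j)                       # eq. (11): one Poisson GLM per sub-sample
    glm.fit(X[z == j, , drop = FALSE], y[z == j], family = poisson())$coefficients)
  list(prop = prop_new, beta = beta_new, z = z)
}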

4 Simulation study of algorithm performance

4.1 Design of the study

To investigate the statistical behaviour of the proposed methods in fitting Poisson mixture regression models, a simulation study was performed. The simulation was designed to evaluate model performance considering the effects of sample size and the initialization of the algorithms, as well as the configuration of the regression lines. The scope was limited to the study of two and three components. We used the free software R to develop the simulation program.

Initial Conditions
Two different approaches for choosing initial values are compared in the study. In the first strategy, we used the true parameter values of the model that generated the observations as initial values, in order to determine the performance of the algorithms in the best case. In the other strategy, we ran each algorithm several times from random initial positions and selected, out of these runs, the solution which provided the best value of the optimized criterion (Celeux et al., 1996).

Stopping Rules
For the EM algorithm, iterations were stopped when the relative change in the log-likelihood between two successive iterations fell below a small fixed threshold. However, since SEM does not converge pointwise and instead generates a Markov chain whose stationary distribution is more or less concentrated around the ML parameter estimate, we used as the stopping rule for the SEM algorithm the total number of iterations required for convergence by the EM algorithm.

Number of Samples
For each type of simulated data set, we generated repeated samples of size n.

Data set
Each datum (y_i, x_i) was generated by the following scheme. First, a uniform [0, 1] random number c_i was generated and its value was used to select a particular component j of the mixture of regressions model. Next, x_i was randomly generated from a uniform [L_x, U_x] distribution, giving λ_ij = exp(β_j0 + β_j1 x_i). Finally, we simulated the value y_i ~ P(λ_ij).

Measure of Algorithm Performance
In order to examine the performance of the two algorithms, we report the Euclidean distance between the estimated parameters and the true parameter values.

Quality of the fit
In order to compare the quality of the fit of the two algorithms, we report the mean of the root mean squared errors of prediction (MRSEP),

MRSEP = (1/M) Σ_{m=1}^{M} RMSEP^(m),

where M denotes the number of replications and RMSEP^(m) is the root mean squared error of prediction of the mth replication based on K-fold cross-validation, given by

RMSEP^(m) = sqrt( (1/n) Σ_{i=1}^{n} (y_i - μ̂_i)^2 / V(μ̂_i) ),

with

μ̂_i = Σ_{j=1}^{J} π̂_j λ̂_ij  and  V(μ̂_i) = Σ_{j=1}^{J} π̂_j λ̂_ij + Σ_{j=1}^{J} π̂_j λ̂_ij^2 - (Σ_{j=1}^{J} π̂_j λ̂_ij)^2.

For the K-fold cross-validation, we have chosen K = 5 and K = 10 (Hastie et al., 2001).
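A minimal R sketch of the data-generating scheme and of the RMSEP measure described above follows; the parameter values in the example call are illustrative and are not those used in Tables 1 and 2.

# Sketch of the simulation scheme: pick a component, draw x uniformly, then y ~ Poisson(lambda_ij).
simulate_mixture <- function(n, prop, beta, xlim = c(0, 1)) {
  j <- sample.int(length(prop), n, replace = TRUE, prob = prop)  # component label for each i
  x <- runif(n, xlim[1], xlim[2])                                # covariate from a uniform law
  lambda <- exp(beta[1, j] + beta[2, j] * x)                     # lambda_ij = exp(beta_j0 + beta_j1 x_i)
  data.frame(y = rpois(n, lambda), x = x, component = j)
}

# Sketch of RMSEP for one fitted model: standardized prediction errors under the mixture.
rmsep <- function(y, X, prop, beta) {
  lambda <- exp(X %*% beta)                   # n x J fitted component means
  mu <- drop(lambda %*% prop)                 # mu_i = sum_j pi_j lambda_ij
  v  <- mu + drop(lambda^2 %*% prop) - mu^2   # variance of a Poisson mixture
  sqrt(mean((y - mu)^2 / v))
}

set.seed(1)
dat <- simulate_mixture(200, prop = c(0.4, 0.6), beta = cbind(c(0.5, 1.0), c(2.0, -0.5)))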

4.2 Simulation results: two component mixtures of Poisson regressions

For two-component models, samples of four different sizes n were generated for each set of true parameter values (π_j, β_j) shown in Table 1 (Yang and Lai, 2005; Leisch, 2004). For illustration, Figure 1 presents typical scatter plots of the simulated samples. Note that the cases considered correspond to varying degrees of overlap of the regression lines, from the case with the highest overlap to the case with the lowest overlap.

Figure 2 shows boxplots of the Euclidean distance between estimated and true parameters over the replications, using the EM and SEM algorithms for fitting two-component mixtures of Poisson regression models. Figure 2 shows that the algorithms have practically the same behaviour. However, when the overlap is high, EM outperforms SEM by producing estimates of the parameters that have smaller estimation error. As expected, the estimation error decreases when the sample size increases.

The resulting values of the MRSEP based on 10-fold cross-validation, for each of the configurations of the true regression lines, are plotted in Figures 3 and 4. Similar results were obtained calculating the MRSEP based on 5-fold cross-validation. Figures 3 and 4 show that, in general, the SEM algorithm performs better than the EM algorithm.

4.3 Simulation results: three component mixtures of Poisson regressions

For three-component models, samples of three different sizes n were generated for each set of true parameter values (π_j, β_j) shown in Table 2. Again, the cases considered correspond to varying degrees of overlap, from cases with high overlap to a case with low overlap.

Figure 5 shows boxplots of the Euclidean distance between estimated and true parameters over the replications, using the EM and SEM algorithms for fitting three-component mixtures of Poisson regression models. Figure 5 shows that EM outperforms SEM by producing estimates of the parameters that have lower estimation error, especially when the overlap is higher. Also, as expected, the estimation error tends to decrease as the sample size increases.

The resulting values of the MRSEP based on 10-fold cross-validation, for each of the configurations of the true regression lines, are shown in Tables 3 and 4. Similar results were obtained calculating the MRSEP based on 5-fold cross-validation. Tables 3 and 4 show that, in general, the SEM algorithm performs better than the EM algorithm.

5 Real Data Sets

We now compare the performance of the EM algorithm and the SEM algorithm for fitting Poisson mixture regression models on two real data sets.

5.1 Fabric faults

The Fabric Faults data set consists of observations of the number of faults in rolls of fabric of different lengths. The data set is analysed using a finite mixture of Poisson regression models in Aitkin (1996). The response variable is the number of faults and the covariate is the length of the roll in metres. The data set can be loaded into R with the command data("fabricfault", package = "flexmix").
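For reference, a two-component Poisson mixture regression can be fitted to these data by EM with the flexmix package itself; the snippet below is an illustration only (the variable names Faults and Length are assumed from the flexmix documentation), not the code used in the paper.

# Illustrative flexmix fit of a two-component Poisson mixture regression to the
# fabric faults data; column names Faults and Length are assumed.
library(flexmix)
data("fabricfault", package = "flexmix")
fit <- flexmix(Faults ~ log(Length), data = fabricfault, k = 2,
               model = FLXMRglm(family = "poisson"))
summary(fit)
parameters(fit)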

We fitted Poisson mixture regression models using the EM algorithm and the SEM algorithm, where the logarithm of the length is used as the independent variable. The algorithms were initiated by random numbers (the second strategy) and the stopping criterion was the same as that used in the simulation study. For each algorithm, the optimal number of components was selected using the following procedure:

Step 1: Set j = 2 and calculate the value of the MRSEP based on k-fold cross-validation for a two-component model. Let this value be denoted by MIN.

Step 2: Set j = j + 1 and calculate the value of the MRSEP based on k-fold cross-validation for a j-component model.

Step 3: If the new value of the MRSEP is lower than MIN, then set MIN equal to the new value of the MRSEP and go to Step 2; otherwise conclude that the optimal number of components is j - 1 and stop.

Table 5 presents the MRSEP based on 10-fold cross-validation computed for each algorithm; the results show that the mixture with two components is selected. We can also observe that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the Fabric Faults data.
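A compact R sketch of this forward search over the number of components is given below; cv_mrsep is a hypothetical helper returning the cross-validated MRSEP of a j-component fit (it is not a flexmix or base-R function), and the upper bound j_max is added only as a safeguard.

# Sketch of Steps 1-3: increase the number of components while the cross-validated
# MRSEP keeps decreasing; stop at the first increase.
select_components <- function(data, cv_mrsep, j_max = 10) {
  j <- 2
  min_mrsep <- cv_mrsep(data, j)            # Step 1: MRSEP of the two-component model
  while (j < j_max) {
    new_mrsep <- cv_mrsep(data, j + 1)      # Step 2: MRSEP of a (j+1)-component model
    if (new_mrsep >= min_mrsep) break       # Step 3: stop; the optimal number is j
    j <- j + 1
    min_mrsep <- new_mrsep
  }
  list(components = j, mrsep = min_mrsep)
}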

5.2 Patent

The patent data given in Wang et al. (1996) consist of 70 observations on patent applications, research and development (R&D) spending and sales (in millions of dollars) of pharmaceutical and biomedical companies in 1976, taken from the National Bureau of Economic Research R&D Master file. To model these data, Wang et al. (1996) used several covariates, including the logarithm of R&D spending and/or the squared logarithm of R&D spending, for different models. The data set can be loaded into R with the command data("patent", package = "flexmix").

We fitted Poisson mixture regression models using the EM algorithm and the SEM algorithm, where the logarithm of R&D spending is used as the independent variable. The algorithms were initiated by random numbers (the second strategy), the stopping criterion was the same as that used in the simulation study, and the optimal number of components was selected using the procedure described in Section 5.1. Table 6 presents the MRSEP based on 10-fold cross-validation computed for each algorithm; the results show that the mixture with three components is selected. We can also observe that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the patent data.

6 Conclusion

In this paper, we compared the performance of two algorithms for computing maximum likelihood estimates of mixture Poisson regression models: the EM algorithm and the Stochastic EM algorithm (SEM). The simulation results show that the choice of approach depends essentially on the overlap of the regression lines. For some severely overlapping mixtures, the EM algorithm outperforms the SEM algorithm by producing estimates of the parameters that have smaller estimation error. However, the simulation results indicate that the Stochastic EM algorithm in general provides better estimates of those parameters in the sense of a better fit of the regression model. In the real data case, we also show that the SEM algorithm resulted in model estimates that best fit the regression model.

As we expected, the SEM algorithm and the EM algorithm can converge to different estimates. EM convergence is very dependent upon the type of starting values and the stopping rule used, so the EM algorithm may converge to local maxima or saddle points.

The SEM algorithm exhibits more reliable convergence because the stochastic step enables it to escape from saddle points of the likelihood.

References

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6:251–262.

Celeux, G., Diebolt, J. (1985). The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly 2:73–82.

Celeux, G., Govaert, G. (1993). Comparison of the mixture and the classification maximum likelihood in cluster analysis. Journal of Statistical Computation and Simulation 47:127–146.

Celeux, G., Chauveau, D., Diebolt, J. (1996). Stochastic versions of the EM algorithm: an experimental study in the mixture case. Journal of Statistical Computation and Simulation 55:287–314.

Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39:1–38.

Dias, J., Wedel, M. (2004). An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods. Statistics and Computing 14:323–332.

Diebolt, J., Ip, E.H.S. (1996). Stochastic EM: method and application. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.), Markov Chain Monte Carlo in Practice. London: Chapman & Hall.

Faria, S., Soromenho, G. (2010). Fitting mixtures of linear regressions. Journal of Statistical Computation and Simulation 80(2):201–225.

Fruhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, Heidelberg.

Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.

Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 11(8):1–18.

McLachlan, G.J., Peel, D. (2000). Finite Mixture Models. Wiley, New York.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.

Teicher, H. (1960). On the mixture of distributions. Annals of Mathematical Statistics 31:55–73.

Wang, P., Puterman, M.L., Cockburn, I.M., Le, N. (1996). Mixed Poisson regression models with covariate dependent rates. Biometrics 52(2):381–400.

Wedel, M., DeSarbo, W.S., Bult, J.R., Ramaswamy, V. (1993). A latent class Poisson regression model for heterogeneous count data. Journal of Applied Econometrics 8(4):397–411.

Yang, M.S., Lai, C.Y. (2005). Mixture Poisson regression models for heterogeneous count data based on latent and fuzzy class analysis. Soft Computing.

Table 1. True parameter values (β_j0, β_j1 and π_j for each component) for the experiments with two-component mixtures of Poisson regression, cases A1–A4.

Table 2. True parameter values (β_j0, β_j1 and π_j for each component) for the experiments with three-component mixtures of Poisson regression, cases B1–B4.

Table 3. MRSEP by 10-fold cross-validation for three-component models when the algorithms were initiated by random numbers (EM and SEM, two of the B cases, three sample sizes).

Table 4. MRSEP by 10-fold cross-validation for three-component models when the algorithms were initiated by random numbers (EM and SEM, the remaining B cases, three sample sizes).

Table 5. MRSEP based on 10-fold cross-validation for the Fabric Faults data set (EM and SEM algorithms, two- and three-component models).

Table 6. MRSEP based on 10-fold cross-validation for the Patent data set (EM and SEM algorithms, two-, three- and four-component models).

Figure 1. Scatter plots of samples from two-component models (three of the A cases).

Figure 2. Distance between estimated and true parameter values for two-component Poisson mixture regression models, cases A1–A4. (EM.1 and SEM.1: the algorithms are initiated with the true parameter values; EM.2 and SEM.2: the algorithms are initiated by random numbers.)

Figure 3. MRSEP by 10-fold cross-validation for two-component models when the algorithms were initiated by random numbers (two of the A cases).

Figure 4. MRSEP by 10-fold cross-validation for two-component models when the algorithms were initiated by random numbers (the remaining A cases).

Figure 5. Distance between estimated and true parameter values for three-component Poisson mixture regression models, cases B1–B4. (EM.1 and SEM.1: the algorithms are initiated with the true parameter values; EM.2 and SEM.2: the algorithms are initiated by random numbers.)

Communications in Statistics - Simulation and Computation, LSSP-00-0
Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study
Susana Faria and Gilda Soromenho

Comments on the issues raised by the referees:

1. "Text should read: identifiability. That is, two sets of parameters do not yield the same..." We have rewritten this text. The changes we make in the manuscript are in coloured text.

2. "First paragraph of Section 3: This material needs to be very carefully rewritten. Most of the statements made here are just not true. It is not clear what is intended. Does the SEM algorithm always converge? Does it converge if MLEs and/or EM estimates cannot be obtained?" "Sentence beginning 'Given a set of independent...': I don't believe that this statement is true at all. The exact conditions for existence of maximum likelihood estimates are a very difficult statement to make. Perhaps you can say that MLEs can sometimes be estimated from this likelihood function provided such estimates exist." We have rewritten the text in Section 3. The changes we make in the manuscript are in coloured text.

3. "Conditional on the lambda parameters, only the pi's are estimated. The real condition is y given x, but not lambda here." We have rewritten this equation and the related equations.

4. "This is confusing. I thought that the parameters we were estimating were the betas, not the lambdas." We have rewritten this equation and the related equations.

5. "Please make a comment that this provides a bias towards uniform pi parameters." We have added this comment. The changes we make in the manuscript are in coloured text.

6. "Refer to page number in books such as Hastie (2001)." We have added the page number.

7. "End Section 5 by explaining which method is better. What do you conclude from this example? The two methods converge to different estimates. Which do you prefer?"

We have eliminated the end of Sections 5.1 and 5.2 (and the corresponding figures). The results in Tables 5 and 6 show that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the data.

8. "References: These are in different styles. Some are all capitals in titles, others are not. Please refer to the style requirements for this journal." We have rewritten some references. The changes we make in the manuscript are in coloured text.
