Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study


Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Journal: Communications in Statistics - Simulation and Computation
Manuscript ID: LSSP-00-0.R
Manuscript Type: Original Paper
Authors: Faria, Susana (University of Minho, Department of Mathematics and Applications); Soromenho, Gilda (University of Lisbon)
Keywords: Maximum likelihood estimation, EM algorithm, Stochastic EM algorithm, Mixture Poisson regression models, Simulation study

Abstract: In this work, we propose to compare two algorithms to compute maximum likelihood estimates for the parameters of a mixture Poisson regression model: the EM algorithm and the Stochastic EM algorithm. The two procedures were compared through a simulation study of their performance on simulated and real data sets. Simulation results show that the choice of approach depends essentially on the overlap of the regression lines. In the real data case, we show that the Stochastic EM algorithm resulted in model estimates that best fit the regression model.

Note: The following file was submitted by the author for peer review but cannot be converted to PDF and must be viewed online: sfariasoromenho.zip


Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Susana Faria (Department of Mathematics and Applications, University of Minho, Guimarães, Portugal, sfaria@math.uminho.pt) and Gilda Soromenho (Institute of Education, University of Lisbon, Portugal, gspereira@ie.ul.pt)

Abstract: In this work, we propose to compare two algorithms to compute maximum likelihood estimates for the parameters of a mixture Poisson regression model: the EM algorithm and the Stochastic EM algorithm. The two procedures were compared through a simulation study of their performance on simulated and real data sets. Simulation results show that the choice of approach depends essentially on the overlap of the regression lines. In the real data case, we show that the Stochastic EM algorithm resulted in model estimates that best fit the regression model.

Keywords: Maximum likelihood estimation, EM algorithm, Stochastic EM algorithm, Mixture Poisson regression models, Simulation study

1 Introduction

Finite mixture models are a well-known method for modelling data that arise from a heterogeneous population (see e.g. McLachlan and Peel, 2000 and Fruhwirth-Schnatter, 2006 for a review).

The study of these models is a well-established and active area of statistical research, and mixtures of regressions have also been studied fairly extensively. In particular, Poisson mixture regression models are commonly used to analyze heterogeneous count data. Wedel et al. (1993) proposed a latent class Poisson regression model and described an EM algorithm for its estimation. Wang et al. (1996) studied mixed Poisson regression models in which maximum likelihood estimates of the parameters were obtained by combining EM and quasi-Newton algorithms.

In this work, we study the procedure for fitting Poisson mixture regression models by means of maximum likelihood (ML). We apply two maximization algorithms to obtain the maximum likelihood estimates: the Expectation Maximization (EM) algorithm (Dempster et al., 1977) and the Stochastic Expectation Maximization (SEM) algorithm (Celeux and Diebolt, 1985). The comparison of the EM and SEM approaches in mixtures of distributions is well studied. Celeux et al. (1996) investigated the practical behaviour of these algorithms through intensive Monte Carlo simulations and a real data study. Dias and Wedel (2004) compared the EM and SEM algorithms for estimating the parameters of Gaussian mixture models. Faria and Soromenho (2010) performed a simulation study to compare the performance of these two approaches on Gaussian mixtures of linear regressions.

This paper is organized as follows: Section 2 describes the model. Parameter estimation based on the EM algorithm and the Stochastic EM algorithm is discussed in Section 3. Section 4 provides a simulation study investigating the performance of these algorithms for fitting two and three component mixtures of Poisson regression models. We also study the performance of the algorithms on real data sets in Section 5. In Section 6 the conclusions of our study are drawn.

2 Poisson mixture regression models

Let the random variable Y_i denote the ith response variable and let (y_i, x_i), i = 1, ..., n, denote the observations, where y_i is the observed value of Y_i and x_i is a (p+1)-dimensional covariate vector. It is assumed that the marginal distribution of Y_i follows a mixture of Poisson distributions,

h(y_i | x_i, θ) = Σ_{j=1}^{J} π_j f_j(y_i | x_i),   (1)

with

f_j(y_i | x_i) = exp(-λ_ij) λ_ij^{y_i} / y_i!,  i = 1, ..., n, j = 1, ..., J,   (2)

and λ_ij = exp(β_j^T x_i), where β_j = (β_j0, β_j1, ..., β_jp)^T denotes the (p+1)-dimensional vector of regression coefficients for the jth component and θ = (π_1, ..., π_{J-1}, β_1, ..., β_J) denotes the vector of all parameters. The proportions π_j are the mixing probabilities (0 < π_j < 1, j = 1, ..., J, and Σ_{j=1}^{J} π_j = 1) and can be interpreted as the unconditional probabilities that an individual belongs to component j of the mixture.

To be able to reliably estimate the parameters of mixture models we require identifiability, that is, two distinct sets of parameters must not yield the same mixture distribution. Finite mixtures of Poisson distributions are identifiable (see Teicher, 1960 for details). Fruhwirth-Schnatter (2006) shows that if the covariate matrix is of full rank and the mixing proportions π_j are all different, then the Poisson mixture regression model is identifiable.
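To make the model concrete, the following R sketch evaluates the mixture density of equation (1) for a single observation; it is an illustration under assumed inputs (a coefficient matrix with one column per component and arbitrary example values), not code from the paper.

# Sketch: mixture-of-Poisson-regressions density of eq. (1) for one observation.
# 'prop' holds the J mixing proportions pi_j; 'beta' is a (p+1) x J coefficient
# matrix, so lambda_ij = exp(beta_j' (1, x_i)).
dpoismix <- function(y, x, prop, beta) {
  lambda <- exp(drop(crossprod(beta, c(1, x))))  # component means lambda_i1, ..., lambda_iJ
  sum(prop * dpois(y, lambda))                   # sum_j pi_j * Poisson(y; lambda_ij)
}

# Illustrative two-component example (arbitrary values):
beta <- cbind(c(0.5, 1.0), c(2.0, -0.5))
dpoismix(y = 3, x = 0.7, prop = c(0.4, 0.6), beta = beta)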

3 Parameter Estimation

Among the various estimation methods considered in the literature for finite mixture models, maximum likelihood (ML) has dominated the field. For a given number of components J, the task is to estimate the vector of parameters θ = (π_1, ..., π_{J-1}, β_1, ..., β_J) that maximizes the log-likelihood

L(θ | x_1, ..., x_n, y_1, ..., y_n) = Σ_{i=1}^{n} log( Σ_{j=1}^{J} π_j f_j(y_i | x_i) ).   (3)

The standard tool for finding the maximum likelihood solution is the Expectation Maximization (EM) algorithm. However, it suffers from slow convergence and may converge to local maxima or saddle points. The Stochastic Expectation Maximization (SEM) algorithm is a viable alternative for finding the ML estimates of the parameters of a mixture model. By using random draws at each iteration, the SEM algorithm avoids being trapped in local optima. It has some advantages over the EM algorithm: it does not get stuck; it often provides more information about the data (see Diebolt and Ip, 1996), for instance when parameters cannot be estimated; and under certain conditions it behaves better than the EM algorithm (see Celeux et al., 1996).

3.1 The EM algorithm

The EM algorithm is a broadly applicable approach to the iterative computation of maximum likelihood estimates when the observations can be viewed as incomplete data. The idea here is to think of the data as consisting of triples (y_i, x_i, z_i), i = 1, ..., n, where z_i = (z_i1, ..., z_iJ)^T is the unobserved indicator vector that specifies the mixture component from which the observation (y_i, x_i) is drawn, i.e., z_ij equals 1 if observation i comes from component j and 0 otherwise.

The log-likelihood for the complete data is

L_c(θ | x_1, ..., x_n, y_1, ..., y_n) = Σ_{i=1}^{n} Σ_{j=1}^{J} z_ij log(π_j) + Σ_{i=1}^{n} Σ_{j=1}^{J} z_ij log(f_j(y_i | x_i)).   (4)

The EM algorithm is easy to program and proceeds iteratively in two steps, E (for expectation) and M (for maximization). At the E-step, it replaces the missing data by their expectation conditional on the observed data. At the M-step, it finds the parameter estimates which maximize the expected log-likelihood for the complete data, conditional on the expected values of the missing data. This procedure can be stated as follows.

E-step: Given the current parameter estimates θ^(r) in the rth iteration, replace the missing data z_ij by the estimated probabilities w_ij^(r) that the ith observation belongs to the jth component of the mixture,

w_ij^(r) = π_j^(r) f_j(y_i | x_i, β_j^(r)) / Σ_{l=1}^{J} π_l^(r) f_l(y_i | x_i, β_l^(r)).   (5)

M-step: Given the estimates of the probabilities w_ij^(r) (which are functions of θ^(r)), obtain new estimates θ^(r+1) of the parameters by maximizing

Q(θ | θ^(r)) = Q_1 + Q_2   (6)

under the restriction on the component weights, where

Q_1 = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log(π_j)   (7)

and

Q_2 = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log f_j(y_i | x_i, β_j).   (8)

The maximization of Q_1 under the restriction Σ_{j=1}^{J} π_j = 1 on the component weights is equivalent to maximizing the function

Q_1* = Σ_{i=1}^{n} Σ_{j=1}^{J} w_ij^(r) log(π_j) - μ (Σ_{j=1}^{J} π_j - 1),

where μ is the Lagrange multiplier. Setting the derivative of this function with respect to π_j equal to zero yields

π̂_j^(r+1) = (1/n) Σ_{i=1}^{n} w_ij^(r),  j = 1, ..., J,   (9)

and Q_2 is maximized separately for each j = 1, ..., J using weighted ML estimation of a generalized linear model (GLM).
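As an illustration of the E- and M-steps above, the following R sketch performs one EM iteration for a J-component Poisson regression mixture, with the M-step for the β_j carried out by weighted Poisson GLM fits. It is a minimal sketch under assumed data structures (a design matrix X whose first column is the intercept), not the authors' implementation.

# Sketch of one EM iteration (eqs. (5)-(9)); X is the n x (p+1) design matrix
# (first column of ones), 'prop' the mixing proportions, 'beta' a (p+1) x J matrix.
em_step <- function(y, X, prop, beta) {
  J <- length(prop)
  lambda <- exp(X %*% beta)                                # n x J component means
  dens <- sapply(1:J, function(j) dpois(y, lambda[, j]))   # f_j(y_i | x_i)
  w <- sweep(dens, 2, prop, "*")                           # pi_j * f_j(y_i | x_i)
  w <- w / rowSums(w)                                      # E-step: posterior w_ij, eq. (5)
  prop_new <- colMeans(w)                                  # M-step for pi_j, eq. (9)
  beta_new <- sapply(1:J, function(j)                      # weighted Poisson GLM per component
    glm.fit(X, y, weights = w[, j], family = poisson())$coefficients)
  list(prop = prop_new, beta = beta_new, w = w)
}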

3.2 The Stochastic EM algorithm

We also apply a procedure for fitting Poisson mixture regression models using a stochastic version of the EM algorithm, the so-called SEM algorithm. The SEM algorithm is a modification of the EM algorithm that incorporates a stochastic step (S-step) between the E- and M-steps of EM. Starting from an initial parameter θ^(0), an iteration of the SEM algorithm consists of three steps.

E-step: The estimated probabilities w_ij^(r), i = 1, ..., n, j = 1, ..., J, that the ith observation belongs to the jth component of the mixture are computed for the current value of θ, as in standard EM.

S-step: A partition P^(r+1) = (P_1^(r+1), ..., P_J^(r+1)) of (y_1, x_1), ..., (y_n, x_n) is designed by assigning each observation at random to one of the mixture components according to the multinomial distribution with parameters w_ij^(r), i = 1, ..., n, j = 1, ..., J, given by (5). If one of the P_j^(r+1) is empty or has only one observation, the mixture is taken to have J-1 components instead of J and the estimation process begins again with J-1 components. Note that, in this case, this introduces a bias towards uniform π_j parameters.

M-step: The ML estimate of θ is updated using the sub-samples P_j^(r+1). It follows that on the M-step of the (r+1)th iteration, the estimates of the mixing proportions are given by

π̂_j^(r+1) = n_j / n,  j = 1, ..., J,   (10)

where n_j is the total number of observations arising from component j, and the maximization of

Q_2* = Σ_{j=1}^{J} Σ_{i: z_ij^(r+1) = 1} log f_j(y_i | x_i, β_j),   (11)

where {i : z_ij^(r+1) = 1} is the set of observations arising from the jth mixture component, gives β_j^(r+1).
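The S- and M-steps can be sketched in R as follows; this is a schematic illustration (it does not handle the reduction to J-1 components when a sub-sample becomes too small), not the authors' code.

# Sketch of the S- and M-steps of SEM: draw one component label per observation from
# the multinomial with probabilities w_i1, ..., w_iJ, then re-estimate from the sub-samples.
sem_step <- function(y, X, w) {
  J <- ncol(w); n <- length(y)
  z <- apply(w, 1, function(p) sample.int(J, 1, prob = p))  # S-step: random assignment
  prop_new <- tabulate(z, nbins = J) / n                    # eq. (10): n_j / n
  beta_new <- sapply(1:J, function(j)                       # eq. (11): one Poisson GLM per sub-sample
    glm.fit(X[z == j, , drop = FALSE], y[z == j], family = poisson())$coefficients)
  list(prop = prop_new, beta = beta_new, z = z)
}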

4 Simulation study of algorithm performance

4.1 Design of the study

To investigate the statistical behaviour of the proposed methods in fitting Poisson mixture regression models, a simulation study was performed. The simulation was designed to evaluate model performance considering the effects of sample size and the initialization of the algorithms, as well as the configuration of the regression lines. The scope was limited to the study of two and three components. We used the free software R to develop the simulation program.

Initial Conditions
Two different approaches for choosing initial values are compared in the study. In the first strategy, we used the true parameter values of the model that generated the observations as initial values, in order to determine the performance of the algorithms in the best case. In the other strategy, we ran each algorithm several times from random initial positions and selected, out of these runs, the solution which provided the best value of the optimized criterion (Celeux et al., 1996).

Stopping Rules
For the EM algorithm, iterations were stopped when the relative change in the log-likelihood between two successive iterations fell below a small fixed threshold. However, since SEM does not converge pointwise and instead generates a Markov chain whose stationary distribution is more or less concentrated around the ML parameter estimate, we used as the stopping rule for the SEM algorithm the total number of iterations required for convergence by the EM algorithm.

Number of Samples
For each type of simulated data set, we generated repeated samples of size n.

Data set
Each datum (y_i, x_i) was generated by the following scheme. First, a uniform [0, 1] random number c_i was generated and its value was used to select a particular component j of the mixture of regressions model. Next, x_i was randomly generated from a uniform [L_x, U_x] distribution, giving λ_ij = exp(β_j0 + β_j1 x_i). Finally, we simulated the value y_i ~ P(λ_ij).

Measure of Algorithm Performance
In order to examine the performance of the two algorithms, we report the Euclidean distance between the estimated parameters and the true parameter values.

Quality of the fit
In order to compare the quality of the fit of the two algorithms, we report the mean of the root mean squared errors of prediction (MRSEP),

MRSEP = (1/M) Σ_{m=1}^{M} RMSEP^(m),

where M denotes the number of replications and RMSEP^(m) is the root mean squared error of prediction of the mth replication based on K-fold cross-validation, given by

RMSEP^(m) = sqrt( (1/n) Σ_{i=1}^{n} (y_i - μ̂_i)^2 / V(μ̂_i) ),

with

μ̂_i = Σ_{j=1}^{J} π̂_j λ̂_ij  and  V(μ̂_i) = Σ_{j=1}^{J} π̂_j λ̂_ij + Σ_{j=1}^{J} π̂_j λ̂_ij^2 - (Σ_{j=1}^{J} π̂_j λ̂_ij)^2.

For the K-fold cross-validation, we have chosen K = 5 and K = 10 (Hastie et al., 2001).
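A minimal R sketch of the data-generating scheme and of the RMSEP measure described above follows; the parameter values in the example call are illustrative and are not those used in Tables 1 and 2.

# Sketch of the simulation scheme: pick a component, draw x uniformly, then y ~ Poisson(lambda_ij).
simulate_mixture <- function(n, prop, beta, xlim = c(0, 1)) {
  j <- sample.int(length(prop), n, replace = TRUE, prob = prop)  # component label for each i
  x <- runif(n, xlim[1], xlim[2])                                # covariate from a uniform law
  lambda <- exp(beta[1, j] + beta[2, j] * x)                     # lambda_ij = exp(beta_j0 + beta_j1 x_i)
  data.frame(y = rpois(n, lambda), x = x, component = j)
}

# Sketch of RMSEP for one fitted model: standardized prediction errors under the mixture.
rmsep <- function(y, X, prop, beta) {
  lambda <- exp(X %*% beta)                   # n x J fitted component means
  mu <- drop(lambda %*% prop)                 # mu_i = sum_j pi_j lambda_ij
  v  <- mu + drop(lambda^2 %*% prop) - mu^2   # variance of a Poisson mixture
  sqrt(mean((y - mu)^2 / v))
}

set.seed(1)
dat <- simulate_mixture(200, prop = c(0.4, 0.6), beta = cbind(c(0.5, 1.0), c(2.0, -0.5)))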

4.2 Simulation results: two component mixtures of Poisson regressions

For two-component models, samples of four different sizes n were generated for each set of true parameter values (π_j, β_j) shown in Table 1 (Yang and Lai, 2005; Leisch, 2004). For illustration, Figure 1 presents typical scatter plots of the simulated samples. Note that the cases considered correspond to varying degrees of overlap of the regression lines, from the case with the highest overlap to the case with the lowest overlap.

Figure 2 shows boxplots of the Euclidean distance between estimated and true parameters over the replications, using the EM and SEM algorithms for fitting two-component mixtures of Poisson regression models. Figure 2 shows that the algorithms have practically the same behaviour. However, when the overlap is high, EM outperforms SEM by producing estimates of the parameters that have smaller estimation error. As expected, the estimation error decreases when the sample size increases.

The resulting values of the MRSEP based on 10-fold cross-validation, for each of the configurations of the true regression lines, are plotted in Figures 3 and 4. Similar results were obtained calculating the MRSEP based on 5-fold cross-validation. Figures 3 and 4 show that, in general, the SEM algorithm performs better than the EM algorithm.

4.3 Simulation results: three component mixtures of Poisson regressions

For three-component models, samples of three different sizes n were generated for each set of true parameter values (π_j, β_j) shown in Table 2. Again, the cases considered correspond to varying degrees of overlap, from cases with high overlap to a case with low overlap.

Figure 5 shows boxplots of the Euclidean distance between estimated and true parameters over the replications, using the EM and SEM algorithms for fitting three-component mixtures of Poisson regression models. Figure 5 shows that EM outperforms SEM by producing estimates of the parameters that have lower estimation error, especially when the overlap is higher. Also, as expected, the estimation error tends to decrease as the sample size increases.

The resulting values of the MRSEP based on 10-fold cross-validation, for each of the configurations of the true regression lines, are shown in Tables 3 and 4. Similar results were obtained calculating the MRSEP based on 5-fold cross-validation. Tables 3 and 4 show that, in general, the SEM algorithm performs better than the EM algorithm.

5 Real Data Sets

We now compare the performance of the EM algorithm and the SEM algorithm for fitting Poisson mixture regression models on two real data sets.

5.1 Fabric faults

The Fabric Faults data set consists of observations of the number of faults in rolls of fabric of different lengths. The data set is analysed using a finite mixture of Poisson regression models in Aitkin (1996). The response variable is the number of faults and the covariate is the length of the roll in metres. The data set can be loaded into R with the command data("fabricfault", package = "flexmix").
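For reference, a two-component Poisson mixture regression can be fitted to these data by EM with the flexmix package itself; the snippet below is an illustration only (the variable names Faults and Length are assumed from the flexmix documentation), not the code used in the paper.

# Illustrative flexmix fit of a two-component Poisson mixture regression to the
# fabric faults data; column names Faults and Length are assumed.
library(flexmix)
data("fabricfault", package = "flexmix")
fit <- flexmix(Faults ~ log(Length), data = fabricfault, k = 2,
               model = FLXMRglm(family = "poisson"))
summary(fit)
parameters(fit)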

We fitted Poisson mixture regression models using the EM algorithm and the SEM algorithm, where the logarithm of the length is used as the independent variable. The algorithms were initiated by random numbers (the second strategy) and the stopping criterion was the same as that used in the simulation study. For each algorithm, the optimal number of components was selected using the following procedure:

Step 1: Set j = 2 and calculate the value of the MRSEP based on k-fold cross-validation for a two-component model. Let this value be denoted by MIN.

Step 2: Set j = j + 1 and calculate the value of the MRSEP based on k-fold cross-validation for a j-component model.

Step 3: If the new value of the MRSEP is lower than MIN, then set MIN equal to the new value of the MRSEP and go to Step 2; otherwise conclude that the optimal number of components is j - 1 and stop.

Table 5 presents the MRSEP based on 10-fold cross-validation computed for each algorithm; the results show that the mixture with two components is selected. We can also observe that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the Fabric Faults data.
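A compact R sketch of this forward search over the number of components is given below; cv_mrsep is a hypothetical helper returning the cross-validated MRSEP of a j-component fit (it is not a flexmix or base-R function), and the upper bound j_max is added only as a safeguard.

# Sketch of Steps 1-3: increase the number of components while the cross-validated
# MRSEP keeps decreasing; stop at the first increase.
select_components <- function(data, cv_mrsep, j_max = 10) {
  j <- 2
  min_mrsep <- cv_mrsep(data, j)            # Step 1: MRSEP of the two-component model
  while (j < j_max) {
    new_mrsep <- cv_mrsep(data, j + 1)      # Step 2: MRSEP of a (j+1)-component model
    if (new_mrsep >= min_mrsep) break       # Step 3: stop; the optimal number is j
    j <- j + 1
    min_mrsep <- new_mrsep
  }
  list(components = j, mrsep = min_mrsep)
}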

5.2 Patent

The patent data given in Wang et al. (1996) consist of 70 observations on patent applications, research and development (R&D) spending and sales (in millions of dollars) of pharmaceutical and biomedical companies in 1976, taken from the National Bureau of Economic Research R&D Master file. To model these data, Wang et al. (1996) used several covariates, including the logarithm of R&D spending and/or the squared logarithm of R&D spending, for different models. The data set can be loaded into R with the command data("patent", package = "flexmix").

We fitted Poisson mixture regression models using the EM algorithm and the SEM algorithm, where the logarithm of R&D spending is used as the independent variable. The algorithms were initiated by random numbers (the second strategy), the stopping criterion was the same as that used in the simulation study, and the optimal number of components was selected using the procedure described in Section 5.1. Table 6 presents the MRSEP based on 10-fold cross-validation computed for each algorithm; the results show that the mixture with three components is selected. We can also observe that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the patent data.

6 Conclusion

In this paper, we compared the performance of two algorithms for computing maximum likelihood estimates of mixture Poisson regression models: the EM algorithm and the Stochastic EM algorithm (SEM). The simulation results show that the choice of approach depends essentially on the overlap of the regression lines. For some severely overlapping mixtures, the EM algorithm outperforms the SEM algorithm by producing estimates of the parameters that have smaller estimation error. However, the simulation results indicate that the Stochastic EM algorithm in general provides better estimates of those parameters in the sense of a better fit of the regression model. In the real data case, we also show that the SEM algorithm resulted in model estimates that best fit the regression model.

As we expected, the SEM algorithm and the EM algorithm can converge to different estimates. EM convergence is very dependent upon the type of starting values and the stopping rule used, so the EM algorithm may converge to local maxima or saddle points.

The SEM algorithm exhibits more reliable convergence because the stochastic step enables it to escape from saddle points of the likelihood.

References

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6:251–262.

Celeux, G., Diebolt, J. (1985). The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly 2:73–82.

Celeux, G., Govaert, G. (1993). Comparison of the mixture and the classification maximum likelihood in cluster analysis. Journal of Statistical Computation and Simulation 47:127–146.

Celeux, G., Chauveau, D., Diebolt, J. (1996). Stochastic versions of the EM algorithm: an experimental study in the mixture case. Journal of Statistical Computation and Simulation 55:287–314.

Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39:1–38.

Dias, J., Wedel, M. (2004). An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods. Statistics and Computing 14:323–332.

Diebolt, J., Ip, E.H.S. (1996). Stochastic EM: method and application. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.), Markov Chain Monte Carlo in Practice. London: Chapman & Hall.

Faria, S., Soromenho, G. (2010). Fitting mixtures of linear regressions. Journal of Statistical Computation and Simulation 80(2):201–225.

Fruhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, Heidelberg.

Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.

Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 11(8):1–18.

McLachlan, G.J., Peel, D. (2000). Finite Mixture Models. Wiley, New York.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.

Teicher, H. (1960). On the mixture of distributions. Annals of Mathematical Statistics 31:55–73.

Wang, P., Puterman, M.L., Cockburn, I.M., Le, N. (1996). Mixed Poisson regression models with covariate dependent rates. Biometrics 52(2):381–400.

Wedel, M., DeSarbo, W.S., Bult, J.R., Ramaswamy, V. (1993). A latent class Poisson regression model for heterogeneous count data. Journal of Applied Econometrics 8(4):397–411.

Yang, M.S., Lai, C.Y. (2005). Mixture Poisson regression models for heterogeneous count data based on latent and fuzzy class analysis. Soft Computing.

Table 1. True parameter values (β_j0, β_j1 and π_j for each component) for the experiments with two-component mixtures of Poisson regression, cases A1–A4.

Table 2. True parameter values (β_j0, β_j1 and π_j for each component) for the experiments with three-component mixtures of Poisson regression, cases B1–B4.

Table 3. MRSEP by 10-fold cross-validation for three-component models when the algorithms were initiated by random numbers (EM and SEM, two of the B cases, three sample sizes).

Table 4. MRSEP by 10-fold cross-validation for three-component models when the algorithms were initiated by random numbers (EM and SEM, the remaining B cases, three sample sizes).

Table 5. MRSEP based on 10-fold cross-validation for the Fabric Faults data set (EM and SEM algorithms, two- and three-component models).

Table 6. MRSEP based on 10-fold cross-validation for the Patent data set (EM and SEM algorithms, two-, three- and four-component models).

Figure 1. Scatter plots of samples from two-component models (three of the A cases).

Figure 2. Distance between estimated and true parameter values for two-component Poisson mixture regression models, cases A1–A4. (EM.1 and SEM.1: the algorithms are initiated with the true parameter values; EM.2 and SEM.2: the algorithms are initiated by random numbers.)

Figure 3. MRSEP by 10-fold cross-validation for two-component models when the algorithms were initiated by random numbers (two of the A cases).

Figure 4. MRSEP by 10-fold cross-validation for two-component models when the algorithms were initiated by random numbers (the remaining A cases).

Figure 5. Distance between estimated and true parameter values for three-component Poisson mixture regression models, cases B1–B4. (EM.1 and SEM.1: the algorithms are initiated with the true parameter values; EM.2 and SEM.2: the algorithms are initiated by random numbers.)

Communications in Statistics - Simulation and Computation, LSSP-00-0
Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study
Susana Faria and Gilda Soromenho

Comments on the issues raised by the referees:

1. "Text should read: identifiability. That is, two sets of parameters do not yield the same..." We have rewritten this text. The changes we make in the manuscript are in coloured text.

2. "First paragraph of Section 3: This material needs to be very carefully rewritten. Most of the statements made here are just not true. It is not clear what is intended. Does the SEM algorithm always converge? Does it converge if MLEs and/or EM estimates cannot be obtained?" "Sentence beginning 'Given a set of independent...': I don't believe that this statement is true at all. The exact conditions for existence of maximum likelihood estimates are a very difficult statement to make. Perhaps you can say that MLEs can sometimes be estimated from this likelihood function provided such estimates exist." We have rewritten the text in Section 3. The changes we make in the manuscript are in coloured text.

3. "Conditional on the lambda parameters, only the pi's are estimated. The real condition is y given x, but not lambda here." We have rewritten this equation and the related equations.

4. "This is confusing. I thought that the parameters we were estimating were the betas, not the lambdas." We have rewritten this equation and the related equations.

5. "Please make a comment that this provides a bias towards uniform pi parameters." We have added this comment. The changes we make in the manuscript are in coloured text.

6. "Refer to page number in books such as Hastie (2001)." We have added the page number.

7. "End Section 5 by explaining which method is better. What do you conclude from this example? The two methods converge to different estimates. Which do you prefer?"

We have eliminated the end of Sections 5.1 and 5.2 (and the corresponding figures). The results in Tables 5 and 6 show that the SEM algorithm always performs better in fitting the Poisson mixture regression model to the data.

8. "References: These are in different styles. Some are all capitals in titles, others are not. Please refer to the style requirements for this journal." We have rewritten some references. The changes we make in the manuscript are in coloured text.
