The EM Algorithm for the Finite Mixture of Exponential Distribution Models

Int. J. Contemp. Math. Sciences, Vol. 9, 2014, no. 2, 57-64
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ijcms.2014.312133

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

WANG Yanling and WANG Jixia
College of Mathematics and Information Science
Henan Normal University, Xinxiang 453007, China

This work was supported by the Foundation and Frontier Technology Research Projects of Henan Province (122300410396), the Soft Science Project of Henan Province (122400450180), and the Natural Science Foundation of Henan Educational Committee (2011B110018).

Copyright (c) 2014 WANG Yanling and WANG Jixia. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this paper, we first propose a finite mixture of exponential distribution model with parametric functions. Using local constant fitting, we obtain the local maximum likelihood estimations of the parametric functions and discuss their asymptotic biases and asymptotic variances. Moreover, we propose an EM algorithm to carry out the estimation procedure and establish the ascent property of the EM algorithm.

Keywords: finite mixture model; EM algorithm; local maximum likelihood estimation; local constant fitting

1. Introduction

Mixture models are widely used in social science and econometrics, and they have been studied extensively; see, for example, Lindsay [1]. The finite mixture model, which has been applied in various fields, is a useful class of mixture models. A great deal of effort has been devoted to finite mixture models; see, for example, Frühwirth-Schnatter [2], Rossi, Allenby and McCulloch [3], Hurn, Justel and Robert [4], and Huang, Li and Wang [5].

In this paper, we propose a finite mixture of exponential distribution model with parametric functions, and we allow the parametric functions of the model to be smooth. To estimate the unknown parametric functions, we develop an estimation procedure based on local constant fitting. The local maximum likelihood estimations of the parametric functions are obtained via a local weighted likelihood method; the local likelihood method (see Tibshirani and Hastie [6]) extends the idea of kernel regression. Furthermore, we discuss the asymptotic biases and asymptotic variances of the proposed local estimations. Moreover, we propose an EM algorithm to carry out the estimation procedure. For the estimation of the parametric functions, it is desirable to estimate the curves over a set of grid points. We further show that the EM algorithm has the monotone ascent property in an asymptotic sense.

The rest of this paper is organized as follows. In Section 2, we introduce our model and propose the local maximum likelihood estimations of the parametric functions. The EM algorithm for the local estimations is proposed in Section 3. In Section 4 we establish the ascent property of our EM algorithm.

2. Models and Local Maximum Likelihood Estimation

Suppose that, given the covariate $X = x$, the response $Y$ follows the finite mixture of exponential distributions with conditional density
$$\eta(y \mid \theta(x)) = \sum_{m=1}^{M} p_m(x)\,\lambda_m(x)\exp\{-\lambda_m(x)\,y\}, \qquad y > 0, \qquad (2.1)$$
where the mixing proportion functions $p_m(x)$ satisfy $\sum_{m=1}^{M} p_m(x) = 1$ and the rate functions satisfy $\lambda_m(x) > 0$.

Now we discuss the local maximum likelihood estimations of the parametric functions $p_m(x)$ and $\lambda_m(x)$, $m = 1, 2, \ldots, M$. The log-likelihood function for the collected data $\{(X_i, Y_i), i = 1, 2, \ldots, n\}$ is
$$\sum_{i=1}^{n}\log\left[\sum_{m=1}^{M} p_m(X_i)\,\lambda_m(X_i)\exp\{-\lambda_m(X_i)\,Y_i\}\right]. \qquad (2.2)$$

Note that $p_m(x)$ and $\lambda_m(x)$ are parametric functions. In this paper, we employ local constant fitting (see Fan and Gijbels [7]) for model (2.1). That is, for a given point $x$, we use local constants $p_m$ and $\lambda_m$ to approximate $p_m(x)$ and $\lambda_m(x)$, respectively. The local weighted log-likelihood function for the data $\{(X_i, Y_i), i = 1, 2, \ldots, n\}$ is then
$$l_n(p, \lambda; x) = \sum_{i=1}^{n}\log\left[\sum_{m=1}^{M} p_m \lambda_m \exp\{-\lambda_m Y_i\}\right] K_h(X_i - x), \qquad (2.3)$$
where $p = (p_1, p_2, \ldots, p_M)^T$, $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_M)^T$, $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a nonnegative weight function, and $h$ is a properly selected bandwidth.

Let $(\hat p, \hat\lambda)$ be the maximizer of the local weighted log-likelihood function (2.3). Then the local maximum likelihood estimations of $p_m(x)$ and $\lambda_m(x)$ are
$$\hat p_m(x) = \hat p_m \quad \text{and} \quad \hat\lambda_m(x) = \hat\lambda_m, \qquad (2.4)$$
respectively.
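To make the local constant fitting concrete, the following sketch evaluates the local weighted log-likelihood (2.3) at a point $x$. It is an illustration only, not taken from the paper; the function names, the Epanechnikov kernel choice, and the array layout are assumptions made for the sketch.

```python
import numpy as np

def epanechnikov(z):
    """One common nonnegative kernel K(.); any nonnegative weight function could be used."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)

def local_weighted_loglik(p, lam, x, X, Y, h, kernel=epanechnikov):
    """Evaluate l_n(p, lambda; x) in (2.3) under local constant fitting.

    p, lam : length-M arrays of local constants p_m and lambda_m
    x      : point at which p_m(x) and lambda_m(x) are approximated
    X, Y   : observed data (X_i, Y_i), i = 1, ..., n
    h      : bandwidth
    """
    p, lam = np.asarray(p, float), np.asarray(lam, float)
    Kh = kernel((X - x) / h) / h                          # K_h(X_i - x)
    # mixture density sum_m p_m * lambda_m * exp(-lambda_m * Y_i) for each i
    dens = (p * lam * np.exp(-np.outer(Y, lam))).sum(axis=1)
    return np.sum(np.log(dens) * Kh)
```

Maximizing this criterion over $(p, \lambda)$ subject to $\sum_{m} p_m = 1$ yields the estimates in (2.4); the EM algorithm of Section 3 carries out this maximization over a grid of points.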

The asymptotic bias, asymptotic variance and asymptotic normality are studied as follows. Let $\theta = (p^T, \lambda^T)^T$ and denote
$$\eta(y \mid \theta) = \sum_{m=1}^{M} p_m \lambda_m \exp\{-\lambda_m y\}, \qquad l(\theta, y) = \log \eta(y \mid \theta).$$
Furthermore, let $\theta(x) = (p^T(x), \lambda^T(x))^T$, and denote
$$I(x) = -E\left[\frac{\partial^2 l(\theta(x), Y)}{\partial\theta\,\partial\theta^T} \,\Big|\, X = x\right]
\quad \text{and} \quad
\Lambda(u \mid x) = \int \frac{\partial l(\theta(x), y)}{\partial\theta}\,\eta(y \mid \theta(u))\,dy.$$
For $m = 1, 2, \ldots, M$, denote $\hat\lambda_m^{*} = \hat\lambda_m - \lambda_m$; for $m = 1, 2, \ldots, M-1$, denote $\hat p_m^{*} = \hat p_m - p_m$. Let $\hat\lambda^{*} = (\hat\lambda_1^{*}, \hat\lambda_2^{*}, \ldots, \hat\lambda_M^{*})^T$, $\hat p^{*} = (\hat p_1^{*}, \hat p_2^{*}, \ldots, \hat p_{M-1}^{*})^T$ and $\hat\theta^{*} = ((\hat p^{*})^T, (\hat\lambda^{*})^T)^T$. Furthermore, let $g(\cdot)$ be the marginal density function of $X$, $\nu_0(K) = \int K^2(z)\,dz$ and $\kappa_2(K) = \int z^2 K(z)\,dz$. Then the asymptotic bias and asymptotic variance of $\hat\theta^{*}$ are
$$\mathrm{bias}(\hat\theta^{*}) = I^{-1}(x)\left\{\Lambda_u(x \mid x)\,\frac{g'(x)}{g(x)} + \frac{1}{2}\,\Lambda_{uu}(x \mid x)\right\}\kappa_2(K)\,h^2
\quad \text{and} \quad
\mathrm{Var}(\hat\theta^{*}) = \frac{\nu_0(K)\,I^{-1}(x)}{g(x)},$$
respectively, where $\Lambda_u(u \mid x)$ and $\Lambda_{uu}(u \mid x)$ denote the first and second partial derivatives of $\Lambda(u \mid x)$ with respect to $u$. Under some regularity conditions, $\hat\theta^{*}$ has an asymptotic normal distribution. The proofs of the above results are similar to that of Theorem 2 in Huang, Li and Wang [5]. In this paper, we mainly study the EM algorithm for the finite mixture of exponential distribution model (2.1).

3. The EM Algorithm

The mixture problem is formulated as an incomplete-data problem in the EM framework. The observed data $(X_i, Y_i)$'s are viewed as being incomplete, and the unobserved Bernoulli random variables are introduced as follows:
$$Z_{im} = \begin{cases} 1, & \text{if } (X_i, Y_i) \text{ is in the } m\text{th group},\\ 0, & \text{otherwise}. \end{cases}$$
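To illustrate this incomplete-data formulation, the minimal sketch below generates $(X_i, Y_i)$ together with the latent labels by first drawing the group indicator and then an exponential response. It is not part of the paper; the particular choices of $p_m(x)$ and $\lambda_m(x)$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_mixture_exp(n, p_funcs, lam_funcs, rng=rng):
    """Draw (X_i, Y_i, Z_i) from the complete-data formulation of model (2.1)."""
    M = len(p_funcs)
    X = rng.uniform(0.0, 1.0, size=n)
    P = np.column_stack([f(X) for f in p_funcs])      # (n, M) mixing proportions p_m(X_i)
    P = P / P.sum(axis=1, keepdims=True)               # enforce sum_m p_m(x) = 1
    Lam = np.column_stack([f(X) for f in lam_funcs])   # (n, M) rates lambda_m(X_i)
    Z = np.array([rng.choice(M, p=P[i]) for i in range(n)])    # latent labels Z_i
    Y = rng.exponential(scale=1.0 / Lam[np.arange(n), Z])       # Y_i | Z_i=m ~ Exp(lambda_m(X_i))
    return X, Y, Z

# hypothetical two-component example
X, Y, Z = simulate_mixture_exp(
    500,
    p_funcs=[lambda x: 0.3 + 0.4 * x, lambda x: 0.7 - 0.4 * x],
    lam_funcs=[lambda x: 1.0 + x, lambda x: 4.0 + 2.0 * x],
)
```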

Let $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{iM})^T$ be the associated component identity, or label, of $(X_i, Y_i)$. Then $\{(X_i, Y_i, Z_i), i = 1, 2, \ldots, n\}$ are the complete data, and the complete log-likelihood function corresponding to (2.2) is
$$\sum_{i=1}^{n}\sum_{m=1}^{M} Z_{im}\left[\log p_m(X_i) + \log \lambda_m(X_i) - \lambda_m(X_i)\,Y_i\right].$$
For $x \in \{u_1, u_2, \ldots, u_N\}$, the set of grid points at which the unknown functions are to be evaluated, we define a local weighted complete log-likelihood function as
$$\sum_{i=1}^{n}\sum_{m=1}^{M} Z_{im}\left[\log p_m + \log \lambda_m - \lambda_m Y_i\right] K_h(X_i - x).$$
Note that the $Z_{im}$'s do not depend on the choice of $x$. Suppose that $\lambda_m^{(l)}(\cdot)$ and $p_m^{(l)}(\cdot)$ are available from the $l$th cycle of the EM algorithm iteration. Then, in the E-step of the $(l+1)$th cycle, the expectation of the latent variable $Z_{im}$ is given by
$$r_{im}^{(l+1)} = \frac{p_m^{(l)}(X_i)\,\lambda_m^{(l)}(X_i)\exp\{-\lambda_m^{(l)}(X_i)\,Y_i\}}{\sum_{k=1}^{M} p_k^{(l)}(X_i)\,\lambda_k^{(l)}(X_i)\exp\{-\lambda_k^{(l)}(X_i)\,Y_i\}}. \qquad (3.1)$$
In the M-step of the $(l+1)$th cycle, we maximize
$$\sum_{i=1}^{n}\sum_{m=1}^{M} r_{im}^{(l+1)}\left[\log p_m + \log \lambda_m - \lambda_m Y_i\right] K_h(X_i - x) \qquad (3.2)$$
for $x = u_j$, $j = 1, 2, \ldots, N$. The maximization of equation (3.2) is equivalent to maximizing
$$\sum_{i=1}^{n}\sum_{m=1}^{M} r_{im}^{(l+1)}\left[\log p_m\right] K_h(X_i - x) \qquad (3.3)$$
and, for $m = 1, 2, \ldots, M$,
$$\sum_{i=1}^{n} r_{im}^{(l+1)}\left[\log \lambda_m - \lambda_m Y_i\right] K_h(X_i - x) \qquad (3.4)$$
separately. For $x \in \{u_1, u_2, \ldots, u_N\}$, the solution for the maximization of equation (3.3) is
$$p_m^{(l+1)}(x) = \frac{\sum_{i=1}^{n} r_{im}^{(l+1)} K_h(X_i - x)}{\sum_{i=1}^{n} K_h(X_i - x)}, \qquad (3.5)$$
and the solution for the maximization of equation (3.4) is
$$\lambda_m^{(l+1)}(x) = \frac{\sum_{i=1}^{n} r_{im}^{(l+1)} K_h(X_i - x)}{\sum_{i=1}^{n} r_{im}^{(l+1)} Y_i\,K_h(X_i - x)}. \qquad (3.6)$$
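As an illustration of one EM cycle at the grid points, the sketch below implements the E-step (3.1) and the closed-form M-step updates (3.5)-(3.6). The `kernel` argument can be, for example, the `epanechnikov` helper from the earlier sketch; the vectorized layout and names are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def e_step(p_at_X, lam_at_X, Y):
    """E-step (3.1): responsibilities r_im from p_m^(l)(X_i) and lambda_m^(l)(X_i).

    p_at_X, lam_at_X : (n, M) arrays of p_m^(l)(X_i) and lambda_m^(l)(X_i)
    Y                : (n,) array of responses
    """
    num = p_at_X * lam_at_X * np.exp(-lam_at_X * Y[:, None])   # (n, M)
    return num / num.sum(axis=1, keepdims=True)                # each row sums to 1

def m_step(r, X, Y, grid, h, kernel):
    """M-step (3.5)-(3.6): kernel-weighted updates of p_m and lambda_m at each grid point u_j."""
    N, M = len(grid), r.shape[1]
    p_new, lam_new = np.empty((N, M)), np.empty((N, M))
    for j, u in enumerate(grid):
        Kh = kernel((X - u) / h) / h                           # K_h(X_i - u_j)
        w = r * Kh[:, None]                                    # r_im * K_h(X_i - u_j)
        p_new[j] = w.sum(axis=0) / Kh.sum()                    # (3.5)
        lam_new[j] = w.sum(axis=0) / (w * Y[:, None]).sum(axis=0)   # (3.6)
    return p_new, lam_new
```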

Furthermore, we update $p_m^{(l+1)}(X_i)$ and $\lambda_m^{(l+1)}(X_i)$, $i = 1, 2, \ldots, n$, by linearly interpolating $p_m^{(l+1)}(u_j)$ and $\lambda_m^{(l+1)}(u_j)$, $j = 1, 2, \ldots, N$, respectively. We summarize the EM algorithm as follows.

The EM algorithm

Initial value: Conduct a mixture of polynomial regressions with constant proportions and variances, and obtain estimates of the mean functions $\hat\mu_m(x)$ and the proportions $\hat p_m$. Set the initial values $\lambda_m^{(1)}(x) = 1/\hat\mu_m(x)$ and $p_m^{(1)}(x) = \hat p_m$.

E-step: Use equation (3.1) to calculate $r_{im}^{(l+1)}$ for $i = 1, 2, \ldots, n$ and $m = 1, 2, \ldots, M$.

M-step: For $m = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$, evaluate $p_m^{(l+1)}(u_j)$ in (3.5) and $\lambda_m^{(l+1)}(u_j)$ in (3.6). Further, obtain $p_m^{(l+1)}(X_i)$ and $\lambda_m^{(l+1)}(X_i)$ using linear interpolation.

Iteratively update the E-step and the M-step with $l = 2, 3, \ldots$, until the algorithm converges.

It is well known that the bandwidth can be tuned to optimize the performance of the estimated parametric functions. At the end of this section, we therefore address the bandwidth of the local estimations for the parametric functions: we select the bandwidth $h$ via the cross-validation method, which is discussed in detail in Fan and Gijbels [8].
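A minimal sketch of the full iteration described above is given below: it loops the E-step and M-step over the grid points $u_1, \ldots, u_N$ and interpolates back to the sample points. It reuses `e_step`, `m_step`, and `epanechnikov` from the earlier sketches; the crude quantile-based initialization and the convergence criterion are assumptions for the sketch, whereas the paper initializes from a mixture of polynomial regressions with constant proportions.

```python
import numpy as np

def em_mixture_exponential(X, Y, M, grid, h, kernel, max_iter=200, tol=1e-6):
    """EM iteration for the finite mixture of exponential distribution model.

    grid must be sorted in increasing order (needed for np.interp).
    Returns p_m and lambda_m evaluated at the grid points u_1, ..., u_N.
    """
    n = len(Y)
    # crude placeholder initialization: equal proportions, rates from Y-quantile groups
    p_at_X = np.full((n, M), 1.0 / M)
    cuts = np.quantile(Y, np.linspace(0.0, 1.0, M + 1))
    lam0 = np.array([1.0 / Y[(Y >= cuts[m]) & (Y <= cuts[m + 1])].mean() for m in range(M)])
    lam_at_X = np.tile(lam0, (n, 1))

    for _ in range(max_iter):
        r = e_step(p_at_X, lam_at_X, Y)                        # E-step, equation (3.1)
        p_grid, lam_grid = m_step(r, X, Y, grid, h, kernel)    # M-step, equations (3.5)-(3.6)
        # linear interpolation from the grid points back to the observed X_i
        p_new = np.column_stack([np.interp(X, grid, p_grid[:, m]) for m in range(M)])
        lam_new = np.column_stack([np.interp(X, grid, lam_grid[:, m]) for m in range(M)])
        converged = (np.max(np.abs(p_new - p_at_X)) + np.max(np.abs(lam_new - lam_at_X)) < tol)
        p_at_X, lam_at_X = p_new, lam_new
        if converged:
            break
    return p_grid, lam_grid
```

With the simulated data from the earlier sketch, one could call, for example, `em_mixture_exponential(X, Y, M=2, grid=np.linspace(0.05, 0.95, 25), h=0.1, kernel=epanechnikov)`; in practice $h$ would be chosen by the cross-validation method mentioned above, and tracking $l_n(p, \lambda; u_j)$ across cycles gives a practical check of the ascent property established in Section 4.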

4. Property of the EM Algorithm

In this section, we show that the proposed EM algorithm preserves the ascent property. We first give the following assumptions.

(A1) The sample $\{(X_i, Y_i), i = 1, 2, \ldots, n\}$ is independent and identically distributed from $(X, Y)$, and the support of $X$, denoted by $\chi$, is a compact subset of $\mathbb{R}$.

(A2) The marginal density function $g(x)$ of $X$ is twice continuously differentiable and positive for all $x \in \chi$.

(A3) There exists a function $M(y)$ with $E[M(Y)] < \infty$ such that, for all $y$ and all $\theta$ in a neighborhood of $\theta(x)$, we have $\left|\partial^3 l(\theta, y)/\partial\theta_j\,\partial\theta_k\,\partial\theta_l\right| < M(y)$.

(A4) The parametric function $\theta(x)$ has continuous second derivatives. Furthermore, for $m = 1, 2, \ldots, M$, $\lambda_m(x) > 0$ and $p_m(x) > 0$ hold for all $x \in \chi$.

(A5) The kernel function $K(\cdot)$ has bounded support and satisfies $\int K(z)\,dz = 1$, $\int z K(z)\,dz = 0$, $\int z^2 K(z)\,dz < \infty$, $\int K^2(z)\,dz < \infty$ and $\int |K^3(z)|\,dz < \infty$.

Let $\theta^{(l)} = (p^{(l)}(\cdot), \lambda^{(l)}(\cdot))$ be the estimated functions in the $l$th cycle of the proposed EM algorithm. The local weighted log-likelihood function (2.3) is rewritten as
$$l_n(\theta) = \sum_{i=1}^{n} l(\theta, Y_i)\,K_h(X_i - x). \qquad (4.1)$$
Then we have the following theorem.

Theorem 4.1. Assume that conditions (A1)-(A5) hold. For any given point $x$, suppose that $\theta^{(l)}(\cdot)$ has a continuous first derivative, and that $h \to 0$ and $nh \to \infty$ as $n \to \infty$. Then we have
$$\liminf_{n \to \infty}\,\frac{1}{n}\left[\,l_n(\theta^{(l+1)}(x)) - l_n(\theta^{(l)}(x))\,\right] \ge 0 \qquad (4.2)$$
in probability.

Proof. Assume that the unobserved data $\{\xi_i, i = 1, 2, \ldots, n\}$ is a random sample from a population $\xi$. Then the complete data $\{(X_i, Y_i, \xi_i), i = 1, 2, \ldots, n\}$ can be viewed as a sample from $(X, Y, \xi)$. Let $h(y, m \mid \theta(x))$ be the joint distribution of $(Y, \xi)$ given $X = x$, and let $g(x)$ be the marginal density of $X$. Conditioning on $X = x$, $Y$ follows the distribution $\eta(y \mid \theta(x))$. Then the local weighted log-likelihood function (2.3) can be rewritten as
$$l_n(\theta) = \sum_{i=1}^{n}\log[\eta(Y_i \mid \theta)]\,K_h(X_i - x). \qquad (4.3)$$
The conditional probability of $\xi = m$ given $y$ and $\theta$ is
$$f(m \mid y, \theta) = h(y, m \mid \theta)/\eta(y \mid \theta) = p_m \lambda_m \exp\{-\lambda_m y\}\Big/\sum_{k=1}^{M} p_k \lambda_k \exp\{-\lambda_k y\}. \qquad (4.4)$$
For the given $\theta^{(l)}(X_i)$, $i = 1, 2, \ldots, n$, it is clear that $\int f(m \mid Y_i, \theta^{(l)}(X_i))\,dm = 1$. Then we have
$$l_n(\theta) = \sum_{i=1}^{n}\left\{\log[\eta(Y_i \mid \theta)]\int f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x) = \sum_{i=1}^{n}\left\{\int \log[\eta(Y_i \mid \theta)]\,f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x). \qquad (4.5)$$
By equation (4.4), we have
$$\log[\eta(Y_i \mid \theta)] = \log[h(Y_i, m \mid \theta)] - \log[f(m \mid Y_i, \theta)]. \qquad (4.6)$$

Thus, combining (4.5) and (4.6), we have
$$l_n(\theta) = \sum_{i=1}^{n}\left\{\int \log[h(Y_i, m \mid \theta)]\,f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x) - \sum_{i=1}^{n}\left\{\int \log[f(m \mid Y_i, \theta)]\,f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x), \qquad (4.7)$$
where $\theta^{(l)}(X_i)$ is the estimate of $\theta(X_i)$ at the $l$th iteration. Taking the expectation of the latent indicators leads to calculating equation (3.1). In the M-step of the EM algorithm, we update $\theta^{(l+1)}(x)$ such that
$$\frac{1}{n}\sum_{i=1}^{n}\left\{\int \log[h(Y_i, m \mid \theta^{(l+1)}(x))]\,f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x) \ge \frac{1}{n}\sum_{i=1}^{n}\left\{\int \log[h(Y_i, m \mid \theta^{(l)}(x))]\,f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x).$$
It therefore suffices to show that
$$\limsup_{n \to \infty}\,\frac{1}{n}\sum_{i=1}^{n}\left\{\int \log\!\left[\frac{f(m \mid Y_i, \theta^{(l+1)}(x))}{f(m \mid Y_i, \theta^{(l)}(x))}\right] f(m \mid Y_i, \theta^{(l)}(X_i))\,dm\right\} K_h(X_i - x) \le 0 \qquad (4.8)$$
in probability. Let
$$L_{n1} = \frac{1}{n}\sum_{i=1}^{n}\phi(Y_i, X_i)\,K_h(X_i - x),
\quad \text{where} \quad
\phi(Y_i, X_i) = \int \log\!\left[\frac{f(m \mid Y_i, \theta^{(l+1)}(x))}{f(m \mid Y_i, \theta^{(l)}(x))}\right] f(m \mid Y_i, \theta^{(l)}(X_i))\,dm.$$
By assumptions (A1)-(A4), we have $f(m \mid Y, \theta^{(l)}(x)) > a > 0$ for some small value $a$, and $E\{[\phi(Y, X)]^2\} < \infty$. Then, by assumption (A5) and Theorem A of Mack and Silverman [9], we have
$$\sup_{x \in J}\left|L_{n1} - g(x)\,E[\phi(Y, x)]\right| = o_p(1),$$
where $J$ is a compact interval on which the density of $X$ is bounded away from $0$. Moreover, by Jensen's inequality,
$$E[\phi(Y, x)] = E\left\{\int \log\!\left[\frac{f(m \mid Y, \theta^{(l+1)}(x))}{f(m \mid Y, \theta^{(l)}(x))}\right] f(m \mid Y, \theta^{(l)}(x))\,dm\right\} \le E\left\{\log\left[\int \frac{f(m \mid Y, \theta^{(l+1)}(x))}{f(m \mid Y, \theta^{(l)}(x))}\,f(m \mid Y, \theta^{(l)}(x))\,dm\right]\right\} = E\left\{\log \int f(m \mid Y, \theta^{(l+1)}(x))\,dm\right\} = 0,$$
so that (4.8) holds.
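How these pieces combine is left implicit above; in notation introduced here only for convenience, write $Q_n(\theta)$ and $H_n(\theta)$ for the first and second terms of (4.7), so that $l_n(\theta) = Q_n(\theta) - H_n(\theta)$. Then
$$\frac{1}{n}\left[\,l_n(\theta^{(l+1)}(x)) - l_n(\theta^{(l)}(x))\,\right] = \frac{1}{n}\left[\,Q_n(\theta^{(l+1)}(x)) - Q_n(\theta^{(l)}(x))\,\right] - L_{n1} \ge -L_{n1},$$
since the $Q_n$-difference is nonnegative by construction of the M-step and the $H_n$-difference equals $n L_{n1}$. Because $\limsup_{n} L_{n1} \le 0$ in probability by (4.8), the conclusion (4.2) follows.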

The proof of Theorem 4.1 is completed.

References

[1] Lindsay, B. Mixture Models: Theory, Geometry and Applications. Hayward, CA: Institute of Mathematical Statistics, 1995.

[2] Frühwirth-Schnatter, S. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. Journal of the American Statistical Association, 2001, 96: 194-209.

[3] Rossi, P., Allenby, G. and McCulloch, R. Bayesian Statistics and Marketing. Chichester: Wiley, 2005.

[4] Hurn, M., Justel, A. and Robert, C. Estimating mixtures of regressions. Journal of Computational and Graphical Statistics, 2003, 12: 55-79.

[5] Huang, M., Li, R. and Wang, S. Nonparametric mixture of regression models. Journal of the American Statistical Association, 2013, 108: 929-941.

[6] Tibshirani, R. and Hastie, T. Local likelihood estimation. Journal of the American Statistical Association, 1987, 82: 559-567.

[7] Fan, J. and Gijbels, I. Local Polynomial Modelling and Its Applications. London: Chapman and Hall, 1996.

[8] Fan, J. and Gijbels, I. Variable bandwidth and local linear regression smoothers. Annals of Statistics, 1992, 20: 2008-2036.

[9] Mack, Y. and Silverman, B. Weak and strong uniform consistency of kernel regression estimates. Probability Theory and Related Fields, 1982, 61: 405-415.

Received: December 1, 2013