Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data


Sankhyā: The Indian Journal of Statistics
2009, Volume 71-B, Part 1
© 2009, Indian Statistical Institute

M.R.I. Chowdhury and B.C. Sutradhar
Memorial University of Newfoundland, Canada

Abstract

Inference for the regression parameters and the variance of the random effects in the generalized linear mixed model (GLMM) setup is an extremely important statistical issue. It is known, however, that the most widely used penalized quasi-likelihood (PQL) approach may not produce consistent estimates of the parameters, especially when the true variance of the random effects is large. In this paper, in the context of Poisson mixed models, we examine the consistency performance of two competing estimation approaches, namely the hierarchical likelihood (HL) and the generalized quasi-likelihood (GQL) approaches. An extensive simulation study shows that the HL approach, like the PQL approach, appears to produce highly biased and hence inconsistent estimates of the regression parameters, especially when the variance of the random effects is large. The biases of the HL estimates also appear to vary with the cluster size. As an alternative, the GQL approach appears to produce consistent estimates of all parameters of this model, irrespective of the cluster size and the magnitude of the variance of the random effects. The GQL and HL estimates are also compared in a real-life data analysis.

AMS (2000) subject classification. Primary 62F10; secondary 62H20.

Keywords and phrases. GLMMs, method of moments approach, penalized quasi-likelihood approach, hierarchical likelihood approach, generalized quasi-likelihood approach, relative bias, zero-inflated Poisson mixed model.
1 Introduction

Whether the responses are counts or binary, generalized linear mixed models (GLMMs) for such responses are obtained from the well-known generalized linear model (GLM) (McCullagh and Nelder, 1989) by adding random effects to the linear predictor. A random variable $Y_{ij}$ for the $j$th member ($j = 1,\ldots,n_i$) of the $i$th family ($i = 1,\ldots,K$), with exponential density

$$f(y_{ij}) = \exp[\{y_{ij}\eta_{ij} - a(\eta_{ij})\} + b(y_{ij})], \qquad (1.1)$$

follows a GLM when $\eta_{ij}$ has the linear form $\eta_{ij} = x_{ij}'\beta$, where $x_{ij} = (x_{ij1},\ldots,x_{iju},\ldots,x_{ijp})'$ is the vector of covariates and $\beta$ is the vector of regression parameters. In (1.1), $a(\cdot)$ and $b(\cdot)$ are known functions. Note that the exponential form (1.1) contains both the Poisson and binary distributions.

The GLMM is developed by adding random effects, say $\gamma_i$, to the linear predictor $\eta_{ij}$, where the $\gamma_i$ for $i = 1,\ldots,K$ are generally assumed to be independently and identically distributed (i.i.d.) with mean 0 and variance $\sigma^2$. In notation, we write $\gamma_i \overset{\text{i.i.d.}}{\sim} (0,\sigma^2)$. Thus, in the GLMM setup, the binary and count responses follow (1.1) with $a(\eta_{ij}) = \exp(\eta_{ij})$ in the Poisson case and $a(\eta_{ij}) = \log\{1 + \exp(\eta_{ij})\}$ in the binary case, where $\eta_{ij} = x_{ij}'\beta + \sigma z_i\gamma_i^*$. Here $z_i$ is a known covariate associated with the random effect of the $i$th cluster, and $\gamma_i^* = \gamma_i/\sigma \overset{\text{i.i.d.}}{\sim} (0,1)$. To be specific, in this paper we deal with GLMMs for count data only.

Note that GLMMs for count and binary data with two or more random effects have also been discussed in the literature (see Lin and Breslow (1996), for example), but in the present paper, without any loss of generality, we confine our analysis to the GLMMs (1.1) with a single random effect. One of the main reasons for selecting such simple models is that our objective is not to analyze a complicated higher-dimensional model but to compare the relative performance of two highly competitive estimation approaches in estimating the parameters of simpler models. Under the GLMM setup, it has proven difficult to obtain consistent and efficient estimators of the regression parameters ($\beta$) and the variance of the random effects ($\sigma^2$).
A full or exact likelihood analysis is complicated, as it requires integration over the distribution of the random effects. This integration problem compels one to avoid exact likelihood estimation, even though maximum likelihood estimators are known to be fully efficient (optimal). To overcome this computational problem, many authors over the last two decades have used various approximate methods to obtain consistent estimates of the regression effects $\beta$ and the variance of the random effects $\sigma^2$. We refer, for example, to some of the leading approaches: the penalized quasi-likelihood (PQL) approach of Breslow and Clayton (1993), Breslow and Lin (1995), Kuk (1995), and Lin and Breslow (1996); the hierarchical likelihood (HL) approach due to Lee and Nelder (1996, 2001); and the recent generalized quasi-likelihood (GQL) approach of Sutradhar (2004), Jiang (1998), Jiang and Zhang (2001), and Sutradhar and Rao (2001).

Note that in both the PQL and HL approaches, the estimation of $\beta$ and $\sigma^2$ requires the estimation of the $\gamma_i$ as well. These are estimated by pretending that the $\gamma_i$ are fixed parameters, even though they are truly unobservable random effects. To be specific, in the first step, both the PQL and HL approaches estimate $\beta$ and $\gamma_i$. The difference between the two is that the PQL approach estimates $\beta$ and $\gamma_i$ by maximizing a penalized quasi-likelihood function, whereas the HL approach maximizes the hierarchical likelihood function

$$h = \log \prod_{i=1}^{K}\prod_{j=1}^{n_i} f(y_{ij}\mid\gamma_i) + \log \prod_{i=1}^{K} h^*(\gamma_i;\sigma^2), \qquad (1.2)$$

to estimate these parameters, where $h^*(\gamma_i;\sigma^2)$ is the likelihood function of the unobserved $\gamma_i$, and $f(y_{ij}\mid\gamma_i)$ is the conditional density function of the response $y_{ij}$ given $\gamma_i$. In the second step, to estimate $\sigma^2$, the PQL approach maximizes a profile quasi-likelihood function, whereas the HL approach maximizes an adjusted profile hierarchical likelihood function.

We further remark that the PQL approach of Breslow and Clayton (1993) may not provide consistent estimates of the parameters, mainly of the variance component $\sigma^2$ [Sutradhar and Qu (1998)]. To be specific, Sutradhar and Qu (1998) have argued that the PQL approach may provide an inconsistent estimate of $\sigma^2$ when the cluster sizes ($n_i$) are small, especially when $\sigma^2$ is large. Breslow and Lin (1995, p.20) have also reported a similar consistency problem for the estimates of $\sigma^2$.
In particular, these authors argued that the PQL approach produces consistent estimates when $\sigma^2$ is small. Lin and Breslow (1996) attempted to produce bias-corrected estimates, but their bias corrections also do not appear to improve the consistency results significantly when $\sigma^2$ is large. Consequently, we do not follow the PQL approach any further in the present paper.

Note that as both the PQL and the HL approaches estimate $\beta$ and $\sigma^2$ through the estimation of $\gamma_i$ ($i = 1,\ldots,K$), the HL approach may suffer from inconsistency problems for the same reasons that cause inconsistency in the PQL approach. There does not, however, exist any comparative study between the HL and any other competitive approaches. Further note that, as an alternative to the PQL and other moment-based (Jiang and Zhang (2001)) approaches, Sutradhar (2004) has recently proposed a GQL approach that produces consistent as well as more efficient estimates.

The purpose of this paper is to make a comparative study of the GQL and the HL approaches. Both approaches are reviewed briefly in Section 2. An extensive simulation study is conducted in Section 3 for various small and large values of $\sigma^2$, as well as for small and large cluster sizes such as $n_i = 2, 4, 6, 10$ and $16$. The simulation results show that the HL approach estimates the parameters with larger biases than the GQL approach, with the HL approach performing relatively worse for the main regression parameter $\beta$. Both estimation approaches are illustrated in Section 4 by analyzing a health care utilization data set. The paper is concluded in Section 5.

2 Estimation of Parameters: GQL versus HL Inferences

2.1. GQL approach. Let $y_i = (y_{i1},\ldots,y_{ij},\ldots,y_{in_i})'$ and $\mu_i = E(Y_i) = (\mu_{i1},\ldots,\mu_{ij},\ldots,\mu_{in_i})'$, where $\mu_{ij} = E_{\gamma_i}E(Y_{ij}\mid\gamma_i) = E_{\gamma_i}(\mu^*_{ij})$, with $\mu^*_{ij} = \exp(x_{ij}'\beta + z_i\gamma_i)$ as the conditional mean of the Poisson distribution computed from (1.1). Under normal random effects this gives $\mu_{ij} = \exp(x_{ij}'\beta + z_i^2\sigma^2/2)$. Further, let $\Sigma_i = (\sigma_{ijk})$ be the $n_i \times n_i$ covariance matrix of $y_i$. It can be shown that

$$\sigma_{ijj} = \mathrm{Var}(Y_{ij}) = \mu_{ij} + [\exp(z_i^2\sigma^2) - 1]\,\mu_{ij}^2$$

for $j = 1,\ldots,n_i$, and, for $j,k = 1,\ldots,n_i$, $j \neq k$,

$$\sigma_{ijk} = \mathrm{Cov}(Y_{ij},Y_{ik}) = \mu_{ij}\mu_{ik}[\exp(z_i^2\sigma^2) - 1].$$
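The marginal mean vector and covariance matrix of a single cluster follow directly from the formulas above. The sketch below is illustrative only: the function name `marginal_moments` and its argument layout are assumptions made here, not part of the paper, and normal random effects are assumed.

```python
import numpy as np

def marginal_moments(X, beta, sigma2, z=1.0):
    """Marginal mean and covariance of one cluster under the
    Poisson-normal mixed model (hypothetical helper).

    X      : (n_i, p) covariate matrix of the cluster
    beta   : (p,) regression parameters
    sigma2 : variance of the random effect
    z      : known random-effect covariate z_i
    """
    w = z ** 2 * sigma2
    # mu_ij = exp(x_ij' beta + z_i^2 sigma^2 / 2)
    mu = np.exp(X @ beta + 0.5 * w)
    # sigma_ijk = mu_ij mu_ik (exp(w) - 1); the diagonal adds mu_ij
    Sigma = np.outer(mu, mu) * np.expm1(w) + np.diag(mu)
    return mu, Sigma
```

Because every off-diagonal entry is positive whenever $\sigma^2 > 0$, responses within a family are positively correlated through the shared random effect.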
For given $\sigma^2$, one may then write the GQL estimating equation [Sutradhar (2004)] for $\beta$ as

$$\sum_{i=1}^{K} \frac{\partial\mu_i'}{\partial\beta}\,\Sigma_i^{-1}(y_i - \mu_i) = 0, \qquad (2.1)$$

(see Wedderburn (1979) and McCullagh (1983) for the independence case), where $\partial\mu_i'/\partial\beta = X_i'A_i$, with $X_i' = [x_{i1},\ldots,x_{in_i}]$ and $A_i = \mathrm{diag}[\mu_{i1},\ldots,\mu_{in_i}]$.

Note that as the first order responses are used to construct (2.1), we refer to this equation as the first order GQL estimating equation. Once the regression effects $\beta$ are estimated by solving (2.1), one may construct a second order GQL estimating equation to estimate $\sigma^2$. Let

$$g_i = [y_{i1}^2,\ldots,y_{in_i}^2,\; y_{i1}y_{i2},\ldots,y_{i,n_i-1}y_{in_i}]' \qquad (2.2)$$

be the $n_i(n_i+1)/2 \times 1$ vector of all distinct second order responses for the $i$th cluster. Let $\lambda_i = E(G_i)$ and $\Omega_i = \mathrm{Cov}(G_i)$ be the mean vector and covariance matrix of $g_i$, respectively, under the Poisson mixed model. It then follows that for a given value of $\beta$, the variance component $\sigma^2$ may be estimated by solving the second order GQL estimating equation

$$\sum_{i=1}^{K} \frac{\partial\lambda_i'}{\partial\sigma^2}\,\Omega_i^{-1}(g_i - \lambda_i) = 0 \qquad (2.3)$$

[Sutradhar (2004), Jiang and Zhang (2001)]. Note that the $\lambda_i$ vector in (2.3) is easily calculated, because $E(Y_{ij}^2) = \mathrm{Var}(Y_{ij}) + \mu_{ij}^2$ for all $j = 1,\ldots,n_i$, and $E(Y_{ij}Y_{ik}) = \mathrm{Cov}(Y_{ij},Y_{ik}) + \mu_{ij}\mu_{ik}$ for all $j \neq k$, $j,k = 1,\ldots,n_i$. The derivative vector $\partial\lambda_i'/\partial\sigma^2$ may also be easily computed. The computation of the fourth order moments matrix $\Omega_i$ is lengthy but straightforward. All elements of this matrix may be computed by using the conditioning and unconditioning principle, as was done for the variances and covariances. For example, the variance of $Y_{ij}^2$ may be computed as

$$\begin{aligned}
\mathrm{Var}(Y_{ij}^2) &= E(Y_{ij}^4) - [E(Y_{ij}^2)]^2 = E_{\gamma_i}E(Y_{ij}^4\mid\gamma_i) - [E(Y_{ij}^2)]^2 \\
&= \mu_{ij} + \mu_{ij}^2\,(7e^{z_i^2\sigma^2} - 1) + 2\mu_{ij}^3\,e^{z_i^2\sigma^2}(3e^{2z_i^2\sigma^2} - 1) + \mu_{ij}^4\,e^{2z_i^2\sigma^2}(e^{4z_i^2\sigma^2} - 1),
\end{aligned} \qquad (2.4)$$

and the covariance between $Y_{ij}^2$ and $Y_{ik}Y_{il}$ for $j \neq k < l$ may be computed as

$$\begin{aligned}
\mathrm{Cov}(Y_{ij}^2, Y_{ik}Y_{il}) &= E_{\gamma_i}E(Y_{ij}^2Y_{ik}Y_{il}\mid\gamma_i) - E_{\gamma_i}[E(Y_{ij}^2\mid\gamma_i)]\,E_{\gamma_i}[E(Y_{ik}Y_{il}\mid\gamma_i)] \\
&= \mu_{ij}\mu_{ik}\mu_{il}\,e^{z_i^2\sigma^2}(e^{2z_i^2\sigma^2} - 1) + \mu_{ij}^2\mu_{ik}\mu_{il}\,e^{2z_i^2\sigma^2}(e^{4z_i^2\sigma^2} - 1).
\end{aligned} \qquad (2.5)$$
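To make the first order equation (2.1) concrete, the following sketch solves it for $\beta$ at a fixed $\sigma^2$. The Gauss-Newton iteration, the function name `gql_beta`, the list-of-arrays data layout and the choice $z_i = 1$ are all assumptions made for illustration; this section does not state which numerical scheme the authors used.

```python
import numpy as np

def gql_beta(y, X, sigma2, beta0, n_iter=50, tol=1e-8):
    """Solve the first order GQL equation (2.1) for beta, sigma2 fixed.

    y : list of (n_i,) response arrays, one per cluster
    X : list of (n_i, p) covariate arrays, one per cluster
    Assumes z_i = 1 and normal random effects (illustrative sketch).
    """
    beta = np.asarray(beta0, dtype=float)
    c = np.expm1(sigma2)                      # exp(sigma^2) - 1
    for _ in range(n_iter):
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for yi, Xi in zip(y, X):
            mu = np.exp(Xi @ beta + 0.5 * sigma2)
            Sigma = np.diag(mu) + c * np.outer(mu, mu)
            D = mu[:, None] * Xi              # A_i X_i = d mu_i / d beta'
            SinvD = np.linalg.solve(Sigma, D)
            score += SinvD.T @ (yi - mu)      # D' Sigma^{-1} (y_i - mu_i)
            info += D.T @ SinvD               # D' Sigma^{-1} D
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

In a full GQL fit this step would alternate with a corresponding update of $\sigma^2$ from (2.3), which additionally needs $\lambda_i$, its derivative, and the fourth order moment matrix $\Omega_i$.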

We remark that as $E(Y_i) = \mu_i$ and $E(G_i) = \lambda_i$, the first order (2.1) and the second order (2.3) GQL estimating equations are unbiased. This unbiasedness yields consistent estimates of both $\beta$ and $\sigma^2$. Furthermore, as the covariance matrices $\Sigma_i$ and $\Omega_i$ are the appropriate weight matrices in the estimating equations, the GQL estimates of $\beta$ and $\sigma^2$ will also be more efficient than other unbiased estimators, such as moment estimators, that are derived without exploiting these weight matrices. See, for example, Jiang (1998), Jiang and Zhang (2001), and Sutradhar (2004).

We further remark that the GQL estimation of $\beta$ and $\sigma^2$ by (2.1) and (2.3) does not require any estimates of the random effects $\gamma_i$ ($i = 1,2,\ldots,K$), whereas the existing leading analytical approaches, such as the PQL approach of Breslow and Clayton (1993) and the HL approach of Lee and Nelder (1996), need to estimate these random effects by pretending that they are fixed effects. It is known, however [Sutradhar and Qu (1998), Jiang (1998)], that the PQL approach for the estimation of $\beta$ and $\sigma^2$ through the estimation of $\gamma_i$ appears to encounter convergence problems, especially for the estimation of $\sigma^2$. Like the PQL approach, the HL approach exploits the estimates of $\gamma_i$ for the estimation of $\beta$ and $\sigma^2$, and so it may encounter inconsistency problems similar to those of the PQL approach. But even though the HL approach has been illustrated by analyzing various numerical examples in Lee and Nelder (1996, 2001), there does not appear to be any empirical study examining the consistency of the HL estimates of $\beta$ and $\sigma^2$. In Section 3, we conduct an extensive simulation study to examine whether HL estimation satisfies this important property. The HL estimating equations are reviewed in the next subsection.
Note that we also include the GQL approach in the simulation study in order to examine the performance of the HL estimators relative to that of the GQL estimators. The GQL and HL approaches are also illustrated by analyzing a real-life data set. This is done in Section 4.

2.2. HL approach. In the present setup, that is, under the Poisson-normal mixed model, the HL function (1.2) may be expressed as

$$h = \sum_{i=1}^{K}\sum_{j=1}^{n_i} y_{ij}\log(\mu^*_{ij}) - \sum_{i=1}^{K}\sum_{j=1}^{n_i}\mu^*_{ij} - \sum_{i=1}^{K}\sum_{j=1}^{n_i}\log(y_{ij}!) - \left[\frac{K}{2}\log(2\pi) + \frac{K}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{K}\gamma_i^2\right], \qquad (2.6)$$

where $\mu^*_{ij} = \exp(x_{ij}'\beta + z_i\gamma_i)$. In this approach, for given $\sigma^2$, as in the PQL approach, one estimates $\beta$ and $\gamma_i$ by maximizing the likelihood function in (2.6). More specifically, writing $\mu^*_i = [\mu^*_{i1},\ldots,\mu^*_{in_i}]'$, one solves the HL estimating equations

$$\frac{\partial h}{\partial\beta} = 0 \quad\text{and}\quad \frac{\partial h}{\partial\gamma_i} = 0, \qquad (2.7)$$

for $\beta$ and $\gamma_i$ ($i = 1,\ldots,K$), respectively. For the estimation of $\sigma^2$, Lee and Nelder (1996) have, however, maximized a general adjusted profile h-likelihood defined by

$$h_A = h + \frac{1}{2}\log\{\det(2\pi\phi H^{-1})\} = h + \frac{p+K}{2}\log(2\pi) - \frac{1}{2}\log\{\det(H)\}, \qquad (2.8)$$

where $\phi = 1$ under the Poisson model, and the matrix $H$ is defined as

$$H = \begin{bmatrix} X'WX & X'WZ \\ Z'WX & Z'WZ + U \end{bmatrix} : (p+K)\times(p+K),$$

where $X = [X_1',\ldots,X_i',\ldots,X_K']' : \sum n_i \times p$, with $X_i$ as the $n_i \times p$ covariate matrix; $W$ and $Z$ are block-diagonal matrices given by $W = \oplus_{i=1}^{K} A_i : \sum n_i \times \sum n_i$ and $Z = \oplus_{i=1}^{K} 1_{n_i} : \sum n_i \times K$, respectively, with $A_i = \mathrm{diag}[\mu^*_{i1},\ldots,\mu^*_{in_i}] : n_i \times n_i$ and $1_{n_i} = (1,\ldots,1)' : n_i \times 1$; and $U = \frac{1}{\sigma^2}I_K$, $I_K$ being the $K \times K$ identity matrix. The maximization is achieved by using the iterative equation

$$\hat\sigma^2_{(r+1)} = \hat\sigma^2_{(r)} + \left[\left(-\frac{\partial^2 h_A}{\partial\sigma^4}\right)^{-1}\frac{\partial h_A}{\partial\sigma^2}\right]_{(r)}, \qquad (2.9)$$

where $[\,\cdot\,]_{(r)}$ indicates that the quantity in the square bracket is evaluated at $\sigma^2 = \hat\sigma^2_{(r)}$, the value at the $r$th iteration.
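The two building blocks of the HL approach, the function $h$ in (2.6) and the matrix $H$ entering (2.8), can be evaluated numerically as sketched below. This is a minimal illustration under the assumption $z_i = 1$; the function names and the use of a stacked design matrix with an integer `cluster` index (values $0,\ldots,K-1$) are choices made here, not the authors' implementation.

```python
import numpy as np
from math import lgamma

def h_likelihood(y, X, cluster, beta, gamma, sigma2):
    """Hierarchical log-likelihood (2.6) for stacked data, z_i = 1.

    y (N,), X (N, p), cluster (N,) integer labels, gamma (K,).
    """
    eta = X @ beta + gamma[cluster]               # log mu*_ij
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    K = gamma.size
    conditional = np.sum(y * eta - np.exp(eta) - log_fact)
    random_part = (-0.5 * K * np.log(2 * np.pi * sigma2)
                   - np.sum(gamma ** 2) / (2 * sigma2))
    return conditional + random_part

def hl_hessian(X, cluster, beta, gamma, sigma2):
    """The (p+K) x (p+K) matrix H used in the adjusted profile (2.8)."""
    N, K = X.shape[0], gamma.size
    mu = np.exp(X @ beta + gamma[cluster])        # diagonal of W
    Z = np.zeros((N, K))
    Z[np.arange(N), cluster] = 1.0                # block-diagonal of 1_{n_i}
    WX, WZ = mu[:, None] * X, mu[:, None] * Z
    top = np.hstack([X.T @ WX, X.T @ WZ])
    bottom = np.hstack([Z.T @ WX, Z.T @ WZ + np.eye(K) / sigma2])
    return np.vstack([top, bottom])
```

With these two pieces, $h_A$ can be evaluated for any trial $\sigma^2$ as `h + 0.5 * (p + K) * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(H)[1]`, after $\beta$ and $\gamma_i$ have been updated from (2.7).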

Let $\hat\beta_{GQL}$ and $\hat\sigma^2_{GQL}$ be the solutions of the GQL estimating equations (2.1) for $\beta$ and (2.3) for $\sigma^2$, respectively. Similarly, let $\hat\beta_{HL}$ and $\hat\sigma^2_{HL}$ be the solutions of the HL based estimating equations (2.7) for $\beta$ and (2.9) for $\sigma^2$, respectively.

3 A Simulation Study

In this section, we examine the performance of the HL estimators, $\hat\beta_{HL}$ and $\hat\sigma^2_{HL}$, relative to the corresponding GQL estimators, $\hat\beta_{GQL}$ and $\hat\sigma^2_{GQL}$, through a simulation study consisting of 500 simulations. Under each simulation, for given covariates and selected values of the design parameters, we generate the data $\{y_{ij};\ j = 1,\ldots,n_i,\ i = 1,\ldots,K\}$ by using the Poisson form of (1.1) with mean parameters

$$\mu^*_{ij} = E(Y_{ij}\mid\gamma_i) = \exp(\beta_1x_{ij1} + \beta_2x_{ij2} + \sigma\gamma^*_i), \qquad (3.1)$$

where $\gamma^*_i = \gamma_i/\sigma \overset{\text{i.i.d.}}{\sim} N(0,1)$. We choose $K = 100$ clusters and cluster sizes $n_i = 2, 4, 6, 10$ and $16$. Note that the cluster sizes are chosen to accommodate both small and large clusters. For the regression parameters we use $\beta_1 = \beta_2 = 1.0$. The values of the fixed covariates $x_{ij} = (x_{ij1}, x_{ij2})'$ are chosen as

$$x_{ij1} = \begin{cases} 1 & \text{for } j = 1,2,\ldots,n_i/2;\ i = 1,2,\ldots,K/2 \\ 0 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = 1,2,\ldots,K/2 \\ 1 & \text{for } j = 1,\ldots,n_i;\ i = K/2+1,\ldots,K \end{cases}$$

$$x_{ij2} = \begin{cases} 1 & \text{for } j = 1,2,\ldots,n_i/2;\ i = 1,2,\ldots,K/2 \\ 2 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = 1,2,\ldots,K/2 \\ 0 & \text{for } j = 1,2,\ldots,n_i/2;\ i = K/2+1,\ldots,K \\ 1 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = K/2+1,\ldots,K \end{cases}$$
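One replicate of this simulation design can be generated as sketched below. The covariate values follow the design as it reads in this copy of the paper; the function names, the seed and the default $\sigma^2 = 0.8$ (one of the values used in the study) are illustrative assumptions.

```python
import numpy as np

def design(K, n):
    """Fixed covariates (x_ij1, x_ij2) of the simulation design; shape (K, n, 2)."""
    x = np.zeros((K, n, 2))
    half_K, half_n = K // 2, n // 2
    # clusters i = 1, ..., K/2
    x[:half_K, :half_n, 0] = 1.0
    x[:half_K, half_n:, 0] = 0.0
    x[:half_K, :half_n, 1] = 1.0
    x[:half_K, half_n:, 1] = 2.0
    # clusters i = K/2 + 1, ..., K
    x[half_K:, :, 0] = 1.0
    x[half_K:, :half_n, 1] = 0.0
    x[half_K:, half_n:, 1] = 1.0
    return x

def simulate(K=100, n=4, beta=(1.0, 1.0), sigma2=0.8, seed=0):
    """Draw y_ij from the Poisson-normal model (3.1) for one replicate."""
    rng = np.random.default_rng(seed)
    x = design(K, n)
    gamma_star = rng.standard_normal(K)           # gamma*_i ~ N(0, 1)
    eta = x @ np.asarray(beta) + np.sqrt(sigma2) * gamma_star[:, None]
    y = rng.poisson(np.exp(eta))                  # conditional Poisson counts
    return y, x
```

Repeating `simulate` with 500 different seeds and applying each estimator to every replicate would reproduce the structure, though not necessarily the exact numbers, of the study described here.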

With regard to the variance of the random effects, we choose $\sigma^2 = 0.4$, $0.8$ and $1.2$. We remark that even though in theory the overdispersion index parameter $\sigma^2$ can take any value from 0 to $\infty$, for practical purposes $\sigma^2$ […] appears to be quite large. This is because under the Poisson-normal mixed model (3.1), the overdispersion in the count data may increase significantly even if the increment in $\sigma^2$ is small. To be specific, the variance of $y_{ij}$, $\sigma_{ijj} = \mu_{ij} + [\exp(\sigma^2) - 1]\mu_{ij}^2$, increases significantly under the Poisson-normal mixed model, depending on the value of the mean function $\mu_{ij} = \exp(x_{ij}'\beta + \frac{1}{2}\sigma^2)$, even if $\sigma^2$ changes only from 1.0 to 1.2, for example. We further remark that Breslow and Lin (1995, p. 90) were able to obtain an unbiased estimate of this overdispersion index parameter $\sigma^2$ only when $\sigma^2$ ranges up to 0.5.

Next, under each simulation, the simulated values of $\{y_{ij}\}$ along with the values of the covariates $\{x_{ij}\}$ are used to obtain the GQL estimates of $\beta$ and $\sigma^2$ by solving (2.1) and (2.3), respectively, and the HL estimates of $\beta$ and $\sigma^2$ by using (2.7) and (2.9), respectively. Note that under the HL approach, we also had to estimate $\gamma_i$ ($i = 1,\ldots,100$) by treating them as fixed parameters, but these estimates are not reported as they are not of direct interest. The simulated means (SMs) and simulated standard errors (SSEs) of each of the above four estimates are shown in Table 1 for all selected cluster sizes $n_i = 2, 4, 6, 10$ and $16$.
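The sensitivity of the marginal variance to $\sigma^2$ noted in this section can be seen by evaluating the variance formula directly. The sketch below is only an illustration: the linear predictor value $x_{ij}'\beta = 2$ (which corresponds to $x_{ij} = (1,1)'$ with $\beta_1 = \beta_2 = 1$) and the function name are assumptions chosen here.

```python
import numpy as np

def marginal_mean_var(xb, sigma2):
    """Marginal mean and variance of Y_ij under the Poisson-normal model, z_i = 1."""
    mu = np.exp(xb + 0.5 * sigma2)                # mu_ij
    var = mu + np.expm1(sigma2) * mu ** 2         # sigma_ijj
    return mu, var

# Compare the variance-to-mean ratio across several values of sigma^2.
for s2 in (0.4, 0.8, 1.0, 1.2):
    mu, var = marginal_mean_var(2.0, s2)
    print(f"sigma2={s2:.1f}  mean={mu:8.2f}  variance={var:9.2f}  "
          f"variance/mean={var / mu:6.2f}")
```

Since the overdispersion term is $[\exp(\sigma^2) - 1]\mu_{ij}^2$ and $\mu_{ij}$ itself grows with $\sigma^2$, the variance grows faster than exponentially in $\sigma^2$.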

Table 1. Comparison of simulated means (SMs), simulated standard errors (SSEs), and simulated relative biases (SRBs) of the regression estimates and of the estimates of the variance of the random effects by the GQL and HL approaches, under the Poisson-normal mixed model, for selected values of $\sigma^2$: $K = 100$; $n_i = 2, 4, 6, 10$ and $16$ ($i = 1,2,\ldots,K$); true values of the regression parameters: $\beta_1 = 1.0$ and $\beta_2 = 1.0$.

Columns: Cluster size | $\sigma^2$ | Method (GQL, HL) | Quantity (SM, SSE, SRB) | Estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma^2$

[Table entries not recovered.]
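The three summary quantities reported in Table 1 can be computed from the replicate estimates of a parameter as sketched below, using the SRB defined in this section as the absolute bias of the simulated mean relative to the simulated standard error, times 100. The function name is hypothetical, and taking the SSE to be the sample standard deviation with divisor $n - 1$ is an assumption.

```python
import numpy as np

def simulation_summary(estimates, true_value):
    """Simulated mean (SM), standard error (SSE) and relative bias (SRB)."""
    est = np.asarray(estimates, dtype=float)
    sm = est.mean()
    sse = est.std(ddof=1)                  # assumed: sample standard deviation
    srb = abs(sm - true_value) / sse * 100.0
    return sm, sse, srb
```

For example, `simulation_summary(beta1_hats, 1.0)` would summarize 500 replicate estimates of $\beta_1$ stored in a hypothetical array `beta1_hats`. An SRB well above 100 means the bias exceeds one standard error, so the estimates concentrate around a wrong value.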

Table 2. Comparison of simulated means (SMs), simulated standard errors (SSEs), and simulated relative biases (SRBs) of the regression estimates and of the estimates of the variance of the random effects by the GQL approach, under the Poisson-gamma mixed model, for selected values of $\sigma^2$: $K = 100$; $n_i = 4$ and $10$ ($i = 1,2,\ldots,K$); true values of the regression parameters: $\beta_1 = 1.0$ and $\beta_2 = 1.0$.

Columns: Cluster size | $\sigma^2$ | Quantity (SM, SSE, SRB) | Estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma^2$

[Table entries not recovered.]

Note that when an estimate is highly biased with a small standard error, it becomes a useless estimate. For this reason, to check the actual convergence of the estimates to their corresponding parameter values, we have also computed the simulated relative bias (SRB) given by

$$\mathrm{SRB} = \frac{|\mathrm{SM} - \text{True parameter value}|}{\mathrm{SSE}} \times 100.$$

These SRBs are also reported in Table 1 for all cluster sizes and selected values of the overdispersion index parameter $\sigma^2$.

3.1. Relative performance of GQL and HL approaches. With regard to the estimation of $\beta_1$ and $\beta_2$, it is clear from the simulation results displayed in Table 1 that the GQL approach always produces regression estimates with smaller relative bias than the HL approach. This better performance of the GQL approach appears to hold for all small and large cluster sizes $n_i = 2, 4, 6, 10$ and $16$, as well as for all small and large values of $\sigma^2 = 0.4$, $0.8$ and $1.2$. For example, when $n_i = 2$, the GQL estimates of $\beta_1$ and $\beta_2$ are slightly biased, with SRBs 33 and 52 when $\sigma^2 = 0.4$, and 23 and 31 when $\sigma^2 = 1.2$. The HL estimates of the same regression parameters, however, appear to converge to wrong values with small standard errors. To be specific, for $n_i = 2$, the HL estimates of $\beta_1$ and $\beta_2$ have SRBs 195 and 253 when $\sigma^2 = 0.4$, and 921 and 764 when $\sigma^2 = 1.2$. Similarly, for a large cluster size $n_i = 16$, the SRBs for the GQL estimates are 10 and 12 when $\sigma^2 = 0.4$, and 11 and 11 when $\sigma^2 = 1.2$, whereas the respective SRBs for the HL approach are 699 and 994 when $\sigma^2 = 0.4$, and 326 and 306 when $\sigma^2 = 1.2$. Thus, irrespective of the cluster size $n_i$ and the value of $\sigma^2$, the GQL approach performs much better than the HL approach in estimating $\beta_1$ and $\beta_2$.

Note that the results in Table 1 also reveal that the pattern of the regression estimates as a function of $n_i$ and $\sigma^2$ differs between the HL and GQL approaches. When the cluster size is small, such as $n_i = 2, 4$ and $6$, the SRBs of the regression estimates get smaller for the GQL estimates but larger for the HL estimates as the value of $\sigma^2$ increases. When the cluster size is large, such as $n_i = 10$ and $16$, the performance of the GQL estimates of the regression parameters remains the same irrespective of the value of $\sigma^2$, whereas the HL estimates appear to perform better as the value of $\sigma^2$ increases. But when the HL and GQL approaches are compared, as mentioned above, the GQL approach performs uniformly better in estimating $\beta_1$ and $\beta_2$.

With regard to the estimation of the overdispersion parameter $\sigma^2$, the GQL approach, in general, appears to perform better than the HL approach.
For example, when $n_i = 6$, the GQL approach produces $\sigma^2$ estimates with SRBs 14 and 23 for $\sigma^2 = 0.4$ and $1.2$, respectively, whereas the corresponding SRBs for the HL estimates are 65 and 196, respectively.

3.2. Performance of the GQL approach under negative binomial set-up. In the simulation study of the last section, the GQL approach was found to perform uniformly better than the HL approach in estimating both the regression and the overdispersion parameters. That comparison, however, was made under the Poisson-normal mixed model discussed in Section 2. In this section, we conduct another simulation study by generating the random effects from a gamma distribution with parameters $1/(c\exp(\sigma^2/2))$ and $1/c$, with $c = \exp(\sigma^2) - 1$. To be specific, we generate $\gamma^*_i = \exp(\gamma_i)$ from the gamma distribution with density

$$f(\gamma^*_i) = \left[\{c\exp(\sigma^2/2)\}^{1/c}\,\Gamma(c^{-1})\right]^{-1}\exp\left[-\{c\exp(\sigma^2/2)\}^{-1}\gamma^*_i\right]\gamma_i^{*\,c^{-1}-1}, \qquad (3.2)$$

(Jowaheer, 2006) and then generate $y_{ij}$ from the Poisson distribution (Poisson form of (1.1)) with mean $\mu^*_{ij} = \exp(x_{ij}'\beta + \gamma_i)$. It then follows that $y_{ij}$ unconditionally has a negative binomial distribution with the same mean and the same variance as in the Poisson-normal model considered in Section 3.1. The higher order moments, however, differ between these two models.

In the present simulation study we use the same covariates and the same values of the design parameters as in the previous simulation study in Section 3.1. For simplicity, we report the simulation results in Table 2 for a small cluster size, $n_i = 4$, and for a relatively large cluster size, $n_i = 10$. All three values of $\sigma^2$ were considered. It is clear from the results in Table 2 that the GQL approach continues to perform well in estimating the parameters even under the present negative binomial model.

4 A Numerical Illustration: Health Care Utilization Data Analysis

In the last section it was shown through a simulation study that the HL approach produces highly biased estimates with smaller standard errors than the GQL approach. Thus the GQL approach performs better than the HL approach in estimating all parameters of the Poisson familial model. The purpose of this section is to analyze a real-life data set and to interpret the estimates in light of the simulation results. We have chosen to analyze a set of count data, collected by a general hospital in Canada, that contains responses and associated covariates from a moderately large number of independent families. The data set is described in subsection 4.1.

4.1. Health care utilization data. The Department of Community Medicine, Health Science Center (General Hospital) in St. John's, Canada, collected familial-longitudinal data on health care utilization for $K = 48$ families over a period of 6 years, from 1985 to 1990. Note that some authors, such as Sutradhar, Jowaheer and Sneddon (2008), have used the GQL approach to analyze these complete familial-longitudinal data for four years collected from 48 families. Our purpose, however, is to concentrate on the familial data analysis only, as an application of the theoretical and simulation results discussed in Sections 2 and 3.

For our purpose, we consider a part of this whole data set, for 1985 only. Among the 48 families, 36 are of size $n_i = 4$ ($i = 1,\ldots,36$), and the remaining 12 are of size $n_i = 3$ ($i = 37,\ldots,48$). Each family member was asked about the number of visits they paid to a physician during 1985. In our notation, $y_{ij}$ denotes this count response for the $j$th member ($j = 1,\ldots,n_i$) of the $i$th family ($i = 1,\ldots,48$). Their gender ($x_{ij1}$), chronic condition ($x_{ij2}$) [CC], education level ($x_{ij3}$) [EL] and age ($x_{ij4}$) were also collected. For convenience, we use the following codes for these four covariates:

$x_{ij1}$ = 0 for female, 1 for male
$x_{ij2}$ = 0 without chronic diseases, 1 with chronic diseases
$x_{ij3}$ = 0 for less than high school, 1 for high school or above
$x_{ij4}$ = exact age of the individual

To give a feel for these familial data, the distribution of the count responses from the 180 individuals by all four covariates is displayed in Table 3.

Table 3. Summary Statistics of Physician Visits by Four Covariates in the Health Care Utilization Data for 1985.

Rows: Gender (Male, Female); Chronic Condition (No, Yes); Education Level (< High School, High School or above); Age (by age group). Columns: Number of Visits (by category) and Total.

[Table entries not recovered.]

It is seen from Table 3 that, in general, more males appear to visit a physician a small number of times, while a large number of females visit a physician at least 3 times. As expected, an individual with chronic diseases visits a physician more often. Physician visits for individuals with a higher level of education seem to be evenly distributed, i.e., individuals are just as likely to visit a physician once as 3-5 times. Individuals with a lower level of education appear either not to visit a physician or to visit a large number of times. With regard to the relationship between the number of visits and age, we have temporarily formed 5 age groups and observed that some of the individuals in the […] age group visited a physician a large number of times. As expected, a large number of individuals did not visit a physician at all. For the older age groups, there was a tendency for an individual to see a physician more often.

In the next section, we examine the effects of the above four covariates on the responses by applying both the GQL and HL approaches discussed in Section 2. In our notation, $\beta = (\beta_1,\beta_2,\beta_3,\beta_4)'$ denotes their effects. Note, however, that when we computed the mean and the variance of the 180 count responses, it was found that an individual visited his or her physician 3.92 times on average, with variance […]. This indicates that the count responses contain over-dispersion, which in our notation is represented by the over-dispersion index parameter $\sigma^2$, the variance of the random family effects. We also need to estimate this variance parameter along with $\beta$.

4.2. GQL and HL estimates. The GQL estimates of $\beta$ and $\sigma^2$ were obtained by solving equations (2.1) and (2.3), respectively. The standard errors of the estimates of the components of $\beta$ were calculated from the asymptotic covariance matrix of $\hat\beta_{GQL}$, given by

$$\mathrm{Cov}(\hat\beta_{GQL}) = \left[\sum_{i=1}^{K}\frac{\partial\mu_i'}{\partial\beta}\,\Sigma_i^{-1}\,\frac{\partial\mu_i}{\partial\beta'}\right]^{-1},$$

and similarly, the standard error of $\hat\sigma^2_{GQL}$ was computed from

$$\mathrm{Var}(\hat\sigma^2_{GQL}) = \left[\sum_{i=1}^{K}\frac{\partial\lambda_i'}{\partial\sigma^2}\,\Omega_i^{-1}\,\frac{\partial\lambda_i}{\partial\sigma^2}\right]^{-1}.$$

Next, the HL estimates of $\beta$ and $\sigma^2$ were obtained from (2.7) and (2.9), respectively.
The standard errors of the components of $\hat\beta_{HL}$ were computed from

$$\mathrm{Cov}(\hat\beta_{HL}) = \left[\sum_{i=1}^{K} X_i'\,\mathrm{diag}[\mu^*_{i1},\ldots,\mu^*_{in_i}]\,X_i\right]^{-1},$$

and similarly the standard error of $\hat\sigma^2_{HL}$ was computed by using the so-called Hessian scalar

$$\mathrm{Var}(\hat\sigma^2_{HL}) = \left[-\frac{\partial^2 h_A}{\partial\sigma^4}\right]^{-1}.$$

These estimates along with their standard errors are displayed in Table 4.

Table 4. The GQL, PGQL, ZIGQL, HL, PHL and MM Estimates, along with available standard errors, for the Health Care Utilization Data for 1985.

Rows: Method (GQL, PGQL, ZIGQL, HL, PHL, MM), each with Quantity (Value, SE). Columns: Effects of the Covariates Gender ($\hat\beta_1$), CC ($\hat\beta_2$), EL ($\hat\beta_3$), Age ($\hat\beta_4$); Variance ($\hat\sigma^2$).

[Table entries not recovered.]

Note that even though the method of moments [Jiang (1998)] is known to produce estimates inferior to those of the GQL approach [Sutradhar (2004)], for the sake of completeness we also provide in this section the moment estimates of both $\beta$ and $\sigma^2$. To be specific, the moment estimate ($\hat\beta_{MM}$) of $\beta$ is obtained by solving

$$\sum_{i=1}^{K} X_i'(y_i - \mu_i) = 0, \qquad (4.1)$$

with its asymptotic covariance matrix computed from

$$\mathrm{Cov}(\hat\beta_{MM}) = \left[\sum_{i=1}^{K} X_i'\,\mathrm{diag}[\mu_{i1},\ldots,\mu_{in_i}]\,X_i\right]^{-1},$$

where $\mu_i$ in (4.1) is the unconditional mean vector. As far as the moment estimate of $\sigma^2$ is concerned, we obtain $\hat\sigma^2_{MM}$ by solving

$$[S - E(S)] = 0, \qquad (4.2)$$

where

$$S = \sum_{i=1}^{K}\sum_{j=1}^{n_i} y_{ij}^2 + \sum_{i=1}^{K}\sum_{j<k}^{n_i} y_{ij}y_{ik},$$

and

$$E(S) = \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left[\mu_{ij} + \mu_{ij}^2e^{\sigma^2}\right] + \sum_{i=1}^{K}\sum_{j<k}^{n_i}\mu_{ij}\mu_{ik}e^{\sigma^2},$$

with $\mu_{ij} = \exp(x_{ij}'\beta + \frac{1}{2}\sigma^2)$. The MM estimate of $\beta$, along with the standard errors of the components of $\hat\beta_{MM}$, is shown in Table 4. The value of $\hat\sigma^2_{MM}$ is also given.

We also provide the estimates of $\beta$ and $\sigma^2$ based on a partial GQL (PGQL) as well as a partial HL (PHL) approach. To be specific, in the PGQL approach, $\beta$ and $\sigma^2$ are iteratively estimated by using the GQL formula for $\beta$ and the MM formula for $\sigma^2$. Similarly, in the PHL approach, $\beta$ and $\sigma^2$ are iteratively estimated by using the HL formula for $\beta$ and the MM formula for $\sigma^2$. These estimates are also reported in Table 4.

The results in Table 4 show that the estimates of $\beta$ and $\sigma^2$ are generally different under the GQL and the HL approaches. Furthermore, except for the estimate of $\sigma^2$, the HL estimates appear to be closer to the MM estimates than the GQL estimates are. Since the GQL estimates are known to be better than the MM estimates [Sutradhar (2004)], and also because the GQL estimates in the simulation study of Section 3 were found to be less biased than the HL estimates, one would naturally find the GQL estimates in Table 4 more reliable than the HL estimates. Note that the estimated standard errors under the HL approach are smaller than those under the GQL approach. This is also in agreement with the simulation results reported in Table 1. But as argued in Section 3, the large biases of the HL estimates, together with their smaller standard errors, make these HL estimates unreliable, as they converge to wrong values far away from the parameter values. We remark that, as expected, the PGQL and PHL estimates are close to the GQL and HL estimates, respectively.

Since the GQL estimates are found to be better than the HL and MM estimates, we now interpret the effects of the covariates as well as the variance of the family effects by using the GQL approach. Thus, $\hat\sigma^2_{GQL} = 0.873$ indicates that the data contain large over-dispersion. This is also in agreement with the results reported in Subsection 4.1, where it was shown that an individual visits the physician 3.92 times on average with a very large variance […]. With regard to the regression effects, the negative value of $\hat\beta_{1(GQL)}$, namely $\hat\beta_{1(GQL)} =$ […], indicates that females made more visits to the physician than males. Next, $\hat\beta_{2(GQL)} =$ […] and $\hat\beta_{4(GQL)} =$ […] suggest that individuals having some chronic diseases, or individuals who are older, pay more visits to the physician, as expected. The effect of the education level on the health condition, however, appears to be intriguing. This is because $\hat\beta_{3(GQL)} =$ […] suggests that highly educated individuals make more visits than individuals with a lower level of education.
One of the reasons for this type of behavior of this covariate may be that individuals with a higher level of education are more concerned about their health condition than individuals with a lower level of education.

4.3. GQL estimates under a zero-inflated Poisson model.

Note that in the analysis of the health care data displayed in Table 3, we have assumed that the Poisson distribution conditional on the random effects would fit the data well. Since, on a closer look, the count 0 in particular appears to occur at a higher rate than expected, we now accommodate this feature by applying the so-called zero-inflated Poisson (ZIP) mixed model. Here we refer to Ridout et al. (2001) and the references therein for the analysis of the ZIP fixed-effects model.
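To make the idea of zero inflation concrete, the following is a minimal Python sketch of a zero-inflated Poisson probability function in its standard form, assuming a zero-inflation proportion `w` and a Poisson mean `mu`. The function names and the numerical values `w = 0.3` and `mu = 4.0` are illustrative assumptions and are not taken from the paper. The sketch checks numerically that the mean and variance agree with the standard ZIP moments $(1-w)\mu$ and $(1-w)\mu(1+w\mu)$.

```python
import math

def zip_pmf(y, w, mu):
    """P(Y = y) for a zero-inflated Poisson variable.

    w  : extra probability mass placed at zero (illustrative name)
    mu : mean of the underlying Poisson component, mu > 0
    """
    # Poisson probability computed on the log scale to avoid overflow
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return w + (1.0 - w) * poisson
    return (1.0 - w) * poisson

def zip_moments(w, mu, y_max=100):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [zip_pmf(y, w, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean ** 2

# Hypothetical values chosen only for illustration
w, mu = 0.3, 4.0
mean, var = zip_moments(w, mu)
print(mean, (1 - w) * mu)                  # numerical vs. closed-form mean
print(var, (1 - w) * mu * (1 + w * mu))    # numerical vs. closed-form variance
```

The variance exceeds the mean whenever `w > 0`, which is why excess zeros alone already produce over-dispersion relative to a plain Poisson model.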

Let $w_i$ denote the proportion of zeros in the $i$th family ($i = 1, \ldots, 48$). Then, in a fashion similar to that of Ridout et al. (2001), we write the ZIP density for $y_{ij}$ conditional on $\gamma_i$ as

$$f(y_{ij}\mid\gamma_i) = \begin{cases} w_i + (1-w_i)\exp(-\mu_{ij}), & y_{ij} = 0,\\[4pt] (1-w_i)\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}/y_{ij}!, & y_{ij} > 0, \end{cases} \tag{4.3}$$

where now $\mu_{ij} = \exp(x_{ij}'\beta + \gamma_i)$ denotes the conditional mean given $\gamma_i$. This conditional ZIP model produces the mean and the variance of $y_{ij}$ as

$$E(Y_{ij}\mid\gamma_i) = (1-w_i)\,\mu_{ij}, \qquad \mathrm{Var}(Y_{ij}\mid\gamma_i) = (1-w_i)\,\mu_{ij}\,(1 + w_i\,\mu_{ij}).$$

By computing $\hat{w}_i = [\text{number of zeros in the } i\text{th family}]/n_i$ and using this estimate, we now compute the unconditional mean, the variance and other moments of order up to four. We then solve the GQL estimating equations (2.1) and (2.3) to obtain the estimates of $\beta$ and $\sigma^2$ under the ZIP mixed model. These zero-inflated Poisson GQL (ZIGQL) estimates, along with their standard errors, are reported in Table 4.

Note that the ZIGQL estimate of the over-dispersion parameter (1.397) appears to be larger than the GQL estimate (0.873). Thus, the count data considered here appear to have much more over-dispersion than shown by the GQL approach. As far as the regression estimates are concerned, the ZIGQL estimates appear to be slightly different from the corresponding GQL estimates. The ZIGQL approach produces regression estimates with larger standard errors than those of the GQL approach. This is expected, as the ZIGQL approach indicates that the data have more variation. Note that the ZIGQL estimates are not comparable with the HL estimates. To examine any possible changes to the HL estimates due to zero inflation, one could compute the ZIHL estimates, but this was not done in this paper, as the HL estimates were found to be inferior to the GQL estimates.

5 Concluding Remarks

In this paper we have considered a Poisson-normal mixed model, which is an important special case of the well-known generalized linear mixed model.
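For readers who wish to experiment with this Poisson-normal mixed model, the moment equation (4.2) for $\sigma^2$ can be sketched in a few lines of Python. This is only a sketch under simplifying assumptions: the linear predictors $x_{ij}'\beta$ are treated as known (in the paper $\beta$ is estimated iteratively alongside $\sigma^2$), the root is located by plain bisection, and the function names, the toy counts and the toy predictors are all hypothetical.

```python
import math
from itertools import combinations

def statistic_S(counts):
    """S = sum over clusters of [sum_j y_ij^2 + sum_{j<k} y_ij * y_ik]."""
    total = 0.0
    for y in counts:
        total += sum(v * v for v in y)
        total += sum(a * b for a, b in combinations(y, 2))
    return total

def expected_S(etas, sigma2):
    """E(S) under the Poisson-normal model, with eta_ij = x_ij' beta held fixed."""
    scale = math.exp(sigma2)
    total = 0.0
    for eta in etas:
        # unconditional means mu_ij = exp(eta_ij + sigma2 / 2)
        mu = [math.exp(e + 0.5 * sigma2) for e in eta]
        total += sum(m + m * m * scale for m in mu)
        total += sum(a * b * scale for a, b in combinations(mu, 2))
    return total

def solve_sigma2(S, etas, upper=10.0, tol=1e-10):
    """Root of S - E(S) = 0 in sigma2 by bisection; E(S) increases with sigma2."""
    if S <= expected_S(etas, 0.0):
        return 0.0  # no positive root, so the estimate is truncated at zero
    if S > expected_S(etas, upper):
        raise ValueError("root lies above the search bracket; increase `upper`")
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_S(etas, mid) < S:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy data: two clusters with known linear predictors
etas = [[0.1, 0.4], [0.2, -0.3, 0.5]]
counts = [[2, 5], [0, 1, 6]]
sigma2_hat = solve_sigma2(statistic_S(counts), etas)
print(sigma2_hat)
```

Bisection is used here only because $E(S)$ is strictly increasing in $\sigma^2$ for fixed $\beta$, so the root is unique when it exists. It is not meant to represent the iterative scheme used by the authors.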

In this problem, it is of interest to estimate the regression effects and the variance of the random effects consistently and efficiently. A great deal of discussion has taken place over the last two decades on the relative performance of some of the widely used estimation methods, such as the MM (Jiang, 1998; Jiang and Zhang, 2001), PQL (Breslow and Clayton, 1993) and GQL (Sutradhar, 2004) approaches. But none of these procedures has been compared with the existing HL (Lee and Nelder, 1996) approach, even though this latter approach is quite well known. Since the GQL approach was found to be better than the MM and PQL approaches, in this paper we have examined the performance of this well-behaved GQL approach relative to the HL approach.

For the comparison between the HL and GQL approaches, we have first simplified all related estimating equations under these two approaches. We then conducted an extensive simulation study to examine the relative performance of these procedures in estimating both the regression effects and the variance of the random effects. Note that the HL approach requires three estimating equations, including one for the estimation of the random effects, whereas the GQL approach requires only two estimating equations and does not need to estimate the random effects. The simulation study was conducted for five different cluster sizes and three values of $\sigma^2$ (the variance of the random effects), ranging from small to large. It was found that as the value of $\sigma^2$ increases, the HL approach starts to produce highly biased estimates for the regression effects. The GQL approach, however, was found to produce almost unbiased estimates for the regression effects, irrespective of the magnitude of $\sigma^2$. As far as the estimation of the variance parameter $\sigma^2$ is concerned, the GQL approach was also found to be uniformly better than the HL approach.
In this case, in contrast to its performance in regression estimation, the HL approach was found to perform better, although it still trails the GQL approach. Hence, the GQL approach is definitely better than the HL approach in estimating all parameters of the model. When the other studies mentioned above are taken into consideration, the GQL approach appears to be the best so far when compared with the MM, PQL and HL approaches. We therefore recommend the use of the GQL approach in practice, irrespective of the magnitude of the over-dispersion in the familial/cluster count data. To demonstrate the effectiveness of the GQL approach, we have applied both the GQL and HL approaches to analyze a real life data set on health care utilization. The estimates were found to reflect the simulation results. We have also applied a ZIGQL approach to the same data set, and the estimates were found to be similar to, though not identical with, the GQL estimates.

Acknowledgements. The authors would like to thank the referee and the associate editor for their valuable comments that led to the improvement of the paper. This research was supported partially by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc., 88.
Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear models with single component of dispersion. Biometrika, 82.
Jiang, J. (1998). Consistent estimators in generalized linear mixed models. J. Amer. Statist. Assoc., 93.
Jiang, J. and Zhang, W. (2001). Robust estimation in generalized linear mixed models. Biometrika, 88.
Jowaheer, V. (2006). Model misspecification effects in clustered count data analysis. Statist. Probab. Lett., 76.
Kuk, A.C. (1995). Asymptotically unbiased estimation in generalized linear models with random effects. J. Roy. Statist. Soc. B, 57.
Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models. J. Roy. Statist. Soc. B, 58.
Lee, Y. and Nelder, J.A. (2001). Hierarchical generalized linear models: A synthesis of generalized linear models, random-effect models and structured dispersions. Biometrika, 88.
Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. J. Amer. Statist. Assoc., 91.
McCullagh, P. (1983). Quasi-likelihood functions. Ann. Statist., 11.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Ridout, M., Hinde, J. and Demetrio, C.G.B. (2001). A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics, 57.
Sutradhar, B.C., Jowaheer, V. and Sneddon, G. (2008). On a unified generalized quasi-likelihood approach for familial-longitudinal non-stationary data. Scandinavian J. Statist., 35.
Sutradhar, B.C. and Qu, Z. (1998). On approximate likelihood inference in a Poisson mixed model. Canad. J. Statist., 26.
Sutradhar, B.C. and Rao, R.P. (2001). On marginal quasi-likelihood inference in generalized linear mixed models. J. Multivariate Anal., 76.
Sutradhar, B.C. (2004). On exact quasi-likelihood inference in generalized linear mixed models. Ind. J. of Statist., 66.
Wedderburn, R. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61.

M.R.I. Chowdhury and B.C. Sutradhar
Department of Mathematics and Statistics
Memorial University of Newfoundland
St. John's, NL, Canada A1C 5S7

Paper received December 2007; revised December 2008.
