Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data

Sankhyā: The Indian Journal of Statistics, 2009, Volume 71-B, Part 1, pp.
© 2009, Indian Statistical Institute

M.R.I. Chowdhury and B.C. Sutradhar
Memorial University of Newfoundland, Canada

Abstract

Inference for the regression parameters and the variance of the random effects in the generalized linear mixed model (GLMM) set-up is an extremely important statistical issue. It is, however, known that the most widely used penalized quasi-likelihood (PQL) approach may not produce consistent estimates for the parameters, especially when the true variance of the random effects is large. In this paper, in the context of Poisson mixed models, we examine the consistency performance of two other competitive estimation approaches, namely, the hierarchical likelihood (HL) and the generalized quasi-likelihood (GQL) approaches. An extensive simulation study shows that the HL approach, similar to the PQL approach, appears to produce highly biased and hence inconsistent estimates for the regression parameters, especially when the variance of the random effects is large. The biases of the HL estimates also appear to vary with the cluster size. As an alternative, the GQL approach appears to produce consistent estimates for all parameters of this model, irrespective of the cluster size and the magnitude of the variance of the random effects. The GQL and HL estimates are also compared in a real-life data analysis.

AMS (2000) subject classification. Primary 62F10; secondary 62H20.

Keywords and phrases. GLMMs, method of moments approach, penalized quasi-likelihood approach, hierarchical likelihood approach, generalized quasi-likelihood approach, relative bias, zero-inflated Poisson mixed model.
1 Introduction

In general, whether the responses are counts or binary, the generalized linear mixed models (GLMMs) for such responses are generated from the well-known generalized linear model (GLM) (McCullagh and Nelder, 1989) by adding random effects to the linear predictor. A random variable $Y_{ij}$

for the $j$th member ($j = 1,\ldots,n_i$) of the $i$th family ($i = 1,\ldots,K$), with exponential density

$$f(y_{ij}) = \exp[\{y_{ij}\eta_{ij} - a(\eta_{ij})\} + b(y_{ij})] \tag{1.1}$$

follows a GLM when $\eta_{ij}$ has a linear form, namely $\eta_{ij} = x_{ij}'\beta$, where $x_{ij} = (x_{ij1},\ldots,x_{iju},\ldots,x_{ijp})'$ is the vector of covariates and $\beta$ is the vector of regression parameters. In (1.1), $a(\cdot)$ and $b(\cdot)$ are known functions. Note that the exponential form (1.1) contains both the Poisson and binary distributions.

As far as the GLMM is concerned, it is developed by adding random effects, say $\gamma_i$, to the linear predictor $\eta_{ij}$, where the $\gamma_i$ for $i = 1,\ldots,K$ are generally assumed to be independently and identically distributed (i.i.d.) with mean 0 and variance $\sigma^2$. In notation, we use $\gamma_i \overset{i.i.d.}{\sim} (0,\sigma^2)$. Thus in the GLMM set-up, the binary and count responses follow (1.1) with $a(\eta_{ij}) = \exp(\eta_{ij})$ in the Poisson case, and $a(\eta_{ij}) = \log\{1 + \exp(\eta_{ij})\}$ in the binary case (so that the conditional mean is $a'(\eta_{ij}) = \exp(\eta_{ij})/\{1 + \exp(\eta_{ij})\}$), where $\eta_{ij} = x_{ij}'\beta + \sigma z_i\gamma_i^*$. Here, $z_i$ is a known covariate associated with the random effect for the $i$th cluster, and $\gamma_i^* = \gamma_i/\sigma \overset{i.i.d.}{\sim} (0,1)$. To be specific, in this paper we deal with GLMMs for count data only.

Note that GLMMs for count and binary data with two or more random effects have also been discussed in the literature (see Lin and Breslow (1996), for example), but in the present paper, without any loss of generality, we confine our analysis to the GLMMs (1.1) with a single random effect. One of the main reasons for selecting such simple models is that our objective is not to analyze a higher-dimensional complicated model, but to compare the relative performance of two highly competitive approaches in estimating the parameters of simpler models.

Under the GLMM set-up, it has proven difficult to obtain consistent and efficient estimators for the regression parameters ($\beta$) and the variance of the random effects ($\sigma^2$).
A full or exact likelihood analysis is complicated, as it requires a complex integration over the distribution of the random effects. This integration problem compels one to avoid the exact likelihood estimation method, even though it is known that maximum likelihood estimators will be fully efficient (optimal). To overcome this computational problem, many authors over the last two decades have used various approximate methods to obtain consistent estimates for the regression effect $\beta$ and the variance of the random effects $\sigma^2$. For example, we refer to some of the leading approaches, such as the penalized quasi-likelihood (PQL) approach of Breslow and Clayton (1993), Breslow and Lin (1995), Kuk (1995), and

Lin and Breslow (1996), the hierarchical likelihood (HL) approach due to Lee and Nelder (1996, 2001), and the recent generalized quasi-likelihood (GQL) approach of Sutradhar (2004), Jiang (1998), Jiang and Zhang (2001), and Sutradhar and Rao (2001).

Note that in both the PQL and HL approaches the estimation of $\beta$ and $\sigma^2$ requires the estimation of the $\gamma_i$ as well, which are estimated by pretending that the $\gamma_i$'s are fixed parameters even though they are truly unobservable random effects. To be specific, in the first step, both the PQL and HL approaches estimate $\beta$ and $\gamma_i$. The difference between the two approaches is that the PQL approach estimates $\beta$ and $\gamma_i$ by maximizing a penalized quasi-likelihood function, whereas the HL approach maximizes the hierarchical likelihood function

$$h = \log\prod_{i=1}^{K}\prod_{j=1}^{n_i} f(y_{ij}\mid\gamma_i) + \log\prod_{i=1}^{K} h(\gamma_i;\sigma^2), \tag{1.2}$$

to estimate these parameters, where $h(\gamma_i;\sigma^2)$ is the likelihood function of the unobserved $\gamma_i$'s, and $f(y_{ij}\mid\gamma_i)$ is the conditional density function of the response $y_{ij}$ given $\gamma_i$. In the second step, in estimating $\sigma^2$, the PQL approach maximizes a profile quasi-likelihood function, whereas the HL approach maximizes an adjusted profile hierarchical likelihood function.

We further remark that the PQL approach of Breslow and Clayton (1993) may not provide consistent estimates for the parameters, mainly for the variance component $\sigma^2$ [Sutradhar and Qu (1998)]. To be specific, Sutradhar and Qu (1998) have argued that the PQL approach may provide an inconsistent estimate for $\sigma^2$ when the cluster sizes ($n_i$) are small, especially in situations where $\sigma^2$ is large. Breslow and Lin (1995, p. 20) have also reported a similar consistency problem for the estimates of $\sigma^2$.
In particular, these authors argued that the PQL approach produces consistent estimates when $\sigma^2$ is small. Lin and Breslow (1996) attempted to produce bias-corrected estimates, but their bias corrections also do not appear to improve the consistency results significantly when $\sigma^2$ is large. Consequently, we do not follow the PQL approach any further in the present paper.

Note that as the PQL and the HL approaches estimate $\beta$ and $\sigma^2$ through the estimation of $\gamma_i$ ($i = 1,\ldots,K$), the HL approach may also suffer from

similar inconsistency problems, for the same reasons that cause inconsistency in the PQL approach. There does not, however, exist any comparative study between the HL and any other competitive approach. Further note that as an alternative to the PQL and other moment-based (Jiang and Zhang (2001)) approaches, Sutradhar (2004) has recently proposed a GQL approach that produces consistent as well as more efficient estimates.

The purpose of this paper is to make a comparative study between the GQL and the HL approaches. Both approaches are reviewed briefly in Section 2. An extensive simulation study is conducted in Section 3 for various small and large values of $\sigma^2$, as well as for small and large cluster sizes such as $n_i = 2$, 4 and 6. The simulation results show that the HL approach estimates the parameters with larger biases than the GQL approach, with the HL approach performing relatively worse for the main regression parameter $\beta$. Both the GQL and HL estimation approaches are illustrated in Section 4 by analyzing health care utilization data. The paper is concluded in Section 5.

2 Estimation of Parameters: GQL versus HL Inferences

2.1. GQL approach. Let $y_i = (y_{i1},\ldots,y_{ij},\ldots,y_{in_i})'$ and $\mu_i = E(Y_i) = (\mu_{i1},\ldots,\mu_{ij},\ldots,\mu_{in_i})'$, where $\mu_{ij} = E_{\gamma_i}E(Y_{ij}\mid\gamma_i) = E_{\gamma_i}(\mu^*_{ij})$, with $\mu^*_{ij} = \exp(x_{ij}'\beta + z_i\gamma_i)$ as the conditional mean of the Poisson distribution computed from (1.1). Further, let $\Sigma_i = (\sigma_{ijk})$ be the $n_i \times n_i$ covariance matrix of $y_i$. It can be shown that

$$\sigma_{ijj} = \mathrm{Var}(Y_{ij}) = \mu_{ij} + [\exp(z_i^2\sigma^2) - 1]\mu_{ij}^2 \quad \text{for } j = 1,\ldots,n_i,$$

and, for $j,k = 1,\ldots,n_i$, $j \neq k$,

$$\sigma_{ijk} = \mathrm{Cov}(Y_{ij},Y_{ik}) = \mu_{ij}\mu_{ik}[\exp(z_i^2\sigma^2) - 1].$$
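The mean vector and covariance matrix above can be sketched numerically as follows. This is only an illustration under stated assumptions: the random effects are taken to be normal, so that the unconditional mean has the closed form $\mu_{ij} = \exp(x_{ij}'\beta + z_i^2\sigma^2/2)$ (the normal moment generating function), and the function name `poisson_normal_moments` is hypothetical rather than taken from the paper.

```python
import numpy as np

def poisson_normal_moments(X, beta, sigma2, z=1.0):
    """Unconditional mean vector mu_i and covariance matrix Sigma_i of one cluster.

    Assumes gamma_i ~ N(0, sigma2), so mu_ij = exp(x_ij' beta + z^2 sigma2 / 2).
    X is the n_i x p covariate matrix of the cluster.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    s = z ** 2 * sigma2
    mu = np.exp(X @ beta + 0.5 * s)
    c = np.exp(s) - 1.0
    # Off-diagonal: c * mu_j * mu_k; diagonal: mu_j + c * mu_j^2.
    Sigma = c * np.outer(mu, mu) + np.diag(mu)
    return mu, Sigma
```

With $\sigma^2 = 0$ the matrix reduces to $\mathrm{diag}(\mu_i)$, the independent Poisson case.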
For given $\sigma^2$, one may then write the GQL estimating equation [Sutradhar (2004)] for $\beta$ as

$$\sum_{i=1}^{K}\frac{\partial\mu_i'}{\partial\beta}\Sigma_i^{-1}(y_i - \mu_i) = 0, \tag{2.1}$$

(see Wedderburn (1979) and McCullagh (1983) for the independence case), where $\partial\mu_i'/\partial\beta = X_i'A_i$, with $X_i' = [x_{i1},\ldots,x_{in_i}]$ and $A_i = \mathrm{diag}[\mu_{i1},\ldots,\mu_{in_i}]$.
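One way to solve an equation of the form (2.1) numerically is a Gauss-Newton type iteration, sketched below. This is an illustrative sketch rather than the authors' algorithm: it assumes normal random effects with a common $z_i = z$, so that $\mu_{ij} = \exp(x_{ij}'\beta + z^2\sigma^2/2)$, and the name `gql_beta` and its arguments are hypothetical.

```python
import numpy as np

def gql_beta(X_list, y_list, sigma2, beta0, z=1.0, n_iter=50, tol=1e-10):
    """Solve sum_i D_i' Sigma_i^{-1} (y_i - mu_i) = 0 for beta at a given sigma2.

    X_list, y_list: per-cluster covariate matrices (n_i x p) and responses (n_i,).
    D_i = A_i X_i is the derivative of mu_i with respect to beta'.
    """
    beta = np.asarray(beta0, dtype=float).copy()
    s = z ** 2 * sigma2
    c = np.exp(s) - 1.0
    for _ in range(n_iter):
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for X, y in zip(X_list, y_list):
            mu = np.exp(X @ beta + 0.5 * s)
            Sigma = c * np.outer(mu, mu) + np.diag(mu)
            D = mu[:, None] * X                  # A_i X_i
            SinvD = np.linalg.solve(Sigma, D)    # Sigma_i^{-1} D_i
            score += SinvD.T @ (y - mu)          # D_i' Sigma_i^{-1} (y_i - mu_i)
            info += D.T @ SinvD                  # D_i' Sigma_i^{-1} D_i
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

The inverse of the accumulated matrix `info` at convergence is the usual estimate of the asymptotic covariance of the resulting estimator.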

Note that as the first order responses are used to construct (2.1), we refer to this equation as the first order GQL estimating equation. Once the regression effect $\beta$ is estimated by solving (2.1), one may construct a second order GQL estimating equation to estimate $\sigma^2$. Let

$$g_i = [y_{i1}^2,\ldots,y_{in_i}^2,\, y_{i1}y_{i2},\ldots,y_{i,n_i-1}y_{in_i}]' \tag{2.2}$$

be the $n_i(n_i+1)/2 \times 1$ vector of all distinct second order responses for the $i$th cluster. Let $\lambda_i = E(G_i)$ and $\Omega_i = \mathrm{Cov}(G_i)$ be the mean vector and covariance matrix of $g_i$, respectively, under the Poisson mixed model. It then follows that for a given value of $\beta$, the variance component $\sigma^2$ may be estimated by solving the second order GQL estimating equation

$$\sum_{i=1}^{K}\frac{\partial\lambda_i'}{\partial\sigma^2}\Omega_i^{-1}(g_i - \lambda_i) = 0 \tag{2.3}$$

[Sutradhar (2004), Jiang and Zhang (2001)]. Note that the $\lambda_i$ vector in (2.3) is easily calculated. This is because $E(Y_{ij}^2) = \mathrm{Var}(Y_{ij}) + \mu_{ij}^2$ for all $j = 1,\ldots,n_i$, and $E(Y_{ij}Y_{ik}) = \mathrm{Cov}(Y_{ij},Y_{ik}) + \mu_{ij}\mu_{ik}$ for all $j \neq k$, $j,k = 1,\ldots,n_i$. The derivative vector $\partial\lambda_i'/\partial\sigma^2$ may also be easily computed. The computation of the fourth order moments matrix $\Omega_i$ is lengthy but straightforward. All elements of this matrix may be computed by using the conditioning and unconditioning principle, as was done for the variances and covariances. For example, the variance of $Y_{ij}^2$ may be computed as

$$\begin{aligned}
\mathrm{Var}(Y_{ij}^2) &= E(Y_{ij}^4) - [E(Y_{ij}^2)]^2 = E_{\gamma_i}E(Y_{ij}^4\mid\gamma_i) - [E(Y_{ij}^2)]^2 \\
&= \mu_{ij} + \mu_{ij}^2(7e^{z_i^2\sigma^2} - 1) + 2\mu_{ij}^3 e^{z_i^2\sigma^2}(3e^{2z_i^2\sigma^2} - 1) + \mu_{ij}^4 e^{2z_i^2\sigma^2}(e^{4z_i^2\sigma^2} - 1),
\end{aligned} \tag{2.4}$$

and the covariance between $Y_{ij}^2$ and $Y_{ik}Y_{il}$ for $j \neq k < l$ may be computed as

$$\begin{aligned}
\mathrm{Cov}(Y_{ij}^2, Y_{ik}Y_{il}) &= E_{\gamma_i}E(Y_{ij}^2Y_{ik}Y_{il}\mid\gamma_i) - E_{\gamma_i}[E(Y_{ij}^2\mid\gamma_i)]\,E_{\gamma_i}[E(Y_{ik}Y_{il}\mid\gamma_i)] \\
&= \mu_{ij}\mu_{ik}\mu_{il}e^{z_i^2\sigma^2}(e^{2z_i^2\sigma^2} - 1) + \mu_{ij}^2\mu_{ik}\mu_{il}e^{2z_i^2\sigma^2}(e^{4z_i^2\sigma^2} - 1).
\end{aligned} \tag{2.5}$$
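The conditioning and unconditioning argument behind (2.4) can be checked numerically. The sketch below assumes normal random effects, so that $E[(\mu^*_{ij})^k] = \mu_{ij}^k\exp\{k(k-1)z_i^2\sigma^2/2\}$, and uses the standard raw moments of the Poisson distribution; the function names are hypothetical and `s` stands for $z_i^2\sigma^2$.

```python
import numpy as np

def var_y2_closed_form(mu, s):
    """Right-hand side of (2.4), with s = z_i^2 * sigma^2."""
    e = np.exp(s)
    return (mu + mu**2 * (7 * e - 1)
            + 2 * mu**3 * e * (3 * e**2 - 1)
            + mu**4 * e**2 * (e**4 - 1))

def var_y2_by_conditioning(mu, s):
    """Same variance from Poisson raw moments averaged over a normal random effect."""
    def m(k):
        # E[(mu*)^k] for mu* = exp(x'beta + z*gamma), gamma ~ N(0, sigma^2)
        return mu**k * np.exp(k * (k - 1) * s / 2.0)
    ey4 = m(1) + 7 * m(2) + 6 * m(3) + m(4)  # E(Y^4 | gamma) = m + 7m^2 + 6m^3 + m^4
    ey2 = m(1) + m(2)                        # E(Y^2 | gamma) = m + m^2
    return ey4 - ey2**2
```

The two functions agree for any positive `mu` and non-negative `s`; at `s = 0` both reduce to the Poisson value $\mu + 6\mu^2 + 4\mu^3$.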

We remark that as $E(Y_i) = \mu_i$ and $E(G_i) = \lambda_i$, the first order (2.1) and the second order (2.3) GQL estimating equations are naturally unbiased. This unbiasedness property yields consistent estimates for both $\beta$ and $\sigma^2$. Furthermore, as the covariance matrices $\Sigma_i$ and $\Omega_i$ are appropriate weight matrices in the estimating equations, the GQL estimates of $\beta$ and $\sigma^2$ will also be more efficient than other unbiased estimators, such as moment estimators, that are derived without exploiting the weight matrices. See, for example, Jiang (1998), Jiang and Zhang (2001), and Sutradhar (2004).

We further remark that the GQL estimation of $\beta$ and $\sigma^2$ by (2.1) and (2.3) does not require any estimates of the random effects $\gamma_i$ ($i = 1,2,\ldots,K$), whereas the existing leading analytical approaches, such as the PQL approach of Breslow and Clayton (1993) and the HL approach of Lee and Nelder (1996), need to estimate these random effects by pretending that they are fixed effects. It is, however, known [Sutradhar and Qu (1998), Jiang (1998)] that the PQL approach for the estimation of $\beta$ and $\sigma^2$ through the estimation of $\gamma_i$ appears to encounter convergence problems, especially for the estimation of $\sigma^2$. As the HL approach also exploits the estimates of $\gamma_i$ for the estimation of $\beta$ and $\sigma^2$, it may encounter inconsistency problems similar to those of the PQL approach. But even though the HL approach has been illustrated by analyzing various numerical examples in Lee and Nelder (1996, 2001), there does not appear to be any empirical study examining the consistency of the HL estimates of $\beta$ and $\sigma^2$. In Section 3, we conduct an extensive simulation study to examine whether HL estimation satisfies this important property. The HL estimating equations are reviewed in the next subsection.
Note that we also include the GQL approach in the simulation study in order to examine the relative performance of the HL estimators as compared to the GQL estimators. The GQL and HL approaches are also illustrated by analyzing real-life data. This is done in Section 4.

2.2. HL approach. In the present set-up, that is, under the Poisson-normal mixed model, the HL function (1.2) may be expressed as

$$h = \sum_{i=1}^{K}\sum_{j=1}^{n_i} y_{ij}\log(\mu^*_{ij}) - \sum_{i=1}^{K}\sum_{j=1}^{n_i}\mu^*_{ij} - \sum_{i=1}^{K}\sum_{j=1}^{n_i}\log(y_{ij}!) - \left[\frac{K}{2}\log(2\pi) + \frac{K}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{K}\gamma_i^2\right], \tag{2.6}$$

where $\mu^*_{ij} = \exp(x_{ij}'\beta + z_i\gamma_i)$. In this approach, for given $\sigma^2$, similar to the PQL approach, one estimates $\beta$ and $\gamma_i$ by maximizing the likelihood function in (2.6). More specifically, by writing $\mu^*_i = [\mu^*_{i1},\ldots,\mu^*_{in_i}]'$, one solves the HL estimating equations

$$\frac{\partial h}{\partial\beta} = 0 \quad\text{and}\quad \frac{\partial h}{\partial\gamma_i} = 0, \tag{2.7}$$

for $\beta$ and $\gamma_i$ ($i = 1,\ldots,K$), respectively. For the estimation of $\sigma^2$, Lee and Nelder (1996) have, however, maximized a general adjusted profile h-likelihood defined by

$$h_A = h + \frac{1}{2}\log\{\det(2\pi\phi H^{-1})\} = h + \frac{p+K}{2}\log(2\pi) - \frac{1}{2}\log\{\det(H)\}, \tag{2.8}$$

where $\phi = 1$ under the Poisson model, and the $(p+K)\times(p+K)$ matrix $H$ is defined as

$$H = \begin{bmatrix} X'WX & X'WZ \\ Z'WX & Z'WZ + U \end{bmatrix},$$

where $X = [X_1',\ldots,X_i',\ldots,X_K']' : \sum n_i \times p$, with $X_i$ as the $n_i \times p$ covariate matrix; $W$ and $Z$ are block-diagonal matrices given by $W = \oplus_{i=1}^{K} A^*_i : \sum n_i \times \sum n_i$ and $Z = \oplus_{i=1}^{K} 1_{n_i} : \sum n_i \times K$, respectively, with $A^*_i = \mathrm{diag}[\mu^*_{i1},\ldots,\mu^*_{in_i}] : n_i \times n_i$ and $1_{n_i} = (1,\ldots,1)' : n_i \times 1$; and $U = \frac{1}{\sigma^2}I_K$, $I_K$ being the $K \times K$ identity matrix. The maximization is achieved by using the iterative equation

$$\hat\sigma^2_{(r+1)} = \hat\sigma^2_{(r)} + \left[\left(-\frac{\partial^2 h_A}{\partial\sigma^4}\right)^{-1}\frac{\partial h_A}{\partial\sigma^2}\right]_{(r)}, \tag{2.9}$$

where the square bracket $[\,\cdot\,]_{(r)}$ indicates that the quantity inside is evaluated at $\sigma^2 = \hat\sigma^2_{(r)}$, $r$ being the iteration number.
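The update (2.9) is a Newton-Raphson step for maximizing a scalar function. A generic sketch of such an iteration is given below; it is not an implementation of $h_A$ itself. The function name `newton_maximize` is hypothetical, the derivatives are approximated by central finite differences (an assumption made here for brevity), and the objective in the usage note is a toy stand-in.

```python
import math

def newton_maximize(f, x0, h=1e-4, n_iter=100, tol=1e-8):
    """Maximize a smooth scalar function f by the update x <- x + (-f'')^{-1} f'.

    First and second derivatives are approximated by central differences.
    Assumes f is concave near the iterates (f'' < 0); no safeguards are included.
    """
    x = float(x0)
    for _ in range(n_iter):
        d1 = (f(x + h) - f(x - h)) / (2.0 * h)
        d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
        step = d1 / (-d2)   # same form as the bracketed term in (2.9)
        x += step
        if abs(step) < tol:
            break
    return x

# Toy objective (not h_A): a normal log-likelihood in the variance v,
# -(n/2) log v - S/(2v), whose maximizer is S/n.
toy = lambda v: -5.0 * math.log(v) - 8.0 / (2.0 * v)
sigma2_hat = newton_maximize(toy, x0=0.5)
```

In the HL setting the function passed in would be the adjusted profile h-likelihood evaluated at the current estimates of $\beta$ and $\gamma_i$, with analytic derivatives where available.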

Let $\hat\beta_{GQL}$ and $\hat\sigma^2_{GQL}$ be the solutions of the GQL estimating equations (2.1) for $\beta$ and (2.3) for $\sigma^2$, respectively. Similarly, let $\hat\beta_{HL}$ and $\hat\sigma^2_{HL}$ be the solutions of the HL based estimating equations (2.7) for $\beta$ and (2.9) for $\sigma^2$, respectively.

3 A Simulation Study

In this section, we examine the performance of the HL estimators, $\hat\beta_{HL}$ and $\hat\sigma^2_{HL}$, relative to the corresponding GQL estimators, $\hat\beta_{GQL}$ and $\hat\sigma^2_{GQL}$, through a simulation study consisting of 500 simulations. Under each simulation, for given covariates and selected values of the design parameters, we generate the data $\{y_{ij};\ j = 1,\ldots,n_i,\ i = 1,\ldots,K\}$ by using the Poisson form of (1.1) with mean parameters

$$\mu^*_{ij} = E(Y_{ij}\mid\gamma_i) = \exp(\beta_1 x_{ij1} + \beta_2 x_{ij2} + \sigma\gamma^*_i), \tag{3.1}$$

where $\gamma^*_i = \gamma_i/\sigma \overset{iid}{\sim} N(0,1)$. We choose the number of clusters as $K = 100$ and cluster sizes $n_i = 2$, 4, 6, 10 and 16. Note that the cluster sizes are chosen to accommodate both small and large clusters. For the regression parameters we use $\beta_1 = \beta_2 = 1.0$. As far as the values of the fixed covariates $x_{ij} = (x_{ij1}, x_{ij2})'$ are concerned, we choose

$$x_{ij1} = \begin{cases} 1 & \text{for } j = 1,2,\ldots,n_i/2;\ i = 1,2,\ldots,K/2 \\ 0 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = 1,2,\ldots,K/2 \\ 1 & \text{for } j = 1,\ldots,n_i;\ i = K/2+1,\ldots,K \end{cases}$$

$$x_{ij2} = \begin{cases} 1 & \text{for } j = 1,2,\ldots,n_i/2;\ i = 1,2,\ldots,K/2 \\ 2 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = 1,2,\ldots,K/2 \\ 0 & \text{for } j = 1,2,\ldots,n_i/2;\ i = K/2+1,\ldots,K \\ 1 & \text{for } j = n_i/2+1,\ldots,n_i;\ i = K/2+1,\ldots,K \end{cases}$$

With regard to the selection of the variance of the random effects, we choose $\sigma^2 = 0.4$, 0.8, and 1.2. We remark that even though in theory the overdispersion index parameter $\sigma^2$ can take any value from 0 to $\infty$, for practical purposes values of $\sigma^2$ near the upper end of this range appear to be quite large. This is because under the Poisson-normal mixed model (3.1), the overdispersion in the count data may increase significantly even if the increment in $\sigma^2$ is small. To be specific, the variance of $y_{ij}$, $\sigma_{ijj} = \mu_{ij} + [\exp(\sigma^2) - 1]\mu_{ij}^2$, under the Poisson-normal mixed model increases significantly, depending on the value of the mean function $\mu_{ij} = \exp(x_{ij}'\beta + \frac{1}{2}\sigma^2)$, even if $\sigma^2$ changes from 1.0 to 1.2, for example. We further remark that Breslow and Lin (1995, p. 90) were able to obtain an unbiased estimate of this overdispersion index parameter only when $\sigma^2$ ranges up to 0.5.

Next, under each simulation, the simulated values of $\{y_{ij}\}$ along with the values of the covariates $\{x_{ij}\}$ are used to obtain the GQL estimates of $\beta$ and $\sigma^2$ by solving (2.1) and (2.3), respectively, and the HL estimates of $\beta$ and $\sigma^2$ by using (2.7) and (2.9), respectively. Note that under the HL approach, we also had to estimate $\gamma_i$ ($i = 1,\ldots,100$) by treating them as fixed parameters, but these estimates are not reported as they are not of direct interest. The simulated means (SMs) and simulated standard errors (SSEs) of each of the above four estimates are shown in Table 1 for all selected cluster sizes $n_i = 2$, 4, 6, 10 and 16.
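The data-generating step of one simulation run under (3.1) might be sketched as follows. The covariate design is coded as it reads in this transcription (any signs lost in extraction would change it), the function names `design` and `simulate` and the seed handling are hypothetical, and the estimation steps are not included.

```python
import numpy as np

def design(K=100, n=4):
    """Fixed covariates x_ij1, x_ij2 as K x n arrays (design as transcribed above)."""
    half_K, half_n = K // 2, n // 2
    x1 = np.ones((K, n))
    x1[:half_K, half_n:] = 0.0      # first half of clusters, second half of members
    x2 = np.empty((K, n))
    x2[:half_K, :half_n] = 1.0
    x2[:half_K, half_n:] = 2.0
    x2[half_K:, :half_n] = 0.0
    x2[half_K:, half_n:] = 1.0
    return x1, x2

def simulate(K=100, n=4, beta=(1.0, 1.0), sigma2=0.4, seed=0):
    """One data set from the Poisson-normal model (3.1)."""
    rng = np.random.default_rng(seed)
    x1, x2 = design(K, n)
    gamma_star = rng.standard_normal(K)          # gamma*_i ~ N(0, 1), one per cluster
    eta = beta[0] * x1 + beta[1] * x2 + np.sqrt(sigma2) * gamma_star[:, None]
    y = rng.poisson(np.exp(eta))                 # conditional Poisson counts
    return y, x1, x2
```

Repeating `simulate` with 500 different seeds and applying each estimator to every data set would give the simulated means and standard errors summarized in Table 1.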

Table 1. Comparison of simulated means (SMs), simulated standard errors (SSEs), and simulated relative biases (SRBs) of the regression estimates and the estimates of the variance of the random effects by the GQL and HL approaches, under the Poisson-normal mixed model, for selected values of $\sigma^2$: $K = 100$; $n_i = 2$, 4, 6, 10 and 16 ($i = 1,2,\ldots,K$); true values of the regression parameters: $\beta_1 = 1.0$ and $\beta_2 = 1.0$.

Columns: Cluster size | $\sigma^2$ | Method (GQL, HL) | Quantity (SM, SSE, SRB) | Estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma^2$
[Numerical entries of Table 1 are not preserved in this transcription.]


Table 2. Comparison of simulated means (SMs), simulated standard errors (SSEs), and simulated relative biases (SRBs) of the regression estimates and the estimates of the variance of the random effects by the GQL approach, under the Poisson-gamma mixed model, for selected values of $\sigma^2$: $K = 100$; $n_i = 4$ and 10 ($i = 1,2,\ldots,K$); true values of the regression parameters: $\beta_1 = 1.0$ and $\beta_2 = 1.0$.

Columns: Cluster size | $\sigma^2$ | Quantity (SM, SSE, SRB) | Estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma^2$
[Numerical entries of Table 2 are not preserved in this transcription.]

Note that when an estimate is highly biased but has a small standard error, it becomes a useless estimate. For this reason, to check the actual convergence of the estimates to their corresponding parameter values, we have also computed the simulated relative bias (SRB), given by

$$\mathrm{SRB} = \frac{|\mathrm{SM} - \text{True parameter value}|}{\mathrm{SSE}} \times 100.$$

These SRBs are also reported in the same table for all cluster sizes and selected values of the overdispersion index parameter $\sigma^2$.

3.1. Relative performance of GQL and HL approaches. With regard to the estimation of $\beta_1$ and $\beta_2$, it is clear from the simulation results displayed in Table 1 that the GQL approach always produces regression estimates with smaller relative bias than the HL approach. This better performance of the GQL approach appears to hold for all small and large cluster sizes $n_i = 2$, 4, 6, 10 and 16, as well as for all small and large values

of $\sigma^2 = 0.4$, 0.8 and 1.2. For example, when $n_i = 2$, the GQL estimates of $\beta_1$ and $\beta_2$ are slightly biased, with SRBs 33 and 52 when $\sigma^2 = 0.4$, and respective SRBs 23 and 31 when $\sigma^2 = 1.2$. But the HL estimates for the same regression parameters appear to converge to wrong values with small standard errors. To be specific, for $n_i = 2$, the HL estimates of $\beta_1$ and $\beta_2$ appear to have SRBs 195 and 253 when $\sigma^2 = 0.4$, and 921 and 764 when $\sigma^2 = 1.2$. Similarly, for a large cluster size $n_i = 16$, the SRBs for the GQL estimates are found to be 10 and 12 when $\sigma^2 = 0.4$, and 11 and 11 when $\sigma^2 = 1.2$, whereas the respective SRBs for the HL approach are found to be 699 and 994 when $\sigma^2 = 0.4$, and 326 and 306 when $\sigma^2 = 1.2$. Thus, irrespective of the cluster size $n_i$ and the value of $\sigma^2$, the GQL approach performs much better than the HL approach in estimating $\beta_1$ and $\beta_2$.

Note that the results in Table 1 also reveal that the pattern of the regression estimates as a function of $n_i$ and $\sigma^2$ is different under the HL approach as compared to the GQL approach. When the cluster size is small, such as $n_i = 2$, 4 and 6, the SRBs of the regression estimates get smaller for the GQL estimates but larger for the HL estimates as the value of $\sigma^2$ increases. When the cluster size is large, such as $n_i = 10$ and 16, the performance of the GQL estimates of the regression parameters remains the same irrespective of the value of $\sigma^2$, whereas the HL estimates appear to perform better as the value of $\sigma^2$ increases. But when the HL and GQL approaches are compared, as mentioned above, the GQL approach performs uniformly better in estimating $\beta_1$ and $\beta_2$.

With regard to the estimation of the overdispersion parameter $\sigma^2$, the GQL approach, in general, appears to perform better than the HL approach.
For example, when $n_i = 6$, the GQL approach produces $\sigma^2$ estimates with SRBs 14 and 23 for $\sigma^2 = 0.4$ and 1.2, respectively, whereas the corresponding SRBs for the HL estimates are found to be 65 and 196, respectively.

3.2. Performance of the GQL approach under negative binomial set-up. In the simulation study in the last section, it was found that the GQL approach performed uniformly better than the HL approach in estimating both the regression and the overdispersion parameters. But the comparison was made under the Poisson-normal mixed model discussed in Section 2. In this section, we conduct another simulation study by generating the random effects $\gamma_i$ from a gamma distribution with parameters $1/(c\exp(\sigma^2/2))$ and $1/c$, with $c = \exp(\sigma^2) - 1$. To be specific, we generate $\gamma_i$ from the gamma distribution with density

$$f(\gamma_i) = \left[\{c\exp(\sigma^2/2)\}^{1/c}\,\Gamma(c^{-1})\right]^{-1}\exp\left[-\{c\exp(\sigma^2/2)\}^{-1}\gamma_i\right]\gamma_i^{c^{-1}-1}, \tag{3.2}$$

(Jowaheer, 2006), and then generate $y_{ij}$ from the Poisson distribution (Poisson form of (1.1)) with mean $\mu^*_{ij} = \exp(x_{ij}'\beta + \gamma_i)$. It then follows that $y_{ij}$ unconditionally has a negative binomial distribution with the same mean and the same variance as in the Poisson-normal model considered in Section 3.1. But the higher order moments are different under these two models.

In the present simulation study we use the same covariates and the same values for the design parameters as in the previous simulation study in Section 3.1. For simplicity, we report the simulation results in Table 2 for a small cluster size $n_i = 4$ and for a relatively large cluster size $n_i = 10$. All three values of $\sigma^2$ were considered. It is clear from the results in Table 2 that the GQL approach continues to perform well in estimating the parameters even under the present negative binomial model.

4 A Numerical Illustration: Health Care Utilization Data Analysis

In the last section it was shown through a simulation study that the HL approach produces highly biased estimates with smaller standard errors as compared to the GQL approach. Thus the GQL approach performs better than the HL approach in estimating all parameters of the Poisson familial model. The purpose of this section is to analyze real-life data and interpret the estimates in light of the simulation behavior. With regard to the selection of the data, we have chosen to analyze a set of count data collected by a general hospital in Canada that contains responses and associated covariates from a moderately large number of independent families. The data set is described in subsection 4.1.

4.1. Health care utilization data. The Department of Community Medicine, Health Science Center (General Hospital) in St.
John's, Canada, collected familial-longitudinal data on health care utilization for $K = 48$ families over a period of 6 years starting in 1985. Note that some authors, such as Sutradhar, Jowaheer and Sneddon (2008), have used the GQL approach to analyze the complete familial-longitudinal data for four years collected from the 48 families. Our purpose is, however, to concentrate on the familial data analysis only, as an application of the theoretical and simulation results discussed in Sections 2 and 3.

For our purpose, we consider a part of this whole data set, for 1985 only. Among the 48 families, 36 are of size $n_i = 4$ ($i = 1,\ldots,36$), and the remaining 12 are of size $n_i = 3$ ($i = 37,\ldots,48$). Each family member was asked about the number of visits they paid to a physician during 1985. In our notation, $y_{ij}$ denotes this count response for the $j$th member ($j = 1,\ldots,n_i$) of the $i$th family ($i = 1,\ldots,48$). Their gender ($x_{ij1}$), chronic condition ($x_{ij2}$) [CC], education level ($x_{ij3}$) [EL] and age ($x_{ij4}$) were also collected. For convenience, we use the following codes for these four covariates:

$x_{ij1}$ = 0 for female, 1 for male
$x_{ij2}$ = 0 without chronic diseases, 1 with chronic diseases
$x_{ij3}$ = 0 for less than high school, 1 for high school or above
$x_{ij4}$ = exact age of the individual

To give a feel for this familial data, the distribution of the count responses from the 180 individuals, by all four covariates, is displayed in Table 3.

Table 3. Summary Statistics of Physician Visits by Four Covariates in the Health Care Utilization Data for 1985.

Columns: Covariates | Level | Number of Visits | Total
Rows: Gender (Male, Female); Chronic Condition (No, Yes); Education Level (< High School, High School or above); Age (by age group)
[Numerical entries of Table 3 are not preserved in this transcription.]

It is seen from Table 3 that, in general, more males appear to visit a physician a small number of times, while a large number of females visit a physician at least 3 times. As expected, we see that an individual with

chronic diseases visits a physician more often. Physician visits for individuals with a higher level of education seem to be evenly distributed, i.e., individuals are just as likely to visit a physician once as 3-5 times. Those with a lower level of education appear either not to visit a physician or to visit a large number of times. With regard to the relationship between the number of visits and age, we have temporarily formed 5 age groups and observed that some individuals in one of these age groups have visited a physician a large number of times. As expected, a large number of individuals did not visit a physician at all. For the older age groups, there was a tendency for an individual to see a physician more often.

In the next section, we examine the effects of the above four covariates on the responses by applying both the GQL and HL approaches discussed in Section 2. In our notation, $\beta = (\beta_1,\beta_2,\beta_3,\beta_4)'$ denotes their effects. Note, however, that when we computed the mean and the variance of the 180 count responses, it was found that an individual has on average visited his/her physician 3.92 times, with a very large variance. This indicates that the count responses contain over-dispersion, which in our notation is represented by the over-dispersion index parameter $\sigma^2$, the variance of the random family effects. We also need to estimate this variance parameter along with $\beta$.

4.2. GQL and HL estimates. The GQL estimates of $\beta$ and $\sigma^2$ were obtained by solving equations (2.1) and (2.3), respectively. The standard errors of the estimates of the components of $\beta$ were calculated from the asymptotic covariance matrix of $\hat\beta_{GQL}$, given by

$$\mathrm{Cov}(\hat\beta_{GQL}) = \left[\sum_{i=1}^{K}\frac{\partial\mu_i'}{\partial\beta}\Sigma_i^{-1}\frac{\partial\mu_i}{\partial\beta'}\right]^{-1},$$

and similarly, the standard error of $\hat\sigma^2_{GQL}$ was computed from

$$\mathrm{Var}(\hat\sigma^2_{GQL}) = \left[\sum_{i=1}^{K}\frac{\partial\lambda_i'}{\partial\sigma^2}\Omega_i^{-1}\frac{\partial\lambda_i}{\partial\sigma^2}\right]^{-1}.$$

Next, the HL estimates of $\beta$ and $\sigma^2$ were obtained from (2.7) and (2.9), respectively.
The standard errors of the components of β were computed from

$$\mathrm{Cov}(\hat\beta_{HL}) = \left[\sum_{i=1}^{K} X_i'\,\mathrm{diag}[\mu^*_{i1},\ldots,\mu^*_{in_i}]\,X_i\right]^{-1},$$

and similarly the standard error of $\hat\sigma^2_{HL}$ was computed by using the so-called Hessian scalar

$$\mathrm{Var}(\hat\sigma^2_{HL}) = \left[-\frac{\partial^2 h_A}{\partial\sigma^4}\right]^{-1}.$$

These estimates along with their standard errors are displayed in Table 4.

Table 4. The GQL, PGQL, ZIGQL, HL, PHL and MM estimates, along with available standard errors, for the Health Care Utilization Data for 1985.

Columns: Method (GQL, PGQL, ZIGQL, HL, PHL, MM) | Quantity (Value, SE) | Effects of the covariates: Gender ($\hat\beta_1$), CC ($\hat\beta_2$), EL ($\hat\beta_3$), Age ($\hat\beta_4$) | Variance $\hat\sigma^2$
[Numerical entries of Table 4 are not preserved in this transcription.]

Note that even though the method of moments [Jiang (1998)] is known to produce estimates inferior to those of the GQL approach [Sutradhar (2004)], for the sake of completeness we also provide in this section the moment estimates for both $\beta$ and $\sigma^2$. To be specific, the moment estimate ($\hat\beta_{MM}$) of $\beta$ is obtained by solving

$$\sum_{i=1}^{K} X_i'(y_i - \mu_i) = 0, \tag{4.1}$$

with its asymptotic variance computed from

$$\mathrm{Cov}(\hat\beta_{MM}) = \left[\sum_{i=1}^{K} X_i'\,\mathrm{diag}[\mu_{i1},\ldots,\mu_{in_i}]\,X_i\right]^{-1},$$

where $\mu_i$ in (4.1) is the unconditional mean vector. As far as the moment estimate of $\sigma^2$ is concerned, we obtain $\hat\sigma^2_{MM}$ by solving

$$[S - E(S)] = 0, \tag{4.2}$$

where

$$S = \sum_{i=1}^{K}\sum_{j=1}^{n_i} y_{ij}^2 + \sum_{i=1}^{K}\sum_{j<k}^{n_i} y_{ij}y_{ik},$$

and

$$E(S) = \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left[\mu_{ij} + \mu_{ij}^2 e^{\sigma^2}\right] + \sum_{i=1}^{K}\sum_{j<k}^{n_i}\mu_{ij}\mu_{ik}e^{\sigma^2},$$

with $\mu_{ij} = \exp(x_{ij}'\beta + \frac{1}{2}\sigma^2)$. The MM estimate of $\beta$ along with the standard errors of the components of $\hat\beta_{MM}$ are shown in the same Table 4. The value of $\hat\sigma^2_{MM}$ is also given.

We also provide estimates of $\beta$ and $\sigma^2$ based on a partial GQL (PGQL) as well as a partial HL (PHL) approach. To be specific, in the PGQL approach, $\beta$ and $\sigma^2$ are iteratively estimated by using the formulas from the GQL and the MM approaches, respectively. Similarly, in the PHL approach, $\beta$ and $\sigma^2$ are iteratively estimated by using the formulas from the HL and the MM approaches, respectively. These estimates are also reported in Table 4.

The results in Table 4 show that the estimates of $\beta$ and $\sigma^2$ are generally different under the GQL and the HL approaches. Furthermore, except for the estimate of $\sigma^2$, the HL estimates appear to be closer to the MM estimates than the GQL estimates are. Since the GQL estimates are known

to be better than the MM estimates [Sutradhar (2004)], and also because the GQL estimates in the simulation study in Section 3 were found to be less biased than the HL estimates, one would naturally find the GQL estimates in Table 4 more reliable than the HL estimates. Note that the estimated standard errors under the HL approach are found to be smaller than those under the GQL approach. This is also in agreement with the simulation results reported in Table 1. But as argued in Section 3, the large biases of the HL estimates, along with their smaller standard errors, make these HL estimates unreliable, as they converge to wrong values far away from the parameter values. We remark that, as expected, the PGQL and PHL estimates are found to be close to the GQL and HL estimates, respectively.

Since the GQL estimates are found to be better than the HL and MM estimates, we now interpret the effects of the covariates as well as the variance of the family effects by using the GQL approach. Thus, $\hat\sigma^2_{GQL} = 0.873$ indicates that the data contain large over-dispersion. This is also in agreement with the results reported in Subsection 4.1, where it was shown that an individual visits the physician 3.92 times on average, with a very large variance. With regard to the regression effects, the negative value of $\hat\beta_{1(GQL)}$ indicates that females made more visits to the physician than males. Next, the values of $\hat\beta_{2(GQL)}$ and $\hat\beta_{4(GQL)}$ suggest that individuals having some chronic disease, or individuals who are older, pay more visits to the physician, as expected. The effect of education level on the health condition, however, appears to be intriguing. This is because the value of $\hat\beta_{3(GQL)}$ suggests that highly educated individuals make more visits than individuals with a lower level of education.
One possible reason for this behavior of the covariate is that individuals with a higher level of education are more concerned about their health condition than individuals with a lower level of education.

4.3. GQL estimates under a zero-inflated Poisson model. Note that in the analysis of the health care data displayed in Table 3, we have assumed that the Poisson distribution conditional on the random effects would fit the data well. Since, on a closer look, the count 0 in particular appears to occur at a higher rate than expected, we now accommodate this feature by applying the so-called zero-inflated Poisson (ZIP) mixed model. Here we refer to Ridout et al. (2001) and the references therein for the analysis of the ZIP fixed model.
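As a rough numerical illustration of the ZIP model introduced here (a sketch only, not part of the original analysis), the following Python fragment codes a zero-inflated Poisson probability function with a zero-inflation weight `w` and a Poisson mean `mu`, sums it directly to obtain the mean and variance, and compares these with the closed forms $(1-w)\mu$ and $(1-w)\mu(1+w\mu)$. The function names, the truncation point `y_max`, and the values of `w` and `mu` are hypothetical choices made for illustration; the helper `zero_proportion` simply computes the share of zeros in one family's counts.

```python
import math


def zip_pmf(y, w, mu):
    """Zero-inflated Poisson probability of the count y.

    A point mass at zero (weight w) is mixed with a Poisson(mu)
    count (weight 1 - w); mu must be positive.
    """
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    return w * (y == 0) + (1.0 - w) * poisson


def zip_moments(w, mu, y_max=200):
    """Total probability, mean and variance by direct summation.

    The sum is truncated at y_max, which is assumed to be large
    enough that the neglected tail is negligible.
    """
    probs = [zip_pmf(y, w, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return sum(probs), mean, second - mean ** 2


def zero_proportion(counts):
    """Observed share of zeros among the counts of one family."""
    return sum(1 for y in counts if y == 0) / len(counts)


# Hypothetical values, chosen only for illustration.
w, mu = 0.3, 2.5
total, mean, var = zip_moments(w, mu)
closed_mean = (1 - w) * mu
closed_var = (1 - w) * mu * (1 + w * mu)

print("total probability:", total)
print("mean (summed, closed form):", mean, closed_mean)
print("variance (summed, closed form):", var, closed_var)
print("zero proportion of [0, 2, 0, 5]:", zero_proportion([0, 2, 0, 5]))
```

The summed and closed-form moments should agree up to rounding error. Since the variance exceeds the mean whenever `w > 0`, zero inflation by itself already induces overdispersion relative to the Poisson model, before any random effect is introduced.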

Let $w_i$ denote the proportion of zeros in the $i$th family ($i = 1,\ldots,48$). Then, in a fashion similar to that of Ridout et al. (2001), we write the ZIP density for $y_{ij}$ conditional on $\gamma_i$ as

$$f(y_{ij}\mid\gamma_i) = \begin{cases} w_i + (1-w_i)\exp(-\mu_{ij}), & y_{ij} = 0,\\[4pt] (1-w_i)\exp(-\mu_{ij})\,\mu_{ij}^{\,y_{ij}}/y_{ij}!, & y_{ij} > 0, \end{cases} \tag{4.3}$$

where $\mu_{ij} = \exp(x_{ij}^{\top}\beta + \gamma_i)$. This conditional ZIP model produces the mean and the variance of $y_{ij}$ as

$$E(Y_{ij}\mid\gamma_i) = (1-w_i)\,\mu_{ij}, \qquad \mathrm{Var}(Y_{ij}\mid\gamma_i) = (1-w_i)\,\mu_{ij}\,(1 + w_i\,\mu_{ij}).$$

We compute $\hat{w}_i = [\text{number of zeros in the } i\text{th family}]/n_i$ and, using this estimate, obtain the unconditional mean, the variance and other moments of order up to four. We then solve the GQL estimating equations (2.1) and (2.3) to obtain the estimates of $\beta$ and $\sigma^2$ under the ZIP mixed model. These zero-inflated Poisson GQL (ZIGQL) estimates, along with their standard errors, are reported in the same Table 4. Note that the ZIGQL estimate of the overdispersion parameter appears to be larger (1.397) than the GQL estimate (0.873). Thus, the count data considered here appear to have much more overdispersion than shown by the GQL approach. As far as the regression estimates are concerned, the ZIGQL estimates appear to be slightly different from the corresponding GQL estimates. The ZIGQL approach produces the regression estimates with larger standard errors than those of the GQL approach. This is expected, as the ZIGQL approach indicates that the data have more variation. Note that the ZIGQL estimates are not comparable with the HL estimates. To examine any possible changes to the HL estimates due to zero-inflation, one could compute the ZIHL estimates, but this was not done in this paper, as the HL estimates were found to be inferior to the GQL estimates.

5 Concluding Remarks

In this paper we have considered a Poisson-normal mixed model, which is an important special case of the well-known generalized linear mixed model.

In this problem, it is of interest to estimate the regression effects and the variance of the random effects consistently and efficiently. A great deal of discussion has taken place over the last two decades on the relative performance of some of the widely used estimation methods, such as the MM (Jiang, 1998; Jiang and Zhang, 2001), PQL (Breslow and Clayton, 1993) and GQL (Sutradhar, 2004) approaches. But none of these procedures was compared with the existing HL (Lee and Nelder, 1996) approach, even though this latter approach is quite well known. Since the GQL approach was found to be better than the MM and PQL approaches, in this paper we have examined the relative performance of this well-behaved GQL approach and the HL approach.

For the comparison between the HL and GQL approaches, we have first simplified all related estimating equations under these two approaches. We then conducted an extensive simulation study to examine the relative performances of these procedures in estimating both the regression effects and the variance of the random effects. Note that the HL approach requires three estimating equations, including one for the estimation of the random effects, whereas the GQL approach requires only two estimating equations, with no need to estimate the random effects. The simulation study was conducted for five different cluster sizes and three values of $\sigma^2$ (the variance of the random effects), ranging from small to large. It was found that as the value of $\sigma^2$ increases, the HL approach starts to produce highly biased estimates of the regression effects. The GQL approach, however, was found to produce almost unbiased estimates of the regression effects, irrespective of the magnitude of $\sigma^2$. As far as the estimation of the variance parameter $\sigma^2$ is concerned, the GQL approach was also found to be uniformly better than the HL approach.
In this case, the HL approach was found to perform better than it did for the regression effects, although it still trails the GQL approach. Hence, the GQL approach is definitely better than the HL approach in estimating all parameters of the model. When the other studies mentioned above are taken into consideration, the GQL approach appears to be the best so far among the MM, PQL, HL and GQL approaches. We therefore recommend the use of the GQL approach in practice, irrespective of the magnitude of the over-dispersion in the familial/cluster count data. To demonstrate the effectiveness of the GQL approach, we have applied both the GQL and HL approaches to analyze a real life data set on health care utilization. The estimates were found to reflect the simulation results. We have also applied

a ZIGQL approach to the same data set, and the estimates were found to be similar to, but not the same as, the GQL estimates.

Acknowledgements. The authors would like to thank the referee and the associate editor for their valuable comments that led to the improvement of the paper. This research was supported partially by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc., 88.

Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear models with single component of dispersion. Biometrika, 82.

Jiang, J. (1998). Consistent estimators in generalized linear mixed models. J. Amer. Statist. Assoc., 93.

Jiang, J. and Zhang, W. (2001). Robust estimation in generalized linear mixed models. Biometrika, 88.

Jowaheer, V. (2006). Model misspecification effects in clustered count data analysis. Statist. Probab. Lett., 76.

Kuk, A.C. (1995). Asymptotically unbiased estimation in generalized linear models with random effects. J. Roy. Statist. Soc. B, 57.

Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models. J. Roy. Statist. Soc. B, 58.

Lee, Y. and Nelder, J.A. (2001). Hierarchical generalized linear models: A synthesis of generalized linear models, random-effect models and structured dispersions. Biometrika, 88.

Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. J. Amer. Statist. Assoc., 91.

McCullagh, P. (1983). Quasi-likelihood functions. Ann. Statist., 11.

McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.

Ridout, M., Hinde, J. and Demetrio, C.G.B. (2001). A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics, 57.

Sutradhar, B.C., Jowaheer, V. and Sneddon, G. (2008). On a unified generalized quasi-likelihood approach for familial-longitudinal non-stationary data. Scandinavian J. Statist., 35.

Sutradhar, B.C. and Qu, Z. (1998). On approximate likelihood inference in a Poisson mixed model. Canad. J. Statist., 26.

Sutradhar, B.C. and Rao, R.P. (2001). On marginal quasi-likelihood inference in generalized linear mixed models. J. Multivariate Anal., 76.

Sutradhar, B.C. (2004). On exact quasi-likelihood inference in generalized linear mixed models. Ind. J. of Statist., 66.

Wedderburn, R. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61.

M.R.I. Chowdhury and B.C. Sutradhar
Department of Mathematics and Statistics
Memorial University of Newfoundland
St. John's, NL, Canada A1C 5S7.

Paper received December 2007; revised December 2008.

ATINER's Conference Paper Series STA

ATINER's Conference Paper Series STA ATINER CONFERENCE PAPER SERIES No: LNG2014-1176 Athens Institute for Education and Research ATINER ATINER's Conference Paper Series STA2014-1255 Parametric versus Semi-parametric Mixed Models for Panel

More information

Generalized Quasi-likelihood (GQL) Inference* by Brajendra C. Sutradhar Memorial University address:

Generalized Quasi-likelihood (GQL) Inference* by Brajendra C. Sutradhar Memorial University  address: Generalized Quasi-likelihood (GQL) Inference* by Brajendra C. Sutradhar Memorial University Email address: bsutradh@mun.ca QL Estimation for Independent Data. For i = 1,...,K, let Y i denote the response

More information

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates by c Ernest Dankwa A thesis submitted to the School of Graduate

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

QUANTIFYING PQL BIAS IN ESTIMATING CLUSTER-LEVEL COVARIATE EFFECTS IN GENERALIZED LINEAR MIXED MODELS FOR GROUP-RANDOMIZED TRIALS

QUANTIFYING PQL BIAS IN ESTIMATING CLUSTER-LEVEL COVARIATE EFFECTS IN GENERALIZED LINEAR MIXED MODELS FOR GROUP-RANDOMIZED TRIALS Statistica Sinica 15(05), 1015-1032 QUANTIFYING PQL BIAS IN ESTIMATING CLUSTER-LEVEL COVARIATE EFFECTS IN GENERALIZED LINEAR MIXED MODELS FOR GROUP-RANDOMIZED TRIALS Scarlett L. Bellamy 1, Yi Li 2, Xihong

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

Sample size calculations for logistic and Poisson regression models

Sample size calculations for logistic and Poisson regression models Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Peng WANG, Guei-feng TSAI and Annie QU 1 Abstract In longitudinal studies, mixed-effects models are

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Analysis of 2 n Factorial Experiments with Exponentially Distributed Response Variable

Analysis of 2 n Factorial Experiments with Exponentially Distributed Response Variable Applied Mathematical Sciences, Vol. 5, 2011, no. 10, 459-476 Analysis of 2 n Factorial Experiments with Exponentially Distributed Response Variable S. C. Patil (Birajdar) Department of Statistics, Padmashree

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA

MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA J. Jpn. Soc. Comp. Statist., 26(2013), 53 69 DOI:10.5183/jjscs.1212002 204 MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA Yiping Tang ABSTRACT Overdispersion is a common

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models Christina Knudson, Ph.D. University of St. Thomas user!2017 Reviewing the Linear Model The usual linear model assumptions:

More information

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM An R Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM Lloyd J. Edwards, Ph.D. UNC-CH Department of Biostatistics email: Lloyd_Edwards@unc.edu Presented to the Department

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Approximate Likelihoods

Approximate Likelihoods Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

arxiv: v2 [stat.me] 8 Jun 2016

arxiv: v2 [stat.me] 8 Jun 2016 Orthogonality of the Mean and Error Distribution in Generalized Linear Models 1 BY ALAN HUANG 2 and PAUL J. RATHOUZ 3 University of Technology Sydney and University of Wisconsin Madison 4th August, 2013

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1 Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Fisher information for generalised linear mixed models

Fisher information for generalised linear mixed models Journal of Multivariate Analysis 98 2007 1412 1416 www.elsevier.com/locate/jmva Fisher information for generalised linear mixed models M.P. Wand Department of Statistics, School of Mathematics and Statistics,

More information

The LmB Conferences on Multivariate Count Analysis

The LmB Conferences on Multivariate Count Analysis The LmB Conferences on Multivariate Count Analysis Title: On Poisson-exponential-Tweedie regression models for ultra-overdispersed count data Rahma ABID, C.C. Kokonendji & A. Masmoudi Email Address: rahma.abid.ch@gmail.com

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III) Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial

More information

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data

Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data Eduardo Elias Ribeiro Junior 1 2 Walmes Marques Zeviani 1 Wagner Hugo Bonat 1 Clarice Garcia

More information

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model Applied and Computational Mathematics 2014; 3(5): 268-272 Published online November 10, 2014 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20140305.22 ISSN: 2328-5605 (Print); ISSN:

More information

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington

More information

Negative Multinomial Model and Cancer. Incidence

Negative Multinomial Model and Cancer. Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Robust Bayesian Variable Selection for Modeling Mean Medical Costs

Robust Bayesian Variable Selection for Modeling Mean Medical Costs Robust Bayesian Variable Selection for Modeling Mean Medical Costs Grace Yoon 1,, Wenxin Jiang 2, Lei Liu 3 and Ya-Chen T. Shih 4 1 Department of Statistics, Texas A&M University 2 Department of Statistics,

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

CONDITIONAL LIKELIHOOD INFERENCE IN GENERALIZED LINEAR MIXED MODELS

CONDITIONAL LIKELIHOOD INFERENCE IN GENERALIZED LINEAR MIXED MODELS Statistica Sinica 14(2004), 349-360 CONDITIONAL LIKELIHOOD INFERENCE IN GENERALIZED LINEAR MIXED MODELS N. Sartori and T. A. Severini University of Padova and Northwestern University Abstract: Consider

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information
