

On Properties of QIC in Generalized Estimating Equations

Shinpei Imori
Graduate School of Engineering Science, Osaka University
1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan
E-mail: imori.stat@gmail.com

Abstract: The generalized estimating equations (GEE) approach has attracted considerable interest in the analysis of correlated response data. An information criterion based on the quasi-likelihood in the GEE framework, called the quasi-likelihood under the independence model criterion (QIC), was proposed in the previous literature. This paper studies the properties of the QIC. We establish a formal derivation of the QIC as an asymptotically unbiased estimator of the prediction risk based on the quasi-likelihood. In particular, when deriving the QIC, we explicitly take into account the effect of estimating the correlation matrix used in the GEE procedure. Furthermore, we discuss the adequacy of the risk function used in the derivation of the QIC.

Key words: Generalized estimating equations; Longitudinal data analysis; Quasi-likelihood under the independence model criterion; Variable selection.

1. Introduction

In biomedical studies, longitudinal data often arise, in which the responses are correlated within each individual. The generalized estimating equations (GEE) approach developed by Liang & Zeger (1986) has attracted considerable interest for parameter estimation with such data. The GEE methodology avoids specifying the joint distribution of the observations: it assumes only a functional form for the marginal distribution at each time point and a correlation structure, called the working correlation matrix. Furthermore, under some regularity conditions (see Xie & Yang, 2003; Balan & Schiopu-Kratina, 2005), the GEE estimator is consistent and asymptotically normal even when the working correlation matrix is misspecified. Because of these advantages, the GEE method is often used to analyze longitudinal data (e.g., Thall & Vail, 1990; Barnhart & Williamson, 1998; Vens & Ziegler, 2012).

As in ordinary regression analysis, we should select the best model among the candidate models in the GEE methodology. Model selection in the GEE framework has been extensively discussed in the previous literature (e.g., Pan, 2001b, 2002; Cantoni et al., 2005; Shen & Chen, 2012). More generally, many model selection methods have been proposed (for details of statistical model selection, see Burnham & Anderson, 2002). A well-known approach to model selection is to measure the goodness of fit of a model by the risk function based on the expected Kullback-Leibler (KL) information (Kullback & Leibler, 1951). In practice, we must estimate the risk function, which depends on unknown parameters. Akaike's information criterion (AIC), which was proposed

by Akaike (1973, 1974) as an estimator of the risk function based on the KL information, is used for selecting the best model among the candidate models. Since the AIC can be simply defined as $-2 \times$ (the maximum log-likelihood) $+ 2 \times$ (the number of parameters), it is widely applied in many fields for selecting appropriate models using a set of explanatory variables.

However, it may not be adequate to use the AIC directly in the GEE procedure, since we do not assume a multivariate distribution for the observations. Pan (2001a) extended the AIC to the GEE method on the basis of the quasi-likelihood constructed from the estimating equations (Wedderburn, 1974); the resulting criterion is called the quasi-likelihood under the independence model criterion (QIC). The QIC is often used as an alternative to the AIC in applied longitudinal analysis, and it is a representative model selector in the GEE methodology. On the other hand, the theoretical properties of the QIC have not been discussed until now. Moreover, the QIC was derived while ignoring the calculation of one particular part. The aim of this paper is to study these problems, especially from the viewpoint of estimating the correlation parameter.

In the present paper, we establish a formal derivation of the QIC (called the formal QIC, or fQIC) as an asymptotically unbiased estimator of the prediction risk based on the quasi-likelihood. Notably, when deriving the formal QIC, we explicitly take into account the effect of estimating the correlation matrix used in the GEE procedure. Furthermore, we discuss the adequacy of the risk function used in the derivation of the QIC. Concretely, when we extend the independence quasi-likelihood to a more general case (i.e., the multivariate quasi-likelihood) in order to consider the correlations among the responses, the risk function reduces to the same one used in the derivation of the QIC. The meaning of this adequacy is detailed in Section 3.

The rest of the paper is organized as follows. Section 2 introduces the GEE approach and formally re-derives the QIC. We present the properties of the QIC in Section 3. A comparison between the formal QIC and the original QIC is given in Section 4. In Section 5, we conclude the paper. Technical details are provided in the Appendix.

2. Modifications of the original QIC

In this section, we present the definition of the GEE and formally re-derive the QIC. For individuals $i = 1, \ldots, n$, we have an $m$-dimensional response vector $y_i = (y_{i1}, \ldots, y_{im})'$ and an $m \times p$ explanatory variable matrix $X_i = (x_{i1}, \ldots, x_{im})'$. Note that $y = (y_1', \ldots, y_n')'$. In general, the components of $y_i$ are correlated, but $y_1, \ldots, y_n$ are independent. Furthermore, we do not specify the joint distribution of each $y_i$. In the GEE framework, we assume that each $y_{ij}$ follows the generalized linear model (GLM) developed in Nelder & Wedderburn (1972). Hence, the probability density function of $y_{ij}$ is expressed as

\[ f(y_{ij}; \theta_{ij}, \phi) = \exp\left\{ \frac{\theta_{ij} y_{ij} - b(\theta_{ij})}{a(\phi)} + c(y_{ij}, \phi) \right\}, \]

where $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$ are known functions, $\theta_{ij}$ is an unknown location parameter, and $\phi$ is a known scale parameter. Let the first and second

moments of $y_{ij}$ be $E[y_{ij}] = \mu_{ij}$ and $\mathrm{Var}[y_{ij}] = \sigma_{ij}^2$, respectively. From the properties of the GLM, we obtain

\[ \mu_{ij} = b'(\theta_{ij}), \qquad \sigma_{ij}^2 = \sigma^2(\mu_{ij}) = a(\phi) b''(\theta_{ij}). \]

For all $\theta_i = (\theta_{i1}, \ldots, \theta_{im})'$, note that $h(\mu_{ij}) = h(\mu(\theta_{ij})) = \eta_{ij}$, where $h(\cdot)$ is a link function, $\eta_{ij} = x_{ij}'\beta$ is the linear predictor, and $\beta$ is an unknown regression parameter. The GEE for $\beta$ is then expressed as

\[ g(\beta; R, y) = \sum_{i=1}^n D_i' V_i^{-1} (y_i - \mu_i) = 0_p, \tag{1} \]

where $D_i = \partial \mu_i / \partial \beta' = A_i \Delta_i X_i$, $\Delta_i = \mathrm{diag}(\partial \theta_{i1} / \partial \eta_{i1}, \ldots, \partial \theta_{im} / \partial \eta_{im})$, $A_i = a(\phi)\, \mathrm{diag}\{b''(\theta_{i1}), \ldots, b''(\theta_{im})\}$, $\mu_i = (\mu_{i1}, \ldots, \mu_{im})'$, $V_i = A_i^{1/2} R A_i^{1/2}$, $R = R(\alpha)$ is a working correlation matrix, and $\alpha$ is a $q$-dimensional parameter, which is referred to as the correlation parameter.

We can choose a suitable working correlation matrix as the situation demands. The following structures are often used:

- independence: $(R)_{jk} = 0$, $j \neq k$;
- exchangeable: $(R)_{jk} = \alpha$, $j \neq k$;
- first-order autoregressive (AR-1): $(R)_{jk} = (R)_{kj} = \alpha^{j-k}$, $j > k$;
- one-dependent: $(R)_{j,j+1} = (R)_{j+1,j} = \alpha$;
- unstructured: $(R)_{jk} = (R)_{kj} = \alpha_{jk}$, $j > k$.

By solving (1), we obtain $\hat\beta$, the GEE estimator of $\beta$. In order to guarantee its asymptotic properties, we assume appropriate regularity conditions.

The risk function based on the independence quasi-likelihood is

\[ \mathrm{Risk} = E_y E_{y^*}[-2Q(y^*; \hat\beta)], \tag{2} \]

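where $y^*$ is a future observation and $Q(y; \beta)$ is the quasi-likelihood proposed by McCullagh & Nelder (1989). That is,

\[ Q(y; \beta) = \sum_{i=1}^n \sum_{j=1}^m \int_{y_{ij}}^{\mu_{ij}} \frac{y_{ij} - t}{\sigma^2(t)} \, dt, \]

where $\sigma^2(\cdot)$ is a variance function such that $\sigma^2(\mu_{ij}) = a(\phi) b''(\theta_{ij})$. Let $B$ be the bias when estimating (2) by $-2Q(y; \hat\beta)$. The QIC is defined as

\[ \mathrm{QIC} = -2Q(y; \hat\beta) + \hat{B}, \tag{3} \]

where $\hat{B}$ is an estimator of $B$. To evaluate $B$ precisely, we decompose it as follows:

\begin{align*}
B &= E_y E_{y^*}[-2Q(y^*; \hat\beta) + 2Q(y; \hat\beta)] \\
  &= E_y E_{y^*}[-2Q(y^*; \hat\beta) + 2Q(y^*; \beta)] \tag{B1} \\
  &\quad + E_y E_{y^*}[-2Q(y^*; \beta) + 2Q(y; \beta)] \tag{B2} \\
  &\quad + E_y[-2Q(y; \beta) + 2Q(y; \hat\beta)]. \tag{B3}
\end{align*}

It is obvious that (B2) $= 0$, because $y$ and $y^*$ have the same distribution. Applying a Taylor expansion of $Q(y; \hat\beta)$ around $\hat\beta = \beta$ yields

\[ Q(y; \hat\beta) = Q(y; \beta) + \frac{\partial Q(y; \beta)}{\partial \beta'} (\hat\beta - \beta) + \frac{1}{2} (\hat\beta - \beta)' \frac{\partial^2 Q(y; \beta)}{\partial \beta \partial \beta'} (\hat\beta - \beta) + o_p(1). \tag{4} \]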
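As a small numerical illustration of the quasi-likelihood $Q(y; \beta)$ defined above, the following sketch evaluates the integral for one common choice of variance function. It assumes a Poisson-type variance $\sigma^2(t) = t$ with $a(\phi) = 1$, uses a plain midpoint rule, and compares the result with the closed form of the same integral. The function names and the data values are hypothetical and are not taken from the paper.

```python
import math

def quasi_likelihood_term(y, mu, variance, n_steps=20000):
    """Approximate the integral of (y - t) / variance(t) over t from y to mu (midpoint rule)."""
    if mu == y:
        return 0.0
    h = (mu - y) / n_steps  # signed step, so the case mu < y is handled as well
    total = 0.0
    for s in range(n_steps):
        t = y + (s + 0.5) * h  # midpoints never touch the endpoint t = y
        total += (y - t) / variance(t)
    return total * h

def poisson_closed_form(y, mu):
    """Closed form of the same integral when variance(t) = t (assumed Poisson-type case)."""
    if y == 0:
        return -mu
    return y * math.log(mu / y) - (mu - y)

def quasi_likelihood(ys, mus, variance):
    """Sum the per-observation terms over all observations, as in the definition of Q."""
    return sum(quasi_likelihood_term(y, mu, variance) for y, mu in zip(ys, mus))

def poisson_variance(t):
    return t

ys = [3, 0, 1, 5]            # hypothetical counts
mus = [2.0, 1.5, 1.0, 4.2]   # hypothetical fitted means
q_numeric = quasi_likelihood(ys, mus, poisson_variance)
q_exact = sum(poisson_closed_form(y, mu) for y, mu in zip(ys, mus))
print(q_numeric, q_exact)
```

Under these assumptions each term is non-positive and equals zero when $\mu_{ij} = y_{ij}$, so $-2Q$ behaves like a deviance-type discrepancy.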

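Note that

\begin{align*}
\frac{\partial Q(y; \beta)}{\partial \beta} &= \sum_{i=1}^n D_i' A_i^{-1} (y_i - \mu_i), \\
\hat\beta &= \beta + \frac{1}{n} \Omega_R^{-1} g(\beta; R, y) + o_p(n^{-1/2}), \tag{5} \\
\frac{\partial^2 Q(y; \beta)}{\partial \beta \partial \beta'} &= -n\Omega_I + o_p(n),
\end{align*}

where

\[ \Omega_R = \frac{1}{n} \sum_{i=1}^n D_i' V_i^{-1} D_i, \qquad \Omega_I = \frac{1}{n} \sum_{i=1}^n D_i' A_i^{-1} D_i. \]

Substituting (4) and (5) into (B1) and (B3), we can show that

\begin{align*}
\text{(B1)} &= \mathrm{tr}(V_s \Omega_I) + o(1), \\
\text{(B3)} &= 2\,\mathrm{tr}(V_a \Omega_R^{-1}) - \mathrm{tr}(V_s \Omega_I) + o(1), \tag{6}
\end{align*}

where

\begin{align*}
V_a &= \frac{1}{n} \sum_{i=1}^n D_i' V_i^{-1} \mathrm{Cov}[y_i] A_i^{-1} D_i, \\
V_s &= \Omega_R^{-1} \left( \frac{1}{n} \sum_{i=1}^n D_i' V_i^{-1} \mathrm{Cov}[y_i] V_i^{-1} D_i \right) \Omega_R^{-1}. \tag{7}
\end{align*}

From (6), an expansion of $B$ is given as follows:

\[ B = 2\,\mathrm{tr}(V_a \Omega_R^{-1}) + o(1). \tag{8} \]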
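The quantities in (7) and (8) can be made concrete with a toy calculation. The sketch below is an illustration under strong simplifying assumptions that are not in the paper: a Gaussian response with identity link and $a(\phi) = 1$ (so that $A_i = I_2$ and $D_i = x_i$), cluster size $m = 2$, a single regression parameter ($p = 1$, so every trace is a scalar), and a known true covariance. It computes the leading term $2\,\mathrm{tr}(V_a \Omega_R^{-1})$ of (8) next to the quantity $2\,\mathrm{tr}(V_s \Omega_I)$ built from the term (B1). All names and design values are hypothetical.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def quad(x, M, z):
    """Bilinear form x' M z for 2-vectors x, z and a 2x2 matrix M."""
    return sum(x[j] * M[j][k] * z[k] for j in range(2) for k in range(2))

def penalties(xs, R_work, cov_true):
    """Return (2 tr(V_a Omega_R^{-1}), 2 tr(V_s Omega_I)) for p = 1, A_i = I, D_i = x_i."""
    n = len(xs)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    R_inv = inv2(R_work)                       # V_i^{-1} = R^{-1} because A_i = I
    R_inv_cov = matmul2(R_inv, cov_true)       # V_i^{-1} Cov[y_i]
    omega_R = sum(quad(x, R_inv, x) for x in xs) / n
    omega_I = sum(quad(x, identity, x) for x in xs) / n
    V_a = sum(quad(x, R_inv_cov, x) for x in xs) / n
    middle = sum(quad(x, matmul2(R_inv_cov, R_inv), x) for x in xs) / n
    V_s = middle / omega_R ** 2                # Omega_R^{-1} (middle) Omega_R^{-1}
    return 2.0 * V_a / omega_R, 2.0 * V_s * omega_I

xs = [(1.0, 1.0), (1.0, -1.0), (1.0, 2.0), (0.0, 1.0)]  # hypothetical covariate vectors
true_cov = [[1.0, 0.3], [0.3, 1.0]]
independence = [[1.0, 0.0], [0.0, 1.0]]
misspecified = [[1.0, 0.8], [0.8, 1.0]]

for label, R in [("independence", independence), ("true", true_cov), ("misspecified", misspecified)]:
    print(label, penalties(xs, R, true_cov))
```

In this toy setting the two quantities coincide when the working correlation is the independence structure or equals the true correlation, and they separate when the working correlation is misspecified.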

By substituting (8) into (3) and ignoring the $o(1)$ term, we obtain

\[ \mathrm{fQIC} = -2Q(y; \hat\beta) + 2\,\mathrm{tr}(\hat{V}_a \hat\Omega_R^{-1}), \]

where $\hat{V}_a$, $\hat{V}_s$, $\hat\Omega_R$ and $\hat\Omega_I$ are obtained by substituting $\hat\beta$ for $\beta$ in $V_a$, $V_s$, $\Omega_R$ and $\Omega_I$, respectively. On the other hand, the original QIC proposed in Pan (2001a) is an estimator of (2) defined as follows:

\[ \text{original QIC} = -2Q(y; \hat\beta) + 2\,\mathrm{tr}(\hat{V}_s \hat\Omega_I). \]

Note that $\Omega_R = \Omega_I$ and $V_s = \Omega_I^{-1} V_a \Omega_I^{-1}$ when $R$ is the independence structure, so that $\mathrm{tr}(V_a \Omega_R^{-1}) = \mathrm{tr}(V_s \Omega_I)$. Furthermore, when $R$ includes the true correlation structure, $\Omega_I = V_a$ and $\Omega_R^{-1} = V_s$ hold, since $V_i = \mathrm{Cov}[y_i]$ in this situation. Hence, the original QIC is exactly equivalent to the formal QIC when the working correlation matrix is the independence structure, and asymptotically equivalent to it when the working correlation matrix includes the true correlation structure. More comparisons are given in Section 4.

3. Effect of estimating the correlation parameter

We present two properties of the formal QIC in this section. First, we consider the bias of the QIC. In Pan (2001a), the original QIC was derived by considering only the bias that arises when estimating $\beta$. However, we also need to estimate $\alpha$ when we use the QIC in real

data analysis. Hence, the QIC can be regarded as a function of $\beta$ and $\alpha$, and in practice it is evaluated by substituting estimators for these unknown parameters. We should therefore also consider the bias that arises when estimating $\alpha$. According to Liang & Zeger (1986), we can assume that $\alpha$ is defined as a function of the sample correlation matrix $\tilde{R}(\beta)$,

\[ \tilde{R}(\beta) = \frac{1}{n - p} \sum_{i=1}^n A_i^{-1/2} (y_i - \mu_i)(y_i - \mu_i)' A_i^{-1/2}. \]

For example, by assuming the exchangeable correlation structure (i.e., $(R)_{jk} = \alpha$, $j \neq k$),

\[ \alpha = \frac{1}{n - p} \cdot \frac{2}{m(m-1)} \cdot \frac{1}{a(\phi)} \sum_{i=1}^n \sum_{j<k}^m \frac{y_{ij} - b'(\theta_{ij})}{b''(\theta_{ij})^{1/2}} \cdot \frac{y_{ik} - b'(\theta_{ik})}{b''(\theta_{ik})^{1/2}}. \tag{9} \]

When we explicitly take into account the effect of estimating $\alpha$, the simultaneous estimating equation for $\beta$ and $\alpha$ is given as follows:

\[ \alpha = \alpha(\beta) = h(\tilde{R}(\beta)), \qquad \sum_{i=1}^n D_i' V_i^{-1} (y_i - \mu_i) = 0_p, \tag{10} \]

where $h(\cdot)$ is here a $q$-dimensional vector-valued function. Let $\hat\beta_s$ and $\hat\alpha_s$ be the solution of (10). In this situation, the risk function (2) is rewritten as

\[ E_y E_{y^*}[-2Q(y^*; \hat\beta_s)], \]

and we can show the following theorem.
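The moment estimator (9) for the exchangeable structure is straightforward to compute once the standardized residuals are available. The following is a minimal sketch, assuming the residuals $r_{ij} = (y_{ij} - \mu_{ij}) / \{a(\phi) b''(\theta_{ij})\}^{1/2}$ are formed first, so that the factor $1/a(\phi)$ in (9) is absorbed into them. The function names and the numbers are hypothetical.

```python
import math

def pearson_residuals(ys, mus, variances, a_phi=1.0):
    """Standardized residuals r_ij = (y_ij - mu_ij) / sqrt(a(phi) * b''(theta_ij))."""
    return [
        [(y - mu) / math.sqrt(a_phi * v) for y, mu, v in zip(y_row, mu_row, v_row)]
        for y_row, mu_row, v_row in zip(ys, mus, variances)
    ]

def exchangeable_alpha(residuals, p):
    """Moment estimator of the exchangeable correlation parameter, following (9)."""
    n = len(residuals)
    m = len(residuals[0])
    cross = 0.0
    for r in residuals:
        for j in range(m):
            for k in range(j + 1, m):  # sum over pairs j < k within one individual
                cross += r[j] * r[k]
    return 2.0 / (m * (m - 1)) * cross / (n - p)

# Hypothetical residuals for n = 3 individuals with m = 2 observations each.
toy_residuals = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
print(exchangeable_alpha(toy_residuals, p=1))
```

Because $\alpha$ depends on $\beta$ through the residuals, solving (10) in practice alternates between updating $\beta$ from the estimating equation and updating $\alpha$ from a function of this kind.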

Theorem 1. Let us assume

\[ \frac{\partial\, \mathrm{vec}\{R(\alpha_0)^{-1}\}}{\partial \alpha'}, \quad \frac{\partial \alpha(\beta)}{\partial \beta'} = O_p(1), \tag{11} \]

where $\alpha_0 = h(R_0)$ and $R_0$ is the true correlation matrix. Then the formal QIC is an asymptotically unbiased estimator of the risk function even when the effect of estimating $\alpha$ is taken into account. That is,

\[ E_y E_{y^*}[-2Q(y^*; \hat\beta_s)] - E_y[\mathrm{fQIC}(y; \hat\beta_s)] = o(1). \]

Details of the proof of Theorem 1 are given in Appendix A.1. This theorem implies that the formal QIC remains an asymptotically unbiased estimator of the risk function when the effect of estimating the correlation parameter is taken into account. This is an optimality property of the formal QIC. Liang & Zeger (1986) provided ways of estimating $\alpha$ that satisfy the assumption (11). For example, in the case of the exchangeable correlation structure,

\[ R(\alpha)^{-1} = \frac{1}{1 - \alpha} (I_m - P_m) + \frac{1}{1 + (m-1)\alpha} P_m, \tag{12} \]

where $P_m = 1_m 1_m' / m$. The derivatives of (9) and (12) are expressed as follows:

\begin{align*}
\frac{\partial \alpha(\beta)}{\partial \beta'} &= -\frac{1}{n - p} \cdot \frac{2}{m(m-1)} \cdot \frac{1}{a(\phi)} \sum_{i=1}^n \sum_{j \neq k}^m \left[ b''(\theta_{ij})^{1/2} + \frac{\{y_{ij} - b'(\theta_{ij})\} b'''(\theta_{ij})}{2 b''(\theta_{ij})^{3/2}} \right] \frac{y_{ik} - b'(\theta_{ik})}{b''(\theta_{ik})^{1/2}} \cdot \frac{\partial \theta_{ij}}{\partial \eta_{ij}} x_{ij}', \\
\frac{\partial\, \mathrm{vec}\{R(\alpha_0)^{-1}\}}{\partial \alpha'} &= \frac{1}{(1 - \alpha_0)^2} \mathrm{vec}(I_m - P_m) - \frac{m - 1}{\{1 + (m-1)\alpha_0\}^2} \mathrm{vec}(P_m).
\end{align*}

Hence, the assumption (11) is satisfied when $\alpha_0 \neq 1, -1/(m-1)$, which is usually assumed for the non-singularity of the working correlation matrix.

Next, we consider the risk function of the formal QIC. Although a correlation structure is assumed for the responses in the GEE procedure, the risk function (2) is based on the independence quasi-likelihood. In order to resolve this contradiction, we attempt to extend the risk function to a quasi-likelihood that takes the correlations into account, namely the multivariate quasi-likelihood (McCullagh & Nelder, 1989). As mentioned in Pan (2001a), the GEE methodology is closely related to the multivariate quasi-likelihood. The $i$th multivariate quasi-likelihood $Q_m(y_i; M, \beta)$ is given in the following differential form:

\[ \frac{\partial Q_m(y_i; M, \beta)}{\partial \mu_i} = M(\mu_i)^{-1} (y_i - \mu_i), \]

where $M(\cdot)$ is an $m \times m$ matrix-valued function. We can obtain the multivariate

quasi-likelihood as the following line integral:

\[ Q_m(y_i; M, \beta) = \int_{t = y_i}^{t = \mu_i} (y_i - t)' M(t)^{-1} \, dt, \tag{13} \]

where $t = (t_1, \ldots, t_m)'$. In general, the value of this integral is not uniquely determined, since it depends on the chosen path of integration. This problem has already been discussed in the literature; Wang gave an example of the non-integrability of the GEE (see Wang, 1999, Example 2.4). The risk function should be uniquely specified, because it serves as a criterion for evaluating statistical models. The following theorem gives a condition concerning the path-dependence of the multivariate quasi-likelihood.

Theorem 2. Assuming the independence structure for the correlation matrix and/or a constant variance function $\sigma^2(\cdot)$ is a necessary and sufficient condition for avoiding the path-dependence of the multivariate quasi-likelihood in the GEE approach.

A proof of Theorem 2 is given in Appendix A.2. This theorem suggests the adequacy of the risk function of the formal QIC. It may be possible to fix an ad hoc path for the multivariate quasi-likelihood (for an example of such a path, see McCullagh & Nelder, 1989). However, it is difficult to derive a new model selector based on a multivariate quasi-likelihood with a fixed path, since its derivation takes a more complicated form than that of the formal QIC. In order to establish the uniqueness of the risk function

among the multivariate quasi-likelihood class, we recommend assuming the independence model for the risk function, which leads to the same risk function as that of the formal QIC.

4. Comparison of the formal QIC and the original QIC

Let us compare the bias term of the formal QIC derived in Section 2 with that of the original QIC through a simulation study. We prepared four candidate models with 500 samples, each of which consists of a 4-dimensional response vector $y_i = (y_{i1}, \ldots, y_{i4})'$ and a $4 \times 5$ explanatory variable matrix $X_i = (x_{i1}, \ldots, x_{i4})'$. Let $x_{ij} = (1, x_{ij1}, \ldots, x_{ij4})'$, where the $x_{ijk}$ ($k = 1, \ldots, 4$) are independent and identically distributed random variables from the uniform distribution $U(0, 1)$. We assume that $y_{ij}$ is distributed according to a logistic regression model $B(1, p_{ij})$, where $p_{ij} = 1 / \{1 + \exp(-x_{ij}'\beta)\}$ and $\beta = (1, 1, 0, 0, 0)'$. Let the explanatory variable matrix in the $k$th candidate model consist of the first $k$ columns of $X_i$, $k = 1, \ldots, 4$.

As mentioned in Section 2, the original QIC is exactly equivalent to the formal QIC when the working correlation matrix is the independence structure, and asymptotically equivalent to it when the working correlation matrix includes the true correlation structure. In other cases, however, the difference may be large. For example, we assume that $R$ is one-dependent and $R_0$ is exchangeable with correlation parameter $\alpha = 0.3$. In this situation, we ran 10,000 repetitions in order to compare the formal QIC and the original QIC. We show the average value and bias of the

Table 1: Mean and bias of the formal QIC and the original QIC in case 1

Candidate model            1          2          3          4
Risk function           2619.202   2620.125   2621.067   2622.005
Formal QIC      mean    2619.362   2620.284   2621.210   2622.128
                bias       0.160      0.159      0.142      0.123
Original QIC    mean    2619.730   2620.939   2622.152   2623.370
                bias       0.367      0.655      0.943      1.242

Figure 1: Values of the formal QIC and the original QIC in case 1. [Line plot; horizontal axis: model index (1 to 4); vertical axis: criterion value (ticks from 2620 to 2623); series: risk function, formal QIC, original QIC.]

each QIC in Table 1. From Table 1, we can see that the bias of the original QIC grows as the number of parameters increases, whereas the bias of the formal QIC remains stable across the models. Hence, there may be a non-negligible difference between the formal QIC and the original QIC. Therefore, we recommend the use of the formal QIC rather than the original QIC for model selection in the GEE procedure.

5. Conclusions

In the present paper, we derived the formal QIC as an asymptotically unbiased estimator of the risk function based on the independence quasi-likelihood. Through the simulation study, we illustrated that the difference between the formal QIC and the original QIC may have a non-negligible effect on model selection, especially when the true correlation structure is completely different from the working correlation structure. Furthermore, we showed two theorems regarding the formal QIC. In Theorem 1, we proved the asymptotic unbiasedness of the formal QIC when the effect of estimating the correlation parameter is taken into account. This result may arise because the risk function of the formal QIC does not include the correlation parameter, since it is based on the independence quasi-likelihood. In Theorem 2, we established the adequacy of the formal QIC within a wide class of risk functions based on the multivariate quasi-likelihood. A unique risk function can be established by assuming the independence structure for the multivariate quasi-likelihood. These theorems guarantee the adequacy of the formal QIC.

Appendix

A.1. Proof of Theorem 1

We can rewrite (10) as

\[ \sum_{i=1}^n D_i' A_i^{-1/2} R(\alpha(\beta))^{-1} A_i^{-1/2} (y_i - \mu_i) = 0_p. \tag{A.1} \]

Under the assumptions of Theorem 1, applying the chain rule to the derivative of (A.1), we can immediately show that

\begin{align*}
\frac{\partial\, \mathrm{vec}\{R(\alpha(\beta))^{-1}\}}{\partial \beta'}
  &= \frac{\partial\, \mathrm{vec}\{R(\alpha)^{-1}\}}{\partial \alpha'} \cdot \frac{\partial \alpha(\beta)}{\partial \beta'} \\
  &= \frac{\partial\, \mathrm{vec}\{R(\alpha_0)^{-1}\}}{\partial \alpha'} \cdot \frac{\partial \alpha(\beta)}{\partial \beta'} + o_p(1) = O_p(1). \tag{A.2}
\end{align*}

By combining (A.1) and (A.2), we obtain

\[ -\frac{1}{n} \frac{\partial}{\partial \beta'} \sum_{i=1}^n D_i' A_i^{-1/2} R(\alpha(\beta))^{-1} A_i^{-1/2} (y_i - \mu_i) = \Omega_R + o_p(1). \]

This leads to the following result:

\[ \sqrt{n} (\hat\beta_s - \beta) = \frac{1}{\sqrt{n}} \Omega_R^{-1} g(\beta; R, y) + o_p(1). \]
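The step from the derivative of the estimating function to the expansion of $\hat\beta_s$ can be sketched as a first-order Taylor expansion. This is a standard argument added here for orientation, not the author's own wording: the label $g_s(\beta)$ for the left-hand side of (A.1) is introduced only for this sketch, and the remainder term is assumed to be negligible under the regularity conditions.

```latex
% g_s(beta): left-hand side of (A.1); a label assumed for this sketch only
\begin{align*}
0_p = g_s(\hat\beta_s)
  &= g_s(\beta) + \frac{\partial g_s(\beta)}{\partial \beta'} (\hat\beta_s - \beta) + \text{(remainder)} \\
  &= g_s(\beta) - n \{\Omega_R + o_p(1)\} (\hat\beta_s - \beta) + \text{(remainder)},
\end{align*}
% solving for the estimation error, with the remainder assumed negligible:
\[
\sqrt{n} (\hat\beta_s - \beta) = \frac{1}{\sqrt{n}} \Omega_R^{-1} g_s(\beta) + o_p(1).
\]
```

This has the same form as the expansion of $\hat\beta$ in (5), which is why the remaining steps mirror the derivation in Section 2.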

The rest of the proof is very similar to the derivation of the QIC in Section 2. We obtain

\[ E_y E_{y^*}[Q(y^*; \hat\beta)] - E_y E_{y^*}[Q(y^*; \hat\beta_s)] = o(1). \]

A.2. Proof of Theorem 2

Applying Stokes' theorem, which is commonly used in differential geometry, we obtain the following necessary and sufficient condition for avoiding the path-dependence of (13):

\[ \sum_{j,k,l}^m (y_{ij} - t_j) \frac{\partial (M(t)^{-1})_{jk}}{\partial t_l} \, dt_l \wedge dt_k = 0, \tag{A.3} \]

for individuals $i = 1, \ldots, n$. Hence, we can rewrite the condition (A.3) as

\[ \sum_{j=1}^m (y_{ij} - t_j) \left\{ \frac{\partial (M(t)^{-1})_{jk}}{\partial t_l} - \frac{\partial (M(t)^{-1})_{jl}}{\partial t_k} \right\} = 0, \quad k > l. \]

Since we assume $M(t) = A(t)^{1/2} R A(t)^{1/2}$ in the GEE methodology, this is equivalent to the following condition:

\[ (y_{il} - t_l)\, \sigma(t_k)^{-1} \frac{\partial \sigma(t_l)^{-1}}{\partial t_l} (R^{-1})_{lk} = (y_{ik} - t_k)\, \sigma(t_l)^{-1} \frac{\partial \sigma(t_k)^{-1}}{\partial t_k} (R^{-1})_{kl}, \tag{A.4} \]

where $\sigma(\mu_{ij})^2 = a(\phi) b''(\theta_{ij})$. If we assume $(R)_{kl} \neq 0$, (A.4) is equivalent to

\[ \frac{\partial \sigma(t_l)^{-1}}{\partial t_l} = 0. \]

From the above results, we can see that, for all $1 \leq k < l \leq m$, either $(R)_{kl} = 0$ or the variance function $\sigma^2(\cdot)$ must be constant in order for (A.4) to hold. This completes the proof of Theorem 2.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (eds. Petrov, B. N. & Csáki, F.), 267–281, Akadémiai Kiadó, Budapest.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control, AC-19, 716–723.

Balan, R. M. & Schiopu-Kratina, I. (2005). Asymptotic results with generalized estimating equations for longitudinal data. Ann. Statist., 33, 522–541.

Barnhart, H. X. & Williamson, J. M. (1998). Goodness-of-fit tests for the GEE modeling with binary responses. Biometrics, 54, 720–729.

Burnham, K. P. & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach, 2nd edition. Springer-Verlag, New York.

Cantoni, E., Flemming, M. & Ronchetti, E. (2005). Variable selection for marginal longitudinal generalized linear models. Biometrics, 61, 507–514.

Kullback, S. & Leibler, R. (1951). On information and sufficiency. Ann. Math. Statist., 22, 79–86.

Liang, K-Y. & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.

McCullagh, P. & Nelder, J. A. (1989). Generalized linear models, 2nd edition. Chapman and Hall, London.

Nelder, J. A. & Wedderburn, R. W. M. (1972). Generalized linear models. J. R. Statist. Soc. A, 135, 370–384.

Pan, W. (2001a). Akaike's information criterion in generalized estimating equations. Biometrics, 57, 120–125.

Pan, W. (2001b). Model selection in estimating equations. Biometrics, 57, 529–534.

Pan, W. (2002). Goodness-of-fit tests for GEE with correlated binary data. Scand. J. Statist., 29, 101–110.

R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.r-project.org/.

Shen, C-W. & Chen, Y-H. (2012). Model selection for generalized estimating equations accommodating dropout missingness. Biometrics, 68, 1046–1054.

Thall, P. F. & Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657–671.

Vens, M. & Ziegler, A. (2012). Generalized estimating equations and regression diagnostics for longitudinal controlled trials: A case study. Comput. Statist. Data Anal., 56, 1232–1242.

Wang, J. (1999). Artificial likelihoods for general nonlinear regressions (in Japanese). Proc. Inst. Statist. Math., 47, 49–61.

Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439–447.

Xie, M. & Yang, Y. (2003). Asymptotics for generalized estimating equations with large cluster sizes. Ann. Statist., 31, 310–347.