Semiparametric Mean-Covariance Regression Analysis for Longitudinal Data


July 14, 2009

Acknowledgments

We would like to thank the Editor, an associate editor and two referees for the constructive comments and detailed suggestions, which have led to a substantially improved paper.

Abstract

Efficient estimation of the regression coefficients in longitudinal data analysis requires correct specification of the covariance structure. Existing approaches usually focus on modeling the mean under certain assumed covariance structures, which may lead to inefficient or biased estimators of the mean parameters if misspecification occurs. In this paper, we propose a data-driven approach based on semiparametric regression models for the mean and the covariance simultaneously, motivated by the modified Cholesky decomposition. A regression spline based approach using generalized estimating equations is developed to estimate the parameters in the mean and the covariance. The resulting estimators for the regression coefficients in both the mean and the covariance are shown to be consistent and asymptotically normally distributed. In addition, the nonparametric functions in these two structures are estimated at their optimal rate of convergence. Simulation studies and a real data analysis show that the proposed approach yields highly efficient estimators for the parameters in the mean, and provides parsimonious estimation of the covariance structure.

Keywords: Covariance misspecification; Efficiency; Generalized estimating equation; Longitudinal data; Modified Cholesky decomposition; Semiparametric models.

1 Background

Longitudinal data arise frequently in the biomedical, epidemiological, social and economic fields. A salient feature of longitudinal studies is that subjects are measured repeatedly over time, so observations for the same subject are intrinsically correlated. Regression methods for such data that account for within-subject correlation are abundant in the literature (Diggle et al., 2002; Wu and Zhang, 2006).

Within the framework of generalized linear models (GLM), the technique of generalized estimating equations (GEE, Liang and Zeger, 1986) is widely used for dealing with longitudinal data. GEE makes use of a working correlation model to estimate the mean parameters in the marginal specification of the regression. Although consistency of the mean parameter estimators is not affected, misspecification of the correlation may result in a great loss of efficiency (Wang and Carey, 2003). On the other hand, the correlation structure itself may be of scientific interest (Prentice and Zhao, 1991; Diggle and Verbyla, 1998). There is therefore a great need to model the covariance structure. However, modeling the covariance matrix is more challenging than modeling the mean, as there are usually many more parameters in the former and the positive definiteness of the covariance matrix must be assured. Recently, Pourahmadi (1999, 2000) proposed a modified Cholesky decomposition of the covariance matrix. This decomposition is attractive because it automatically yields positive definite covariance matrices, and the parameters in it correspond to well-founded statistical concepts. The parameters in this decomposition can then be modeled via regression techniques, enabling model based inference for the parameters in the mean and the covariances. See Pan and MacKenzie (2003) for a related discussion. More recently, Ye and Pan (2006) further proposed to model the parameters in this decomposition by several sets of parametric generalized estimating equations. There is a clear need to relax the parametric assumption posed in Ye and Pan (2006), as model misspecification may result in biased estimation, a problem even more severe than misspecification of the covariance.
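To make the decomposition concrete, it can be computed directly by regressing each measurement on its predecessors. A minimal numerical sketch (Python with NumPy; the function name and the AR(1)-style test matrix are ours, for illustration only):

```python
import numpy as np

def modified_cholesky(sigma):
    """Decompose a covariance matrix as Phi @ sigma @ Phi.T = D (Pourahmadi).
    Phi is unit lower triangular; its below-diagonal entries are the
    negatives of the autoregressive coefficients, and the diagonal of D
    holds the innovation variances."""
    n = sigma.shape[0]
    phi_mat = np.eye(n)
    for j in range(1, n):
        # regression coefficients of y_j on its predecessors y_1, ..., y_{j-1}
        coef = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        phi_mat[j, :j] = -coef
    d = phi_mat @ sigma @ phi_mat.T
    return phi_mat, np.diag(d)

# toy AR(1)-type covariance: the transformed matrix becomes diagonal
rho = 0.5
sigma = np.array([[1.0, rho, rho**2],
                  [rho, 1.0, rho],
                  [rho**2, rho, 1.0]])
phi_mat, innov = modified_cholesky(sigma)
# phi_mat @ sigma @ phi_mat.T is diagonal up to rounding error,
# and innov contains the innovation variances
```

For this toy matrix the innovation variances after the first are 1 - rho**2, mirroring the AR(1) prediction-error variance.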
An attractive alternative is the semiparametric regression model, or the partly linear model (PLM), which retains the flexibility of the nonparametric model but avoids fitting a fully nonparametric model (Härdle, Liang and Gao, 2000). Existing applications of PLM

to longitudinal data accounting for within-subject dependence mainly focus on regression analysis of the mean, with the covariance usually assumed known up to a few parameters. See, for example, the kernel and spline approaches in Lin and Carroll (2001), Welch, Lin and Carroll (2002), He, Zhu and Fung (2002), Wang (2003), Wang, Carroll and Lin (2005), He, Fung and Zhu (2005), and Lin and Carroll (2006). Compared to models for the mean in longitudinal data analysis, model based analysis for the covariance is much less studied. To address this issue, we propose semiparametric models for the mean and the covariance structure for longitudinal data. Our formulation builds on the modified Cholesky decomposition advocated by Pourahmadi (1999), so that the entries in this decomposition can be modeled by unconstrained semiparametric regression models. Our approach is more flexible than the model in Ye and Pan (2006) in both the mean and the covariance. There are related works in this regard. Fan, Fan and Lv (2008) studied a factor model method to parametrize a high dimensional covariance matrix. Wu and Pourahmadi (2003) proposed nonparametric estimates of the covariance matrix, but their method does not handle irregularly observed measurements and only models a few of the subdiagonals of the lower triangular matrix in the modified Cholesky decomposition. Fan, Huang and Li (2007) and Fan and Wu (2008) studied a different semiparametric model for the covariance structure by imposing the familiar variance-correlation decomposition: they estimated the marginal variance via kernel smoothing and proposed a parametric model for the correlation matrix. Similar to the method in Fan, Huang and Li (2007), our approach can handle irregular and possibly subject-specific time points. We show that the resulting estimators for the regression coefficients in the mean and the variance are consistent and asymptotically normally distributed.
Furthermore, the nonparametric parts are estimated at the optimal convergence rate by

using regression splines.

The rest of the paper is organized as follows. Section 2 introduces the models and estimation methods. Theoretical properties of the proposed estimators are given in Section 3. Extensive simulations and data analysis are presented in Section 4. Section 5 gives some concluding remarks. All the proofs are relegated to the Appendix.

2 The Models and The Estimation Methods

2.1 The models

Let $y_i = (y_{i1}, \ldots, y_{in_i})'$ be the $n_i$ repeated measurements at time points $t_i = (t_{i1}, \ldots, t_{in_i})'$ of the $i$th subject ($i = 1, \ldots, m$). Here $t_{ij}$ may be the time or any time-dependent covariate which is modeled nonparametrically. Without loss of generality, we assume that all $\{t_{ij}\}$ are scaled into the interval $[0, 1]$. Furthermore, we assume that the first two moments of the response satisfy $E(y_{ij} \mid x_{ij}, t_{ij}) = \mu_{0ij}$ and $V(y_i \mid x_i, t_i) = \Sigma_{0i}$, where $x_{ij}$ is a $p$-vector of covariates and $x_i = (x_{i1}, \ldots, x_{in_i})'$ is the covariate matrix for the $i$th subject.

To guarantee the positive definiteness of the matrices $\Sigma_{0i}$, an explicit way of modeling $\Sigma_{0i}$ is via its modified Cholesky decomposition $\Phi_i \Sigma_{0i} \Phi_i' = D_{0i}$, where $\Phi_i$ is a lower triangular matrix with 1's on its diagonal, and $D_{0i}$ is a diagonal matrix. As indicated by Pourahmadi (1999), this decomposition has a clear statistical interpretation. The below-diagonal entries of $\Phi_i$ are the negatives of the autoregressive coefficients $\phi_{ijk}$ defined in

    \hat{y}_{ij} = \mu_{ij} + \sum_{k=1}^{j-1} \phi_{ijk} (y_{ik} - \mu_{ik}).    (1)

That is, the autoregressive coefficients are the population regression coefficients of the linear regression of $y_{ij}$ on its predecessors $y_{i(j-1)}, \ldots, y_{i1}$. The diagonal entries $\sigma^2_{0ij}$ of $D_{0i}$ can be seen as the innovation variances $\sigma^2_{0ij} = \mathrm{var}(\epsilon_{ij})$, for $\epsilon_{ij} = y_{ij} - \hat{y}_{ij}$. Clearly, the modified Cholesky decomposition is advantageous in that $\phi$ and $\log(\sigma^2)$

are unconstrained, rather than a constrained parameter $\Sigma_{0i}$ that must be positive definite. To use the semiparametric regression tools, we postulate three sets of models

    g(\mu_{0ij}) = x_{ij}'\beta_0 + f_0(t_{ij}), \quad \phi_{ijk} = w_{ijk}'\gamma_0, \quad \log(\sigma^2_{0ij}) = z_{ij}'\lambda_0 + f_1(t_{ij}),    (2)

where $x_{ij}$, $w_{ijk}$ and $z_{ij}$ are the $p \times 1$, $q \times 1$ and $d \times 1$ vectors of covariates, respectively; $\beta_0$, $\gamma_0$ and $\lambda_0$ are the regression coefficients; $f_0(\cdot)$ and $f_1(\cdot)$ are unknown smooth functions. The known link function $g(\cdot)$ is assumed to be monotone and differentiable. The covariates $x_{ij}$, $w_{ijk}$ and $z_{ij}$ may contain the baseline covariates, the time and the associated interactions. The use of (2) reflects the belief that regression models for the autoregressive coefficients and innovation variances are as important as those for the mean. Furthermore, model based analysis of these parameters permits more accessible statistical inference. This technique was also used by Ye and Pan (2006), but they assumed parametric models for $g(\mu_{0ij})$ and $\log(\sigma^2_{0ij})$. As discussed earlier, the semiparametric estimating equations are more flexible and can be less biased if the parametric assumption is violated.

It is interesting to make some comparisons with the approach in Fan et al. (2007). They adopted the variance-correlation decomposition, using a well-behaved matrix such as AR(1) or ARMA(1,1) as the correlation matrix to ensure positive definiteness of the covariance. We use the modified Cholesky decomposition, which allows unconstrained parametrization, so regression based analysis of the entries in this decomposition is possible. These two approaches depend on different ways to decompose the covariance, and each has its own merits.

2.2 The estimating equations

The two nonparametric functions $f_0$ and $f_1$ are parametrized by regression splines, because splines can provide optimal rates of convergence for both the parametric

and the nonparametric components in PLM with a small number of knots (Heckman, 1986; He and Shi, 1996; He et al., 2002). Additionally, any computational algorithm developed for GLM can be used to fit a semiparametric extension of GLM, since the nonparametric function is treated as a linear function with the basis functions as covariates. For simplicity, we assume that $f_0$ and $f_1$ have the same smoothness property.

Let $0 = s_0 < s_1 < \cdots < s_{k_n} < s_{k_n+1} = 1$ be a partition of the interval $[0, 1]$. Using $\{s_i\}$ as the internal knots, we have $K = k_n + l$ normalized B-spline basis functions of order $l$ that form a basis for the linear spline space. We use the B-spline basis functions because they have bounded support and are numerically stable (Schumaker, 1981). Thus $f_0(t)$ and $f_1(t)$ are approximated by $\pi'(t)\alpha$ and $\pi'(t)\tilde{\alpha}$ respectively, where $\pi(t) = (B_1(t), \ldots, B_K(t))'$ is the vector of basis functions and $\alpha, \tilde{\alpha} \in R^K$. Let $\pi_{ij} = \pi(t_{ij})$. With this notation, the models in (2) can be linearized as follows:

    g(\mu_{ij}) = x_{ij}'\beta + \pi'(t_{ij})\alpha = b_{ij}'\theta, \quad \log(\sigma^2_{ij}) = z_{ij}'\lambda + \pi'(t_{ij})\tilde{\alpha} = h_{ij}'\rho,    (3)

where $b_{ij} = (x_{ij}', \pi_{ij}')'$, $h_{ij} = (z_{ij}', \pi_{ij}')'$, $\theta = (\beta', \alpha')'$ and $\rho = (\lambda', \tilde{\alpha}')'$. We then let $\mu_i = (\mu_{i1}, \ldots, \mu_{in_i})'$, $B_i = (b_{i1}, \ldots, b_{in_i})'$ and define $x_i$, $\pi_i$, $z_i$ and $H_i$ in a similar fashion. Throughout this paper, a scalar function acting on a vector is the vector of the function applied to each component; for example, $g(\mu_i) = (g(\mu_{i1}), \ldots, g(\mu_{in_i}))'$.

Using the GEE method of Liang and Zeger (1986), we construct the estimating equations for $\theta$, $\gamma$ and $\rho$ as follows:

    S_1(\theta) = \sum_{i=1}^m B_i' \Delta_i \Sigma_i^{-1} (y_i - \mu_i(B_i\theta)) = 0,
    S_2(\gamma) = \sum_{i=1}^m V_i' D_i^{-1} (r_i - \hat{r}_i) = 0,    (4)
    S_3(\rho) = \sum_{i=1}^m H_i' D_i W_i^{-1} (\epsilon_i^2 - \sigma_i^2(H_i\rho)) = 0,

where $\Delta_i = \Delta_i(B_i\theta) = \mathrm{diag}\{\dot{g}^{-1}(b_{i1}'\theta), \ldots, \dot{g}^{-1}(b_{in_i}'\theta)\}$ and $\dot{g}^{-1}(\cdot)$ is the derivative of the inverse link $g^{-1}(\cdot)$; $r_i$ and $\hat{r}_i$ are the $n_i \times 1$ vectors with $j$th components $r_{ij} =$

$y_{ij} - \mu_{ij}$ and $\hat{r}_{ij} = E(r_{ij} \mid r_{i1}, \ldots, r_{i(j-1)}) = \sum_{k=1}^{j-1} \phi_{ijk} r_{ik}$ ($j = 1, \ldots, n_i$). Note that for $j = 1$ the empty sum $\sum_{k=1}^{0}$ is taken to be zero throughout this paper. It can be shown that $D_i = \mathrm{diag}\{\sigma^2_{i1}, \ldots, \sigma^2_{in_i}\}$ in $S_2(\gamma)$ is actually the covariance matrix of $r_i - \hat{r}_i$, and that $V_i = \partial\hat{r}_i/\partial\gamma$ is the $q \times n_i$ matrix with $j$th column $\partial\hat{r}_{ij}/\partial\gamma = \sum_{k=1}^{j-1} r_{ik} w_{ijk}$. On the other hand, $\epsilon_i^2$ and $\sigma_i^2$ in $S_3(\rho)$ are the $n_i \times 1$ vectors with $j$th components $\epsilon_{ij}^2$ and $\sigma_{ij}^2$ ($j = 1, \ldots, n_i$), respectively, where $\epsilon_{ij} = y_{ij} - \hat{y}_{ij}$ and $\hat{y}_{ij}$ is given in (1). Obviously, $E(\epsilon_i^2) = \sigma_i^2$. In addition, $W_i$ is the covariance matrix of $\epsilon_i^2$, that is, $W_i = V(\epsilon_i^2)$. The solutions of these generalized estimating equations, say $\hat{\theta}$, $\hat{\gamma}$ and $\hat{\rho}$, are termed the GEE estimators of $\theta$, $\gamma$ and $\rho$.

As suggested by Ye and Pan (2006), a sandwich working covariance structure $W_i = A_i^{1/2} R_i(\delta) A_i^{1/2}$ can be used to approximate the true $W_i$'s, where $A_i = 2\,\mathrm{diag}\{\sigma^4_{i1}, \ldots, \sigma^4_{in_i}\}$ and $R_i(\delta)$ mimics the correlation between $\epsilon_{ij}^2$ and $\epsilon_{ik}^2$ ($j \neq k$) through a new parameter $\delta$. Typical structures for $R_i(\delta)$ include compound symmetry (exchangeable) and AR(1). As with the conventional generalized estimating equations for the mean, the parameter $\delta$ may have very little effect on the estimators of $\gamma$ and $\rho$. Our real data analysis and simulation studies reported in later sections confirm this point.

The three estimating equations in (4) can be seen as a generalization of the conventional GEE for the mean parameters. If we use a working covariance structure for $\Sigma_i$ in $S_1(\theta)$ and ignore $S_2(\gamma)$ and $S_3(\rho)$, we recover the PLM for the mean. The modified Cholesky decomposition allows us to proceed a step further and impose partly linear structures for the variance components.

2.3 The main algorithm

The solutions for $\theta$, $\gamma$ and $\rho$ satisfy the equations in (4). These parameters can be solved for iteratively, each with the others held fixed. For example, for fixed values of

$\theta$ and $\gamma$, $\rho$ can be computed via the third equation in (4). An application of the quasi-Fisher scoring algorithm to equation (4) directly yields the numerical solutions for these parameters. More specifically, given $\Sigma_i$, $\theta$ can be updated by

    \theta^{(k+1)} = \theta^{(k)} + \Big[\sum_{i=1}^m B_i' \Delta_i \Sigma_i^{-1} \Delta_i B_i\Big]^{-1} \sum_{i=1}^m B_i' \Delta_i \Sigma_i^{-1} (y_i - \mu_i(B_i\theta)) \Big|_{\theta=\theta^{(k)}}.    (5)

Given $\theta$ and $\rho$, $\gamma$ can be updated approximately through

    \gamma^{(k+1)} = \Big[\sum_{i=1}^m E(V_i' D_i^{-1} V_i)\Big]^{-1} \sum_{i=1}^m V_i' D_i^{-1} r_i \Big|_{\gamma=\gamma^{(k)}}.    (6)

In this update, the true errors are replaced by the residuals. We remark that this replacement has minimal effect on the estimation if the mean model is correctly specified. However, if the mean model is misspecified, this replacement would normally give inconsistent estimators of the autoregressive and log innovation parameters. This argument, on the other hand, demonstrates that the partly linear mean model of our approach is more robust than a parametric mean model with respect to model misspecification. Finally, given $\theta$ and $\gamma$, the innovation variance parameters $\rho$ can be updated using

    \rho^{(k+1)} = \rho^{(k)} + \Big[\sum_{i=1}^m H_i' D_i W_i^{-1} D_i H_i\Big]^{-1} \sum_{i=1}^m H_i' D_i W_i^{-1} (\epsilon_i^2 - \sigma_i^2) \Big|_{\rho=\rho^{(k)}}.    (7)

Equations (5)-(7) indicate that, iteratively, the parameters can be estimated using weighted generalized least squares. We summarize the algorithm as follows.

1. Initialization step: given a starting value $(\theta^{(0)}, \gamma^{(0)}, \rho^{(0)})$, use the model (2) to form the lower triangular matrices $\Phi_i^{(0)}$ and diagonal matrices $D_i^{(0)}$. Set $k = 0$ to obtain $\Sigma_i^{(0)}$, the starting values of $\Sigma_i$;

2. Iteration step: use (5)-(7) to calculate $\theta^{(k+1)}$, $\gamma^{(k+1)}$ and $\rho^{(k+1)}$;

3. Updating step: replace $\theta^{(k)}$, $\gamma^{(k)}$ and $\rho^{(k)}$ with the estimators $\theta^{(k+1)}$, $\gamma^{(k+1)}$ and $\rho^{(k+1)}$. Repeat Steps 2-3 until a desired convergence criterion is met.

A good starting value $\Sigma_i^{(0)}$ can simply be chosen as $I_i$, the identity matrix for the $i$th subject. This initial value of $\Sigma_i$ guarantees the consistency of the initial estimators in the mean, which in turn guarantees consistency of the autoregressive and innovation parameters after the first iteration. In the analyses presented in this paper, the convergence criterion is met when the successive difference in the Euclidean norm falls below a small tolerance. Our numerical experience shows that this iterative algorithm converges very quickly, usually in a few iterations.

2.4 Knot selection

Knot selection is an important issue in spline smoothing. The number of knots plays the same role as the smoothing parameter in the smoothing spline model and the bandwidth in kernel smoothing. In this article, we follow the spline literature (He et al., 2005) and use the sample quantiles of $\{t_{ij}, i = 1, \ldots, m, j = 1, \ldots, n_i\}$ as knots. For example, if we use three internal knots, they are taken to be the three quartiles of the observed $\{t_{ij}\}$. We use cubic splines (splines of order 4) in the numerical studies, and the number of internal knots is taken to be the integer part of $n^{1/5}$, where $n$ is the sample size. This particular choice is consistent with the asymptotic theory of Section 3, and according to our empirical experience it works well in a wide variety of problems. A data adaptive alternative is the leave-one-subject-out cross validation method, which is computationally more demanding; its theoretical justification is possible but beyond the scope of the paper.
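The knot rule and the spline basis evaluation can be sketched as follows (Python; SciPy's B-spline utilities stand in for whatever R spline routines the paper's own code uses, and the helper names are ours):

```python
import numpy as np
from scipy.interpolate import BSpline

def select_knots(t_pooled):
    """Internal knots at sample quantiles of the pooled {t_ij};
    the number of internal knots is the integer part of n**(1/5)."""
    n = len(t_pooled)
    k_n = int(n ** 0.2)
    probs = np.arange(1, k_n + 1) / (k_n + 1)
    return np.quantile(t_pooled, probs)

def bspline_basis(t, internal_knots, order=4):
    """Evaluate the K = k_n + order normalized B-spline basis functions
    of the given order (4 = cubic) at points t in [0, 1]."""
    degree = order - 1
    # clamped knot vector on [0, 1]
    knots = np.concatenate([np.zeros(order), np.sort(internal_knots), np.ones(order)])
    K = len(internal_knots) + order
    basis = np.empty((len(t), K))
    for j in range(K):
        coef = np.zeros(K)
        coef[j] = 1.0
        basis[:, j] = BSpline(knots, coef, degree)(t)
    return basis

# pooled observation times (simulated stand-in for the real {t_ij})
t_pooled = np.random.default_rng(0).uniform(0.0, 1.0, 2376)
knots = select_knots(t_pooled)
pi_mat = bspline_basis(np.linspace(0.05, 0.95, 9), knots)
```

Each row of `pi_mat` is the vector $\pi(t_{ij})'$ used in (3); by the partition-of-unity property of B-splines, each row sums to one.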

3 Asymptotic Properties

Here and throughout, $\|\cdot\|$ denotes the Euclidean norm of a vector and, for any square matrix $A$, $\|A\|$ denotes the modulus of the largest singular value of $A$. To study the rates of convergence of $\hat{\beta}$, $\hat{\gamma}$, $\hat{\lambda}$ and $\hat{f}_0$, $\hat{f}_1$, we first give a set of regularity conditions. If the estimating equation (4) has multiple solutions, only a consistent sequence of estimators $(\hat{\theta}, \hat{\gamma}, \hat{\rho})$ is considered in this section. A sequence $(\hat{\theta}, \hat{\gamma}, \hat{\rho})$ is said to be consistent if $(\hat{\beta}', \hat{\gamma}', \hat{\lambda}')' \to (\beta_0', \gamma_0', \lambda_0')'$ and $\sup_t |\pi'(t)\hat{\alpha} - f_0(t)| \to 0$, $\sup_t |\pi'(t)\hat{\tilde{\alpha}} - f_1(t)| \to 0$ in probability as $m \to \infty$. The fact that our iterative algorithm starts from consistent estimators of the parameters ensures that the final estimators are also consistent. Our basic conditions are as follows:

(A1) The dimensions $p$, $q$ and $d$ of the covariates $x_{ij}$, $w_{ijk}$ and $z_{ij}$ are fixed; $m \to \infty$ and $\max_i\{n_i\}$ is bounded, and the distinct values of $t_{ij}$ form a quasi-uniform sequence that grows dense on $[0, 1]$. The first four moments of $y_{ij}$ exist.

(A2) The $s$th derivatives of $f_0$ and $f_1$ are bounded for some $s \geq 2$.

(A3) The covariates $w_{ijk}$ and the matrices $W_i^{-1}$ are all bounded, meaning that all elements of these vectors and matrices are bounded. The function $g^{-1}(\cdot)$ has bounded second derivative.

(A4) The parameter space $\Theta$ is a compact subset of $R^{p+q+d}$, and the true value $(\beta_0', \gamma_0', \lambda_0')'$ is in the interior of $\Theta$.

The assumptions that the number of independent subjects goes to infinity and that the maximum number of repeated measurements is bounded are standard; they were used by Liang and Zeger (1986) to study the GEE estimators. The existence of the first four moments of the response is needed for consistent estimation of the parameters in the variance. The smoothness conditions on $f_0$ and $f_1$ given by Condition (A2)

determine the rate of convergence of the spline estimates. Condition (A3) is satisfied as $t$ is bounded. Assumption (A4) is routinely made in linear models.

To study the asymptotic properties of the estimators, we assume the following dependence between $x_{ij}$, $z_{ij}$ and $t_{ij}$:

    x_{ijk} = g_k(t_{ij}) + \delta_{ijk}, \quad k = 1, \ldots, p,    (8)
    z_{ijl} = \tilde{g}_l(t_{ij}) + \tilde{\delta}_{ijl}, \quad l = 1, \ldots, d; \; i = 1, \ldots, m; \; j = 1, \ldots, n_i,    (9)

where the $\delta_{ijk}$'s and $\tilde{\delta}_{ijl}$'s are mean zero random variables independent of the corresponding random errors and of one another. Relationships (8) and (9) are commonly assumed in the spline literature; see He et al. (2002) and references therein. Indeed, Härdle et al. (2000, Chapter 1.3) considered similar assumptions for partly linear kernel regression; our assumptions correspond to their random design case. Equations (8) and (9) also apply when categorical variables are present (Speckman, 1988). For example, when $x_k$ is a binary zero-one variable denoting treatment assignment, $g_k(t) = P(x_k = 1 \mid t)$. Independence between $x_k$ and the covariate $t$ implies that $g_k$ is constant; however, a systematic trend resulting in unbalanced allocation to treatment could yield a non-constant $g_k$.

Let $\Lambda_n$ and $\tilde{\Lambda}_n$ be the $n \times p$ and $n \times d$ matrices whose $k$th columns are $\delta_k = (\delta_{11k}, \ldots, \delta_{1n_1k}, \ldots, \delta_{mn_mk})'$ and $\tilde{\delta}_k = (\tilde{\delta}_{11k}, \ldots, \tilde{\delta}_{1n_1k}, \ldots, \tilde{\delta}_{mn_mk})'$ respectively. We make the following assumption:

(A5) (1) $E\Lambda_n = 0$, $\sup_n n^{-1} E\|\Lambda_n\|^2 < \infty$; $E\tilde{\Lambda}_n = 0$, $\sup_n n^{-1} E\|\tilde{\Lambda}_n\|^2 < \infty$; (2) $k_n M'\Sigma^0 M$ and $k_n M'W^0 M$ are nonsingular for sufficiently large $n$, and the eigenvalues of $k_n M'\Sigma^0 M/n$ and $k_n M'W^0 M/n$ are bounded away from 0 and infinity, where $M = (\pi_1', \ldots, \pi_m')'$, $\Sigma^0 = \mathrm{diag}\{\Sigma^0_1, \ldots, \Sigma^0_m\}$ with $\Sigma^0_i = \Delta_{0i}\Sigma_{0i}^{-1}\Delta_{0i} = \Delta_i(\eta^0_i)\Sigma_{0i}^{-1}\Delta_i(\eta^0_i)$, and $W^0$ is defined in a similar fashion.

We take the number of knots $k_n$ as the integer part of $n^{1/(2s+1)}$, where $s$ is defined

in (A2) and taken as 2 in this work. For this knot number, Condition (2) of (A5) is expected to hold, as this is a property of the B-spline basis functions (He and Shi, 1996).

The asymptotic properties of $(\hat{\beta}, \hat{\gamma}, \hat{\lambda})$ involve the covariance matrix $\Delta_m = (\delta^{kl}_m)_{k,l=1,2,3}$ of $(\tilde{S}_1', \tilde{S}_2', \tilde{S}_3')'/\sqrt{m}$, where $\tilde{S}_1$, $\tilde{S}_2$ and $\tilde{S}_3$ are defined by

    \tilde{S}_1 = \sum_{i=1}^m \tilde{X}_i' \Delta_{0i} \Sigma_{0i}^{-1} (y_i - \mu_{0i}), \quad \tilde{S}_2 = \sum_{i=1}^m V_i^{0\prime} D_{0i}^{-1} (r_{0i} - \hat{r}_{0i}), \quad \tilde{S}_3 = \sum_{i=1}^m \tilde{Z}_i' D_{0i} W_{0i}^{-1} (\epsilon^2_{0i} - \sigma^2_{0i}).    (10)

Here $\tilde{X} = (I - P)X$ with $P = M(M'\Sigma^0 M)^{-1}M'\Sigma^0$; $r_{0i} = y_i - \mu_{0i}$ and $\hat{r}_{0i} = (\hat{r}_{0i1}, \ldots, \hat{r}_{0in_i})'$ with $\hat{r}_{0ij} = \sum_{k=1}^{j-1} r_{0ik} w_{ijk}'\gamma_0$; $V_i^0 = (0, r_{0i1}w_{i21}, \ldots, \sum_{k=1}^{j-1} r_{0ik}w_{ijk}, \ldots)'$; and $\tilde{Z} = (I - P)Z$. We make the following assumption, similar to Assumption 11 in Ye and Pan (2006).

(A6) The covariance matrix $\Delta_m$ is positive definite, and

    \Delta_m = \begin{pmatrix} \delta^{11}_m & \delta^{12}_m & \delta^{13}_m \\ \delta^{21}_m & \delta^{22}_m & \delta^{23}_m \\ \delta^{31}_m & \delta^{32}_m & \delta^{33}_m \end{pmatrix} \to \begin{pmatrix} \delta^{11} & \delta^{12} & \delta^{13} \\ \delta^{21} & \delta^{22} & \delta^{23} \\ \delta^{31} & \delta^{32} & \delta^{33} \end{pmatrix} = \Delta, \quad \text{as } m \to \infty,    (11)

where $\Delta$ is a positive definite matrix.

Theorem 1. If (A1) to (A6) hold and $k_n = O(n^{1/(2s+1)})$, we have

    \frac{1}{n} \sum_{i=1}^m \sum_{j=1}^{n_i} \{\hat{f}_0(t_{ij}) - f_0(t_{ij})\}^2 = O_p(n^{-2s/(2s+1)}),    (12)
    \frac{1}{n} \sum_{i=1}^m \sum_{j=1}^{n_i} \{\hat{f}_1(t_{ij}) - f_1(t_{ij})\}^2 = O_p(n^{-2s/(2s+1)}),    (13)

where $\hat{f}_0(t) = \pi'(t)\hat{\alpha}$ and $\hat{f}_1(t) = \pi'(t)\hat{\tilde{\alpha}}$.

As pointed out by He et al. (2005), (12) and (13) imply that $\int (\hat{f}_i(t) - f_i(t))^2 dt = O_p(n^{-2s/(2s+1)})$, $i = 0, 1$, under some general conditions (Stone, 1985, Lemmas 8 and 9). This is the optimal rate of convergence for estimating $f_0$ and $f_1$ under the smoothness condition (A2). For the parametric coefficients, we have the following.

Theorem 2. Under conditions (A1)-(A6), the generalized estimating equation estimator $(\hat{\beta}_m', \hat{\gamma}_m', \hat{\lambda}_m')'$ is $\sqrt{m}$-consistent and asymptotically normal; that is,

    \sqrt{m} \begin{pmatrix} \hat{\beta}_m - \beta_0 \\ \hat{\gamma}_m - \gamma_0 \\ \hat{\lambda}_m - \lambda_0 \end{pmatrix} \to N\left(0, \begin{pmatrix} \delta^{11} & 0 & 0 \\ 0 & \delta^{22} & 0 \\ 0 & 0 & \delta^{33} \end{pmatrix}^{-1} \begin{pmatrix} \delta^{11} & \delta^{12} & \delta^{13} \\ \delta^{21} & \delta^{22} & \delta^{23} \\ \delta^{31} & \delta^{32} & \delta^{33} \end{pmatrix} \begin{pmatrix} \delta^{11} & 0 & 0 \\ 0 & \delta^{22} & 0 \\ 0 & 0 & \delta^{33} \end{pmatrix}^{-1} \right)

in distribution as $m \to \infty$.

The asymptotic variance reduces to a block diagonal matrix for a normally distributed response, which is a semiparametric analog of Theorem 2 of Ye and Pan (2006). For inference, we use a robust estimator of the covariance matrix of $\hat{\beta}_m$:

    \hat{V}(\hat{\beta}_m) = M_0^{-1} M_1 M_0^{-1},    (14)

where $M_0 = \sum_{i=1}^m \tilde{X}_i' \hat{\Delta}_i \hat{\Sigma}_i^{-1} \hat{\Delta}_i \tilde{X}_i$ and $M_1 = \sum_{i=1}^m \tilde{X}_i' \hat{\Delta}_i \hat{\Sigma}_i^{-1} (y_i - \hat{\mu}_i)(y_i - \hat{\mu}_i)' \hat{\Sigma}_i^{-1} \hat{\Delta}_i \tilde{X}_i$. The estimated covariance matrices of $\hat{\gamma}_m$ and $\hat{\lambda}_m$ can be obtained in a similar way, and the covariances $\delta^{kl}$ ($k \neq l$) can be estimated by their sample versions.

4 Numerical Study

Whenever appropriate, we compare our approach with GEE and with that of Fan et al. (2007). For Fan et al.'s method, we use local linear regression to model $f_0(t)$ in the mean and local constant regression for the marginal variance. For the conventional GEE, we use the same spline representation for $f_0$. All code is written in R, and the function geese in the R package geepack is used for GEE. For brevity, our approach is referred to as the semiparametric mean-variance method (SMV); the approach of Fan et al. (2007), which is used for estimating the correlation, is referred to as the quasi-likelihood method (QL).

4.1 Real data analysis

We apply the proposed estimation method to the CD4 cell study, which has been analyzed by many authors (Zeger and Diggle, 1994; Ye and Pan, 2006). This data set comprises

CD4 cell counts of 369 HIV-infected men, with six covariates: time since seroconversion ($t_{ij}$), age (relative to an arbitrary origin, $x_{ij1}$), packs of cigarettes smoked per day ($x_{ij2}$), recreational drug use ($x_{ij3}$), number of sexual partners ($x_{ij4}$) and mental illness score ($x_{ij5}$). Altogether there are 2,376 values of CD4 cell counts, with multiple repeated measurements taken on each individual at different times, covering a period of approximately eight and a half years. The number of measurements per individual varies from 1 to 12 and the time points are not equally spaced; thus the CD4 cell data are highly unbalanced. We apply a square root transformation to the response, following the suggestion in Zeger and Diggle (1994), where further details about the design and the medical implications of the study can be found.

To model jointly the mean and covariance structures for the CD4 cell data, we use the mean model

    y_{ij} = x_{ij1}\beta_1 + x_{ij2}\beta_2 + x_{ij3}\beta_3 + x_{ij4}\beta_4 + x_{ij5}\beta_5 + f_0(t_{ij}) + e_{ij}.

We take the covariates for the autoregressive components as $w_{ijk} = (1, t_{ij} - t_{ik}, (t_{ij} - t_{ik})^2, (t_{ij} - t_{ik})^3)'$, following the arguments in Ye and Pan (2006), and for the innovation variances we take $z_{ij} = x_{ij}$. The latter specification allows us to examine whether the innovations depend on the covariates. Finally, the number of knots is taken to be 7, which is also the optimal number of knots according to the leave-one-subject-out cross validation.

Table 1 lists the results for $\beta$ from our modified Cholesky decomposition method, where a working AR(1) model with $\delta = 0.2$ is used for the innovation. For comparison, we list the conventional GEE method for the partly linear mean model using different working correlations, including the independence, AR(1) and exchangeable structures. We also include Fan et al.'s approach using AR(1) and ARMA(1,1) for the correlation.
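For one subject, the cubic-in-lag design used here for the autoregressive part can be sketched as follows (Python; the function name is ours, for illustration only):

```python
import numpy as np

def autoregressive_design(t_i):
    """Covariates w_ijk = (1, lag, lag^2, lag^3)' for every within-subject
    pair j > k, with lag = t_ij - t_ik, as in the CD4 analysis."""
    rows = []
    for j in range(1, len(t_i)):
        for k in range(j):
            lag = t_i[j] - t_i[k]
            rows.append([1.0, lag, lag ** 2, lag ** 3])
    return np.array(rows)

# toy subject observed at three times; one row per (j, k) pair
w = autoregressive_design(np.array([0.0, 0.5, 1.5]))
```

Each row, multiplied by $\gamma$, gives one generalized autoregressive coefficient $\phi_{ijk} = w_{ijk}'\gamma$, so the fitted $\phi$ is a cubic polynomial in the time lag.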
The results show that both SMV and QL give estimators with generally smaller standard errors. For our approach, smoking

and drug use are highly significant variables, while mental illness score is marginally significant. QL identifies smoking and mental illness as significant covariates, and estimates drug use as borderline insignificant. The significance of smoking is missed by GEE with the AR(1) covariance structure, while that of drug use is missed by GEE with either the AR(1) or the exchangeable structure. Finally, GEE with the independence working correlation indicates that mental illness score is not significant at all, which contradicts the GEE results under the other working correlations. For the autoregressive and log innovation parameters, our model yields

    \hat{\gamma}_1 = 0.675\,(0.048), \quad \hat{\gamma}_2 = 0.563\,(0.033), \quad \hat{\gamma}_3 = 0.170\,(0.077), \quad \hat{\gamma}_4 = 0.017\,(0.004);
    \hat{\lambda}_1 = 0.003\,(0.011), \quad \hat{\lambda}_2 = 0.085\,(0.045), \quad \hat{\lambda}_3 = 0.048\,(0.122), \quad \hat{\lambda}_4 = 0.006\,(0.015), \quad \hat{\lambda}_5 = 0.004\,(0.005),

where the standard errors are in parentheses. All the autoregressive parameters are highly significant. Figure 1 displays the three fitted curves for $f_0$, for $\phi$ as a function of the time lag, and for $f_1$, when $R_i(\delta)$ is specified as AR(1) with $\delta = 0.2$. The trajectory of the mean curve is consistent with that in Zeger and Diggle (1994). Figure 1(B) plots the estimated generalized autoregressive parameters $\phi$ against the time lag between measurements within the same subject, which, according to our model, is simply a cubic polynomial. This plot shows that the generalized autoregressive parameters decrease sharply when the time lag is less than two years and then drop slowly as the lag becomes larger. The innovation curve appears to fluctuate around a constant. These observations broadly agree with those in Ye and Pan (2006).

It is helpful to compare our method with Fan et al.'s approach in terms of prediction. To this end, we randomly split the data into three parts, each with 369/3 = 123 subjects. We use the first two parts of this partitioning as the training data set to

Figure 1: The CD4 cell data. The fitted curves of (A) the nonparametric part in the mean against time (years since seroconversion), (B) the generalized autoregressive parameters against lag and (C) the nonparametric part in the (log) innovation variances against time, based on the AR(1) structure with $\delta = 0.2$ and square root CD4 cell numbers. Dashed curves represent asymptotic 95% confidence intervals.

Table 1: CD4 cell data. The estimates of parameters based on square root CD4 cell numbers, with standard errors in parentheses.

          SMV       Generalized Estimating Equations             QL
                    Independence   AR(1)     Exchangeable        AR(1)     ARMA(1,1)
    β1    (0.030)   (0.035)        (0.034)   (0.032)             (0.032)   (0.031)
    β2    (0.130)   (0.184)        (0.190)   (0.136)             (0.152)   (0.138)
    β3    (0.345)   (0.528)        (0.350)   (0.358)             (0.358)   (0.329)
    β4    (0.038)   (0.059)        (0.041)   (0.043)             (0.040)   (0.038)
    β5    (0.014)   (0.021)        (0.014)   (0.015)             (0.014)   (0.014)

fit the model, and then assess the out-of-sample performance on the test data set (denoted TD) that is left out. As the performance measure we use the predictive mean squared error $\sum_{i \in TD} \sum_{j=1}^{n_i} (\hat{y}_{ij} - y_{ij})^2 / n_{TD}$, where $\hat{y}_{ij}$ is the fitted response using the training data set only and $n_{TD} = \sum_{i \in TD} n_i$ is the test data sample size. This process is repeated 100 times. The average mean squared errors for QL AR(1), QL ARMA(1,1) and our approach are 37.19, 36.98, with standard errors 2.98, 2.93 and 3.40 respectively. Pairwise t-tests of the mean squared errors show that these approaches perform similarly. Note that our model is more flexible and potentially more powerful, because covariates other than $t$ are allowed in modeling the log innovation variances.

As suggested by an anonymous reviewer, we can use graphical tools to explore the covariance structure produced by the various methods. For a fair comparison, we fit the mean using the regression spline approximation for $f_0(t)$, assuming iid errors; the fitted residuals are then used for covariance modeling. For SMV, since none of the $\lambda$'s is significant, we apply SMV with only a nonparametric function for the log innovation. Note that the iteration for $\theta$ in (5) is no longer needed for this comparison. Define the variogram $v(u, t) = E[\{e(t) - e(t - u)\}^2]/2$, $u \geq 0$, as a function of time lag $u$ and observing time $t$. We then compare the sample variogram, the variogram estimated by QL with the ARMA(1,1) structure, and the variogram estimated by SMV at

the sample points, by plotting $v(u, t)$ versus $u$ and $t$ in Figure 2. The vertical axis is truncated at 40 to accentuate the shape of the smooth estimate. The sample variogram increases with the time lag, corresponding to decreasing correlation as observations are separated in time. Both QL and SMV capture this feature nicely. On the other hand, the plot of the sample variogram versus observation time indicates that $v(u, t)$ is likely independent of $t$, which is missed by QL and, to a lesser degree, by SMV. Note that the large random fluctuation of the sample variogram is a typical phenomenon (Diggle et al., 2002).

Figure 2: The sample variogram and its estimates by SMV and QL for the CD4 data, plotted against time lag ($u$) and time ($t$). Individual points represent the variogram at the sampling points and the solid lines are LOWESS smoothers.
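The sample variogram ordinates plotted in Figure 2 can be computed as follows (a sketch in Python; the function name is ours):

```python
import numpy as np

def sample_variogram(times, resid):
    """Empirical variogram ordinates v = {e(t_j) - e(t_k)}^2 / 2 for all
    within-subject pairs j > k, paired with the time lag u = t_j - t_k.
    `times` and `resid` are lists of per-subject arrays."""
    lags, ords = [], []
    for t_i, e_i in zip(times, resid):
        for j in range(1, len(t_i)):
            for k in range(j):
                lags.append(t_i[j] - t_i[k])
                ords.append(0.5 * (e_i[j] - e_i[k]) ** 2)
    return np.array(lags), np.array(ords)

# toy example: one subject with three residuals
lags, ords = sample_variogram([np.array([0.0, 1.0, 3.0])],
                              [np.array([0.0, 2.0, 1.0])])
```

Scatter-plotting `ords` against `lags` (and against the observation times), with a LOWESS smoother overlaid, reproduces the kind of display shown in Figure 2.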

4.2 Simulation study

We conduct extensive numerical studies to assess the finite sample performance of the proposed method. We also test the asymptotic covariance formula in Theorem 2 and compare SMV with QL and the conventional GEE using working correlations. For each study, we simulate one thousand data sets.

Study 1. We first consider the model

    y_{ij} = x_{ij1}\beta_1 + x_{ij2}\beta_2 + f_0(t_{ij}) + e_{ij}, \quad i = 1, \ldots, m; \; j = 1, \ldots, n_i,

for $m = 100$. Note that we use $\epsilon_{ij}$ to denote the innovation and $e_{ij}$ the random error associated with observation $j$ for subject $i$. We use the sampling scheme in Fan et al. (2007), in which the observation times are regularly scheduled but may be randomly missed in practice. More precisely, each individual has a set of scheduled time points $\{0, 1, 2, \ldots, 12\}$, and each scheduled time, except time 0, has a 20% probability of being skipped. The actual observation time is a random perturbation of a scheduled time: a uniform $[0, 1]$ random variable is added to each non-skipped scheduled time. This results in different observed time points $t_{ij}$ per subject; the $t_{ij}$ are then transformed onto $[0, 1]$. Note that the longitudinal observations are highly irregular and some of the $\{n_i\}$ are less than the number of parameters for the same subject. We take $x_{ij1} = t_{ij} + \delta_{ij}$, where $\delta_{ij}$ follows the standard normal distribution, and let $x_{ij2}$ follow a Bernoulli distribution with success probability 0.5. Note that $x_{ij1}$ is time-varying. For the nonparametric function in the mean, we take $f_0(t) = \cos(t\pi)$. The error $(e_{i1}, \ldots, e_{in_i})'$ is generated from a multivariate normal distribution with mean 0 and covariance $\Sigma_i$ satisfying $\Phi_i \Sigma_i \Phi_i' = D_i$, where $\Phi_i$ and $D_i$ are as described in Section 2.1 with $w_{ijk} = (1, t_{ij} - t_{ik})'$, $z_{ij} = x_{ij}$ and $f_1(t) = \sin(\pi t)$. We consider using AR(1) for $R_i(\delta)$ in $W_i = A_i^{1/2} R_i(\delta) A_i^{1/2}$, the working covariance structure of

ɛ_i^2. The compound symmetry (exchangeable) structure is also used; because the results are similar, we omit the details. In each case the parameter δ, which measures the correlation between ɛ_ij^2 = (y_ij - ŷ_ij)^2 and ɛ_ik^2 = (y_ik - ŷ_ik)^2, takes four different values, δ = 0, 0.2, 0.5, 0.8, so that the effect of misspecification of R_i(δ) on estimating β, γ and λ can be studied. We take two specifications: Case (1) β = (1, 0.5)', γ = (0.2, 0.3)' and λ = (-0.5, -0.2)'; Case (2) β = (1, 0)', γ = (0.2, 0)' and λ = (-0.5, 0)'. For each setting, the expected sample size is about 1060. The number of knots is taken to be of order n^{1/5}; numerical experiments show that the results are not very sensitive to the number of knots. For each simulated data set we use δ = 0, 0.2, 0.5 and 0.8 to study the robustness of the approach with respect to δ. Table 2 shows that our semiparametric method yields virtually unbiased estimates of the parameters. In addition, when the structure of R_i(δ) is based on AR(1), the parameter δ used in the working covariance structure for the innovations has little effect on the estimation of β, γ and λ or on the estimated mean squared errors for f_0 and f_1. These results imply that SMV is robust against misspecification of the structure of R_i(δ). For Case (1), Figure 3 displays the true and fitted curves of the nonparametric functions f_0 and f_1 when R_i(δ) is specified as AR(1) with δ = 0.2. The three curves f̂_5, f̂_50 and f̂_95 represent the fits that are 5%, 50% and 95% best in terms of mean squared error over the 1000 runs, respectively; they show close agreement with the true functions. To check the robustness of the choice of W, we have also followed Ye and Pan (2006) in generating longitudinal data sets from normal mixture distributions. The results are robust to this deviation from normality and are qualitatively similar to those in Ye and Pan (2006); the detailed results can be found in the online supplemental materials.
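To make the data-generating scheme of Study 1 concrete, here is an illustrative numpy sketch (our own code, not the authors'): it draws the irregular observation times, builds Σ_i from the modified Cholesky relation Φ_i Σ_i Φ_i' = D_i implied by γ, λ and f_1, and generates one subject's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Study 1 settings, Case (1); all choices follow the text above
beta = np.array([1.0, 0.5])
gamma = np.array([0.2, 0.3])   # autoregressive coefficients: phi = w'gamma
lam = np.array([-0.5, -0.2])   # log innovation variances: z'lam + f1(t)
f0 = lambda t: np.cos(np.pi * t)
f1 = lambda t: np.sin(np.pi * t)

def simulate_subject():
    # scheduled times 0..12; each except 0 skipped w.p. 0.2, then jittered
    sched = np.arange(13.0)
    keep = np.r_[True, rng.random(12) > 0.2]
    t = (sched[keep] + rng.uniform(0, 1, keep.sum())) / 13.0  # map to [0, 1]
    n = len(t)
    x1 = t + rng.standard_normal(n)
    x2 = rng.binomial(1, 0.5, n).astype(float)
    X = np.column_stack([x1, x2])
    # modified Cholesky: Phi_i Sigma_i Phi_i' = D_i, Phi_i unit lower triangular
    Phi = np.eye(n)
    for j in range(n):
        for k in range(j):
            w = np.array([1.0, t[j] - t[k]])
            Phi[j, k] = -(w @ gamma)
    log_d = X @ lam + f1(t)                     # z_ij = x_ij
    Phi_inv = np.linalg.inv(Phi)
    Sigma = Phi_inv @ np.diag(np.exp(log_d)) @ Phi_inv.T
    e = rng.multivariate_normal(np.zeros(n), Sigma)
    y = X @ beta + f0(t) + e
    return t, X, y

t, X, y = simulate_subject()
```

Repeating `simulate_subject` m = 100 times reproduces one simulated data set of the kind summarized in Table 2.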
Hereafter, we use AR(1) with δ = 0.2 for R_i(δ) in the following simulations whenever our approach is used.

Table 2: Simulation results for Study 1 over 1000 replications. Presented are the sample means, with sample standard errors in parentheses. MSE(f̂_i), i = 0, 1, is the mean squared error of the estimate of f_i over all time points in the data.

Case (1)     True    δ = 0            δ = 0.2          δ = 0.5          δ = 0.8
β_1           1      1.00 (0.0276)    1.00 (0.0276)    1.00 (0.0276)    1.00 (0.0277)
β_2           0.5    0.50 (0.0579)    0.50 (0.0579)    0.50 (0.0579)    0.50 (0.0579)
γ_1           0.2    0.20 (0.0174)    0.20 (0.0174)    0.20 (0.0174)    0.20 (0.0174)
γ_2           0.3    0.30 (0.0525)    0.30 (0.0525)    0.30 (0.0526)    0.30 (0.0527)
λ_1          -0.5   -0.50 (0.0435)   -0.50 (0.0450)   -0.50 (0.0502)   -0.50 (0.0542)
λ_2          -0.2   -0.20 (0.0931)   -0.20 (0.0974)   -0.20 (0.1082)   -0.20 (0.1161)
MSE(f̂_0)            (0.0371)         (0.0371)         (0.0371)         (0.0371)
MSE(f̂_1)            (0.0081)         (0.0084)         (0.0097)         (0.0130)
Case (2)
β_1           1      1.00 (0.0308)    1.00 (0.0308)    1.00 (0.0308)    1.00 (0.0308)
β_2           0      0.00 (0.0615)    0.00 (0.0616)    0.00 (0.0616)    0.00 (0.0618)
γ_1           0.2    0.20 (0.0181)    0.20 (0.0181)    0.20 (0.0181)    0.20 (0.0182)
γ_2           0      0.00 (0.0487)    0.00 (0.0487)    0.00 (0.0487)    0.00 (0.0488)
λ_1          -0.5   -0.50 (0.0453)   -0.50 (0.0465)   -0.50 (0.0514)   -0.50 (0.0551)
λ_2           0      0.00 (0.0873)    0.00 (0.0909)    0.00 (0.1017)    0.00 (0.1098)
MSE(f̂_0)            (0.0230)         (0.0229)         (0.0230)         (0.0230)
MSE(f̂_1)            (0.0084)         (0.0086)         (0.0010)         (0.0137)

Study 2. This example illustrates the performance of the asymptotic covariance formula in Theorem 2, following the simulation setup of Case (1) in Study 1. In Table 3, SD denotes the sample standard deviation of the 1,000 estimates of β, which can be viewed as the true standard deviation of the resulting estimates; SE denotes the sample average of the 1,000 estimated standard errors computed from formula (14); and Std denotes the standard deviation of those 1,000 standard errors. Table 3 demonstrates that the standard error formula works well for the different AR(1) correlation structures.

Table 3: Assessment of the standard errors using formula (14).

                    δ = 0       δ = 0.2     δ = 0.5     δ = 0.8
β_1   SD
      SE (Std)      (0.0027)    (0.0027)    (0.0027)    (0.0028)
β_2   SD
      SE (Std)      (0.0047)    (0.0047)    (0.0047)    (0.0048)
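The SD/SE/Std comparison of Study 2 is a generic Monte Carlo device. The following self-contained sketch illustrates it with plain OLS standing in for the semiparametric estimator (an illustrative stand-in only, not the paper's estimator): if the standard-error formula is correct, SE should track SD closely and Std should be small.

```python
import numpy as np

rng = np.random.default_rng(2)

# SD: spread of the estimates across replications ("true" sampling SD)
# SE: average of the model-based standard errors
# Std: spread of those standard errors
n, reps, beta_true = 200, 1000, np.array([1.0, 0.5])
ests, ses = [], []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ beta_true + rng.standard_normal(n)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)          # usual variance estimate
    ests.append(b[1])
    ses.append(np.sqrt(s2 * XtX_inv[1, 1]))

SD = float(np.std(ests))
SE = float(np.mean(ses))
Std = float(np.std(ses))
```

A well-behaved standard-error formula gives SE close to SD, exactly the pattern reported in Table 3.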

Figure 3: The nonparametric functions f_0 and f_1 and their fitted curves f̂_5, f̂_50 and f̂_95, for the AR(1) structure with δ = 0.2. [Panel (A): f_0(t) against time; panel (B): f_1(t) against time.]

Study 3. In this example we assess the effect of misspecification of the working covariance structure Σ_i on the estimation of β and f_0. For comparison, we apply GEE with independent, exchangeable and AR(1) working correlations. The results are summarized in Table 4. Not surprisingly, all methods give almost unbiased estimates of β. However, the standard errors of our semiparametric method are much smaller than those of the other methods, implying that our estimator is more efficient. Furthermore, for the nonparametric part f_0 of the mean, our semiparametric approach gives estimates with significantly smaller mean squared errors. Taken together, the study shows that the SMV approach is more accurate in estimating the mean.

Table 4: Simulation results for Study 3 with standard errors in parentheses.

                            Generalized Estimating Equations
          True   SMV        Independence   Exchangeable   AR(1)
β_1        1     1.00 ( )   1.00 ( )       1.00 ( )       1.00 ( )
β_2        0.5   0.50 ( )   0.50 ( )       0.51 ( )       0.50 ( )
MSE(f_0)         (0.0371)   (0.1328)       (0.1196)       (0.1217)
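For readers unfamiliar with the GEE competitors in Table 4, here is a minimal, illustrative numpy implementation of a GEE iteration with an identity link and an exchangeable working correlation (a sketch under simplifying assumptions, not the authors' code): the working correlation parameter is updated by a moment estimate and β by solving the weighted estimating equation.

```python
import numpy as np

def gee_exchangeable(X_list, y_list, n_iter=20):
    """Minimal GEE for an identity link with an exchangeable working
    correlation.  X_list[i] is the n_i x p design matrix of subject i
    and y_list[i] its n_i responses."""
    p = X_list[0].shape[1]
    beta = np.zeros(p)
    alpha = 0.0
    for _ in range(n_iter):
        # moment estimates of the scale and the common correlation
        resid = [y - X @ beta for X, y in zip(X_list, y_list)]
        allr = np.concatenate(resid)
        sigma2 = allr @ allr / len(allr)
        num, den = 0.0, 0
        for r in resid:
            n = len(r)
            num += r.sum() ** 2 - r @ r     # sum over j != k of r_j r_k
            den += n * (n - 1)
        alpha = num / (den * sigma2) if den > 0 else 0.0
        # solve sum_i X_i' V_i^{-1} (y_i - X_i beta) = 0 for beta
        A, b = np.zeros((p, p)), np.zeros(p)
        for X, y in zip(X_list, y_list):
            n = len(y)
            V = sigma2 * ((1 - alpha) * np.eye(n) + alpha * np.ones((n, n)))
            Vi = np.linalg.inv(V)
            A += X.T @ Vi @ X
            b += X.T @ Vi @ y
        beta = np.linalg.solve(A, b)
    return beta, alpha

# toy data with an exchangeable error structure:
# y_ij = 1 + 0.5 x_ij + u_i + eps_ij, Var(u_i) = Var(eps_ij) = 0.49
rng = np.random.default_rng(3)
X_list, y_list = [], []
for _ in range(300):
    n = int(rng.integers(3, 8))
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ np.array([1.0, 0.5]) + 0.7 * rng.standard_normal() \
        + 0.7 * rng.standard_normal(n)
    X_list.append(X); y_list.append(y)
beta_hat, alpha_hat = gee_exchangeable(X_list, y_list)
```

With equal random-intercept and noise variances, the true exchangeable correlation is 0.5, which the moment estimate should roughly recover.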

Study 4. We use this example to compare SMV and QL for estimating the covariance matrix. The mean model, taken from Example 5.1 of Fan and Wu (2008), has the parametric form

y_ij = β_1 x_ij1 + β_2 x_ij2 + β_3 x_ij3 + e_ij,

where β = (β_1, β_2, β_3)' = (2, 1.5, 3)', the x_ijk marginally follow the standard normal distribution, and the correlation between the ith and the jth covariates in x is 0.5^{|i-j|}. We use a linear model for the mean because the main difference between SMV and QL lies in the covariance estimation. We consider two scenarios for generating the covariance matrix, the first of which favors QL and the second SMV. Specifically, for QL we take h_0(t) = 0.5 exp(t/12 + sin(t/12)) as the marginal variance function and the true correlation matrix to be AR(1) with parameter a = 0, 0.3, 0.6 or 0.9. For SMV, we take h_0 as the innovation function and w_ijk = (1, t_ij - t_ik)' as in Study 1, but fix γ_1 = 0 while allowing γ_2 to be 0.3, 0.6 or 0.9. Note that the numerical values of γ_2 are not directly comparable to the values of a, because one parameterizes the autoregressive structure and the other the correlation. We remark that when a = 0 both the variance-correlation and the modified Cholesky decompositions are appropriate. The observation times are generated as in Study 1, and we take m = 200 as in Fan and Wu (2008). For QL, we use the AR(1) working correlation structure. Thus, if the data-generating process favors QL, we would fit a correctly specified model using QL but a misspecified model using SMV. Since both approaches give virtually unbiased estimates of β, here we only report the sample standard errors of the estimated β under these two methods, together with those using GEE with working independence (GEE). To compare accuracy in estimating the covariance matrix, we use the entropy loss

L(Σ, Σ̂) = m^{-1} Σ_{i=1}^m { trace(Σ_i Σ̂_i^{-1}) - log |Σ_i Σ̂_i^{-1}| - n_i },

where Σ_i is the true covariance matrix and Σ̂_i its estimator. The results are summarized in Table 5.
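The entropy loss above is straightforward to compute. A small numpy helper (the naming is ours) might look like this; it is zero exactly when every Σ̂_i equals Σ_i, and positive otherwise.

```python
import numpy as np

def entropy_loss(Sigmas, Sigma_hats):
    """Entropy loss
       L = m^{-1} sum_i { trace(S_i Shat_i^{-1}) - log|S_i Shat_i^{-1}| - n_i },
    averaged over the m subjects; zero iff every Shat_i equals S_i."""
    losses = []
    for S, Shat in zip(Sigmas, Sigma_hats):
        M = S @ np.linalg.inv(Shat)
        sign, logdet = np.linalg.slogdet(M)   # stable log-determinant
        losses.append(np.trace(M) - logdet - S.shape[0])
    return float(np.mean(losses))
```

For example, overestimating every covariance by a factor of 2 gives M = 0.5 I, so each subject contributes n_i/2 + n_i log 2 - n_i to the loss.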
When the model favors Fan et al.'s approach (i.e., a ≠ 0), QL gives more accurate estimates of β and of the covariance structure, as judged by the smaller sample standard deviations for β and the smaller means of the entropy loss. On the other hand, SMV performs better when the data are generated from the second scenario (i.e., γ_2 ≠ 0). Both SMV and QL outperform the GEE estimator with working independence, even under misspecified models. Interestingly, when both approaches are appropriate (a = 0), our approach performs better, with smaller standard errors and entropy losses. This may be due to the fact that Fan et al. (2007) used a local constant estimate of the marginal variance.

Table 5: Simulation results comparing SMV and QL. For β, the sample standard errors (multiplied by a factor of 1000) are reported, while for L(Σ, Σ̂) both the mean and the sample standard error (in parentheses), multiplied by a factor of 100, are reported.

            β_1             β_2             β_3             L(Σ, Σ̂)
            QL  SMV  GEE    QL  SMV  GEE    QL  SMV  GEE    QL        SMV
a = 0                                                       (7.6)     11.7 (2.0)
a = 0.3                                                     (1.8)     96.6 (4.2)
a = 0.6                                                     (2.2)     (6.2)
a = 0.9                                                     (5.3)     (9.9)
γ_2 = 0.3                                                   (11.0)    11.8 (2.1)
γ_2 = 0.6                                                   (9.9)     11.6 (2.1)
γ_2 = 0.9                                                   (24.1)    12.8 (2.0)

5 Discussion

We have proposed semiparametric mean-covariance models for longitudinal data analysis using the modified Cholesky decomposition. This decomposition is adopted so that partly linear regression models can be applied to the autoregressive coefficients and the log innovation variances. On the one hand, our approach extends the semiparametric model for the mean in longitudinal analysis; on the other hand, it relaxes the parametric assumption made by Ye and Pan (2006). We have provided theoretical justification for the proposed approach and compared our method with the variance-correlation decomposition of Fan et al. (2007) and with the conventional GEE.

In practice, the true underlying covariance structure is unknown, and it is important to develop tools to check whether the imposed covariance decomposition is reasonable. We used the variogram for this purpose in the CD4 data analysis; more work needs to be done in this direction. Our results are valid under the usual assumption that the number of subjects m goes to infinity while the numbers of repeated measurements n_i remain bounded. When m, or max{n_i}, or both go to infinity, the asymptotic results of Xie and Yang (2003) may be extendable to our setup, since the model in (4) can be seen as a semiparametric generalization of the GEE; our setup is more challenging, however, because of the simultaneous models for the mean and the covariance. Although limited simulation (not shown) indicates that our method still works, a rigorous study deserves further attention. In this work we consider only the classical setup in which x and z are finite-dimensional. For a diverging number of covariates in the mean, Lam and Fan (2008) studied the profile likelihood estimator; for our semiparametric approach, it would be interesting to investigate the statistical properties with diverging numbers of parameters in both the mean and the variance. Finally, it would be interesting to extend the semiparametric approach to nonnormal longitudinal data analysis.

Appendix: sketch of proofs

Proofs of Lemma 2 and Theorem 2 are available in the supplemental materials. The following lemma is stated for easy reference (Schumaker, 1981, Theorem 12.7).

Lemma 1. Under Assumptions (A1)-(A2), there exist constants C_0 and C_1 such that

sup_{t ∈ [0,1]} |f_0(t) - π(t)'α_0| ≤ C_0 k_n^{-s}   and   sup_{t ∈ [0,1]} |f_1(t) - π(t)'ᾱ_0| ≤ C_1 k_n^{-s}.

Proof of Theorem 1. Equation (12) can be obtained directly from He et al. (2005).

Here we only give a proof of equation (13) when all W_i are known, denoted by W_0i; similar asymptotic results hold when the W_0i are replaced by consistent estimates. Let

T_m = ( A_m^{-1/2}    -A_m^{-1/2} k_n^{1/2} H'W_0 M (M'W_0 M)^{-1} )
      ( 0             k_n^{1/2} Q_m^{-1}                           ),

where A_m = H'W_0 H = Σ_{i=1}^m H_i'W_0i^{-1} H_i and Q_m^2 = k_n M'W_0 M. Obviously, condition (A6) implies that A_m/m → A > 0 in probability for some positive definite matrix A as m → ∞. From the definition of T_m it is easy to see that

T_m' { Σ_{i=1}^m H_i' D_0i W_0i^{-1} D_0i H_i } T_m = I_{d+K},

where I_{d+K} is the (d+K) × (d+K) identity matrix. We further denote

ζ(ρ) = ζ = (ζ_1', ζ_2')' = (T_m')^{-1} (ρ - ρ_0)
     = ( A_m^{1/2}(λ - λ_0),  k_n^{-1/2} Q_m(ᾱ - ᾱ_0) + k_n^{-1/2} Q_m^{-1} M'W_0 H(λ - λ_0) )'

and ζ̂ = (ζ̂_1', ζ̂_2')' = ζ(λ̂, ᾱ̂).   (A.1)

From Lemma 1 it follows that, for sufficiently large n,

n^{-1} Σ_i Σ_{j=1}^{n_i} { f̂_1(t_ij) - f_1(t_ij) }^2 ≤ 2 n^{-1} Σ_i Σ_j (π_ij'(ᾱ̂ - ᾱ_0))^2 + 2 C_1^2 k_n^{-2s}

and ||A_m^{1/2}(λ̂ - λ_0)|| ≤ ||ζ̂||, with

{ Σ_i Σ_j (π_ij'(ᾱ̂ - ᾱ_0))^2 }^{1/2} = ||M(ᾱ̂ - ᾱ_0)|| ≤ C k_n^{-1/2} ||Q_m(ᾱ̂ - ᾱ_0)|| ≤ C ||ζ̂|| + C λ_n^{-1/2} ||λ̂ - λ_0|| sup_{||a||=1, ||b||=1} |a'M'W_0 H b| k_n^{-1/2},

where λ_n is the minimum eigenvalue of k_n M'W_0 M/n. Then, by Lemma 6.2 of He and Shi (1996), it suffices to show that ||ζ̂|| = O_p(k_n^{1/2}). To this end, let R_mi = π_i ᾱ_0 - f_1(t_i), η_i^0 = H_i λ_0 + f_1(t_i), and ς_i = H̃_i ζ + R_mi, where H̃_i = H_i T_m = (H_i A_m^{-1/2}, π_i Q_m^{-1} k_n^{1/2}). Then the variance model can be written as σ_i^2 = exp(η_i^0 + ς_i), and the third estimating equation of (4) can be rewritten as

S_ζ(ζ) = Σ_{i=1}^m H_i' D_i(η_i^0 + ς_i) W_0i^{-1} { ɛ_i^2 - exp(η_i^0 + ς_i) } = 0.   (A.2)

Multiplying equation (A.2) by T_m' gives

Ψ(ζ) = T_m' S_ζ(ζ) = Σ_{i=1}^m H̃_i' D_i(η_i^0 + ς_i) W_0i^{-1} { ɛ_i^2 - exp(η_i^0 + ς_i) } = 0.   (A.3)

By conditions (A4) and (A5), (A.2) and (A.3) have the same root in ζ. Let a ∈ R^{d+K} with a'a = 1. A Taylor expansion of a'Ψ(ζ) gives

a'Ψ(ζ) = Σ_i a'H̃_i' D_i(η_i^0 + ς_i) W_0i^{-1} (ɛ_i^2 - exp(η_i^0 + ς_i))
       = Σ_i a'H̃_i' D_0i W_0i^{-1} (ɛ_i^2 - σ_0i^2) - Σ_i a'H̃_i' D_0i W_0i^{-1} D_0i ς_i
         + Σ_i ς_i' [∂{a'H̃_i' D_i W_0i^{-1} (ɛ_i^2 - σ_0i^2)}/∂ς_i]_{ς_i = 0} + R_m(ς),   (A.4)

where R_m(ς) = Σ_{i=1}^m R_mi(ς_i) and

R_mi(ς_i) = (1/2) ς_i' [∂^2{a'H̃_i' D_i W_0i^{-1} (ɛ_i^2 - σ_i^2)}/∂ς_i ∂ς_i']_{ς_i = ς_i*} ς_i,

with ς_i* = η_i^0 + τ_i ς_i for some 0 < τ_i < 1, i = 1, ..., m. Further, let Φ(ζ) = Σ_i H̃_i' D_0i W_0i^{-1} (ɛ_0i^2 - σ_0i^2) - ζ, where ɛ_0i = y_i - µ_0i, and denote the root of Φ by ζ̃, that is,

ζ̃ = (ζ̃_1', ζ̃_2')' = Σ_{i=1}^m H̃_i' D_0i W_0i^{-1} (ɛ_0i^2 - σ_0i^2).   (A.5)

From these two expressions, the difference between a'Ψ(ζ) and a'Φ(ζ) can be expressed as

a'(Ψ(ζ) - Φ(ζ)) = Σ_i a'H̃_i' D_0i W_0i^{-1} (ɛ_i^2 - ɛ_0i^2) - Σ_i a'H̃_i' D_0i W_0i^{-1} D_0i R_mi
                 + Σ_i ς_i' [∂{a'H̃_i' D_i W_0i^{-1} (ɛ_i^2 - σ_0i^2)}/∂ς_i]_{ς_i = 0} + R_m(ς)
                =: I_n0 - I_n1 + I_n2(ζ) + R_m(ς).

By the Cauchy-Schwarz inequality, the definition of H̃, and k_n = O(n^{1/(2s+1)}), we have

E(I_n0)^2 ≤ Σ_i a'H̃_i' D_0i W_0i^{-1} D_0i H̃_i a · Σ_i E(ɛ_i^2 - ɛ_0i^2)' W_0i^{-1} (ɛ_i^2 - ɛ_0i^2)

= Σ_i E{ D(∆µ_i) ɛ_0i + (∆µ_i)^2 }' W_0i^{-1} { D(∆µ_i) ɛ_0i + (∆µ_i)^2 }
≤ C Σ_i [ trace{ D(∆µ_i) W_0i^{-1} D(∆µ_i) Σ_0i } + {(∆µ_i)^2}' W_0i^{-1} (∆µ_i)^2 + Σ_{j=1}^{n_i} (∆µ_ij)^4 ] = O_p(k_n),   (A.6)

where ∆µ_i = (µ_i1 - µ_0i1, ..., µ_in_i - µ_0in_i)' and D(∆µ_i) = diag{µ_i1 - µ_0i1, ..., µ_in_i - µ_0in_i}, with µ_i = µ(B_i θ̂) = g^{-1}(B_i θ̂). The last inequality in (A.6) follows easily from He et al. (2005). Thus I_n0 = O_p(k_n^{1/2}). For I_n1, obviously,

|I_n1| = |Σ_i a'H̃_i' D_0i W_0i^{-1} D_0i R_mi| = |a'H̃'W_0 R_m| ≤ {a'H̃'W_0 H̃ a}^{1/2} {R_m'W_0 R_m}^{1/2} = O(n^{1/2} k_n^{-s}) = O(k_n^{1/2}),

where R_m = (R_m1', ..., R_mm')'. For I_n2(ζ), write

I_n2(ζ) = Σ_i ζ'H̃_i' G_0,i + Σ_i R_mi' G_0,i =: I_n2^{(1)}(ζ) + I_n2^{(2)},

where G_0,i = [∂{a'H̃_i' D_i W_0i^{-1} (ɛ_i^2 - σ_0i^2)}/∂ς_i]_{ς_i = 0}. Let ẽ_i = W_0i^{-1}(ɛ_i^2 - σ_0i^2) = (ẽ_i1, ..., ẽ_in_i)'. It is easy to see that G_0,i = diag{σ_0i1^2 ẽ_i1, ..., σ_0in_i^2 ẽ_in_i} H̃_i a =: A(ẽ_i) H̃_i a. Then, by the Cauchy-Schwarz inequality,

(I_n2^{(1)})^2 = ( Σ_i ζ'H̃_i' A(ẽ_i) H̃_i a )^2 ≤ ||ζ||^2 Σ_{k=1}^{d̃} ( Σ_i 1_k' H̃_i' A(ẽ_i) H̃_i a )^2 ≤ ||ζ||^2 Σ_{k,j=1}^{d̃} ( Σ_i 1_k' H̃_i' A(ẽ_i) H̃_i 1_j )^2,

where d̃ = d + K and 1_k = (0, ..., 0, 1, 0, ..., 0)' is the d̃-vector with 1 as its kth element and 0 elsewhere. By conditions (A1), (A3), (A5) and (A6), we have

E(I_n2^{(1)})^2 ≤ C ||ζ||^2 Σ_{k,j=1}^{d̃} E( Σ_i 1_k' H̃_i' A(ẽ_i) H̃_i 1_j )^2

≤ C ||ζ||^2 Σ_{k,j=1}^{d̃} Σ_i 1_k' H̃_i' H̃_i 1_k E||A(ẽ_i) H̃_i 1_j||^2
≤ C ||ζ||^2 sup_i trace{H̃_i' H̃_i} Σ_{k=1}^{d̃} Σ_i 1_k' H̃_i' H̃_i 1_k O(k_n)
= C ||ζ||^2 sup_i trace{H_i A_m^{-1} H_i' + k_n π_i Q_m^{-2} π_i'} O(k_n)
= O(k_n^3 ||ζ||^2 / n),

where the constant C, independent of n, may vary from line to line. Therefore, for sufficiently large L,

sup_{||ζ|| ≤ L k_n^{1/2}, a'a = 1} |I_n2^{(1)}(ζ)| = O_p(n^{-1/2} k_n^2).

Similarly, we can show that sup_{a'a=1} |I_n2^{(2)}| = O(k_n^{1-s}). Combining these two results and noting that k_n = O(n^{1/(2s+1)}), we obtain sup_{a'a=1} |I_n2| = O_p(k_n^{1/2}).

For R_m(ς), let F_i = [∂^2{a'H̃_i' D_i W_0i^{-1} (ɛ_i^2 - σ_i^2)}/∂ς_i ∂ς_i']_{ς_i = ς_i*}. We see that

R_m(ς) = (1/2) Σ_i ζ'H̃_i' F_i H̃_i ζ + Σ_i R_mi' F_i H̃_i ζ + (1/2) Σ_i R_mi' F_i R_mi =: I_n3^{(1)}(ζ) + I_n3^{(2)}(ζ) + I_n3^{(3)}.

By (A3), (A5) and (A6), sup_{a'a=1} ||F_i|| = O_p(n^{-1/2} k_n^{1/2}). Hence

sup_{||ζ|| ≤ L k_n^{1/2}, a'a=1} |I_n3^{(1)}(ζ)| = O_p(n^{-1/2} k_n^{5/2}),
sup_{||ζ|| ≤ L k_n^{1/2}, a'a=1} |I_n3^{(2)}(ζ)| = O_p(k_n^{3/2-s}),
sup_{||ζ|| ≤ L k_n^{1/2}, a'a=1} |I_n3^{(3)}| = O_p(n^{1/2} k_n^{1/2-2s}),

so that sup_{||ζ|| ≤ L k_n^{1/2}, a'a=1} |R_m(ς)| = O_p(k_n^{1/2}). Putting all the approximations together, we have

sup_{||ζ|| ≤ L k_n^{1/2}} ||Ψ(ζ) - Φ(ζ)|| = O_p(k_n^{1/2}),

and, for sufficiently large C, direct calculation gives

E||ζ̃||^2 = E[ Σ_i (ɛ_0i^2 - σ_0i^2)' W_0i^{-1} D_0i H_i A_m^{-1} H_i' D_0i W_0i^{-1} (ɛ_0i^2 - σ_0i^2)
            + k_n Σ_i (ɛ_0i^2 - σ_0i^2)' W_0i^{-1} D_0i π_i Q_m^{-2} π_i' D_0i W_0i^{-1} (ɛ_0i^2 - σ_0i^2) ]
         ≤ C trace{ H'A_m^{-1} H + k_n M Q_m^{-2} M' } = O(k_n).


More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software December 2017, Volume 82, Issue 9. doi: 10.18637/jss.v082.i09 jmcm: An R Package for Joint Mean-Covariance Modeling of Longitudinal Data Jianxin Pan The University of

More information

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September

More information

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia

More information

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53 State-space Model Eduardo Rossi University of Pavia November 2014 Rossi State-space Model Fin. Econometrics - 2014 1 / 53 Outline 1 Motivation 2 Introduction 3 The Kalman filter 4 Forecast errors 5 State

More information

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model Applied and Computational Mathematics 2014; 3(5): 268-272 Published online November 10, 2014 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20140305.22 ISSN: 2328-5605 (Print); ISSN:

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

ATINER's Conference Paper Series STA

ATINER's Conference Paper Series STA ATINER CONFERENCE PAPER SERIES No: LNG2014-1176 Athens Institute for Education and Research ATINER ATINER's Conference Paper Series STA2014-1255 Parametric versus Semi-parametric Mixed Models for Panel

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function

Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function Jianqing Fan, Tao Huang and Runze Li Abstract Improving efficiency for regression coefficients and predicting trajectories

More information

Modelling Survival Events with Longitudinal Data Measured with Error

Modelling Survival Events with Longitudinal Data Measured with Error Modelling Survival Events with Longitudinal Data Measured with Error Hongsheng Dai, Jianxin Pan & Yanchun Bao First version: 14 December 29 Research Report No. 16, 29, Probability and Statistics Group

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1 Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Modeling Repeated Functional Observations

Modeling Repeated Functional Observations Modeling Repeated Functional Observations Kehui Chen Department of Statistics, University of Pittsburgh, Hans-Georg Müller Department of Statistics, University of California, Davis Supplemental Material

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling J. Shults a a Department of Biostatistics, University of Pennsylvania, PA 19104, USA (v4.0 released January 2015)

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Motivational Example

Motivational Example Motivational Example Data: Observational longitudinal study of obesity from birth to adulthood. Overall Goal: Build age-, gender-, height-specific growth charts (under 3 year) to diagnose growth abnomalities.

More information

1.1 Basis of Statistical Decision Theory

1.1 Basis of Statistical Decision Theory ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of

More information

Generalized Linear Models with Functional Predictors

Generalized Linear Models with Functional Predictors Generalized Linear Models with Functional Predictors GARETH M. JAMES Marshall School of Business, University of Southern California Abstract In this paper we present a technique for extending generalized

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information