Semiparametric Regression for Clustered Data Using Generalized Estimating Equations

Xihong Lin and Raymond J. Carroll

We consider estimation in a semiparametric generalized linear model for clustered data using estimating equations. Our results apply to the case where the number of observations per cluster is finite, whereas the number of clusters is large. The mean of the outcome variable $\mu$ is of the form $g(\mu) = X^T\beta + \theta(T)$, where $g(\cdot)$ is a link function, $X$ and $T$ are covariates, $\beta$ is an unknown parameter vector, and $\theta(t)$ is an unknown smooth function. Kernel estimating equations proposed previously in the literature are used to estimate the infinite-dimensional nonparametric function $\theta(t)$, and a profile-based estimating equation is used to estimate the finite-dimensional parameter vector $\beta$. We show that for clustered data, this conventional profile-kernel method often fails to yield a $\sqrt{n}$-consistent estimator of $\beta$ along with appropriate inference unless working independence is assumed or $\theta(t)$ is artificially undersmoothed, in which case asymptotic inference is possible. To gain insight into these results, we derive the semiparametric efficient score of $\beta$, which is found to have a complicated form, and show that, unlike for independent data, the profile-kernel method does not yield a score function asymptotically equivalent to the semiparametric efficient score of $\beta$, even when the true correlation is assumed and $\theta(t)$ is undersmoothed. We illustrate the methods with an application to infectious disease data and evaluate their finite-sample performance through a simulation study.

KEY WORDS: Asymptotics; Clustered data; Consistency; Efficiency; Generalized estimating equations; Kernel method; Longitudinal data; Nonparametric regression; Partially linear model; Profile method; Sandwich estimator; Semiparametric efficient score; Semiparametric efficiency bound.

1. INTRODUCTION

Clustered data arise in many fields of biomedical research, including longitudinal studies, intervention studies, and clinical trials.
Parametric regression using generalized estimating equations (GEEs) (Liang and Zeger 1986) has become a popular practice for analyzing such data. It is well understood that the GEE estimators of regression coefficients are consistent when the mean function is correctly specified, even when the within-cluster correlation structure is misspecified, and that the most efficient estimator is obtained by correctly specifying the within-cluster correlation. To allow for more flexible dependence of an outcome variable on covariates, there has been substantial recent interest in modeling covariate effects nonparametrically (Lin and Carroll 2000; Hoover, Rice, Wu, and Yang 1998; Wild and Yee 1996). Lin and Carroll (2000) showed that, in contrast to parametric GEEs, when standard kernel methods are used, the most efficient estimator of the nonparametric function is typically obtained by completely ignoring the within-cluster correlation; correct specification of the correlation structure generally results in an asymptotically less efficient estimator.

In many instances, a semiparametric partially generalized linear regression model is more desirable than modeling every covariate effect nonparametrically. This model assumes that the mean of the outcome variable $\mu$ depends on some covariates $X$ parametrically and on some other covariate $T$ nonparametrically in the form $g(\mu) = X^T\beta + \theta(T)$, where $g(\cdot)$ is a link function, $\beta$ is an unknown parameter vector, and $\theta(\cdot)$ is an unknown smooth function. This model specification is particularly appealing when the effects of $X$ (e.g., treatment) are of major interest and the effects of $T$ (e.g., confounders) are nuisance, because one can make inference on the effects of $X$ while making minimal assumptions on the effects of $T$ by using a fully nonparametric function. One example is the longitudinal infectious disease study considered in Section 8. This study involved 275 preschool-age children who were reexamined every 3 months for 18 months for the presence of respiratory infection (yes/no) (Diggle, Liang, and Zeger 1994). The primary interest is to study the association between respiratory infection and vitamin A deficiency (yes/no) while accounting for several confounders, including age. The distribution of the vertical strokes in Figure 3, which mark the ages for infection yes (top) and no (bottom), suggests that the age effect departs dramatically from linearity. Because the binary exposure of vitamin A deficiency is of main interest and the age effect is nuisance, we are interested in modeling the vitamin A deficiency effect while allowing the nuisance age effect to be modeled nonparametrically.

Several authors have considered such semiparametric regression models. A key challenge of estimation in this model is that it is composed of a finite-dimensional parameter vector $\beta$ and an infinite-dimensional parameter $\theta(\cdot)$. Estimation for independent, nonclustered data has been considered by Carroll, Fan, Gijbels, and Wand (1997), Hastie and Tibshirani (1990), and Severini and Staniswalis (1994). These authors used the kernel method to estimate $\theta(t)$ and the profile likelihood-based method to estimate $\beta$. They showed that the estimator of $\beta$ is $\sqrt{n}$ consistent and semiparametric efficient (Bickel, Klaassen, Ritov, and Wellner 1993).

Xihong Lin is Associate Professor, Department of Biostatistics, University of Michigan, Ann Arbor, MI (xlin@sph.umich.edu). Her research was supported by a National Cancer Institute grant (CA …). Raymond J. Carroll is Distinguished Professor, Departments of Statistics and Biostatistics and Epidemiology, Texas A&M University, College Station, TX (carroll@stat.tamu.edu). His research was supported by National Cancer Institute grant CA 57030 and by the Texas A&M Center for Environmental and Rural Health via a National Institute of Environmental Health Sciences grant (P30-ES …). The authors thank the editor, the associate editor, and two referees for their helpful comments and suggestions.
For longitudinal data, Zeger and Diggle (1994) considered a semiparametric model with a nonparametric time trajectory and parametric covariate effects. They estimated $\theta(t)$ using a kernel method that ignores the within-cluster correlation, and estimated $\beta$ using weighted least squares that accounts for the within-cluster correlation. They did not study the asymptotic properties of their method. Severini and Staniswalis (1994) extended their independent-data results to clustered data using profile-kernel GEEs. They claimed that the estimator of $\beta$ is $\sqrt{n}$ consistent for any specification of the working correlation matrix. Zhang, Lin, Raz, and Sowers (1998) considered a semiparametric linear mixed model and estimated the nonparametric function using a smoothing spline.

© 2001 American Statistical Association
Journal of the American Statistical Association
September 2001, Vol. 96, No. 455, Theory and Methods

In this article we consider a marginal semiparametric regression model for clustered data, with $\theta(t)$ estimated using kernel estimating equations and $\beta$ estimated using profile-based estimating equations. Our estimating equations are similar to those of Severini and Staniswalis (1994), except that different working correlation matrices are allowed in the two sets of estimating equations and local linear regression is used instead of local average kernel regression. The main focus of this article is to investigate whether it is possible to construct a $\sqrt{n}$-consistent and efficient estimator of $\beta$ using the profile-kernel method. This work is motivated by our observation that parametric and certain nonparametric GEEs have diametrically opposed asymptotic properties with respect to obtaining the most efficient estimators: the former require correct specification of the correlation, whereas the latter require ignoring the correlation completely. Hence we investigate whether this difference in asymptotic behavior affects the consistency and efficiency of the estimator of $\beta$ in the semiparametric model when the conventional profile-kernel method is used. In particular, does correct specification of the within-cluster correlation still yield a $\sqrt{n}$-consistent and semiparametric efficient estimator of $\beta$? The results that we have obtained are surprising.
To obtain a $\sqrt{n}$-consistent estimator of $\beta$ using the conventional profile-kernel method, one generally must either artificially undersmooth $\theta(t)$ or completely ignore the within-cluster correlation by assuming working independence in the profile-kernel estimating equations. Thus, if one accounts for the within-cluster correlation using the profile-kernel method, then standard bandwidth selection methods for estimating $\theta(t)$, such as cross-validation, fail; the sandwich covariance estimator of $\hat\beta$ fails; and conventional hypothesis tests on $\beta$, such as the Wald and score tests, fail. With undersmoothing or working independence, asymptotically correct inference about $\beta$ becomes possible. To gain insight into these results, we derive the semiparametric efficient score of $\beta$, which is found to have a complicated form, and show that, unlike for independent data, the profile-kernel method does not yield a score function that is asymptotically equivalent to the semiparametric efficient score for $\beta$, even when the true correlation is assumed and $\theta(t)$ is undersmoothed. Our main conclusion is that, unlike for independent data, the conventional profile-kernel method is not semiparametric efficient and must be modified in ad hoc ways (undersmoothing) or made less efficient (working independence) even to be $\sqrt{n}$ consistent.

The article is organized as follows. In Section 2 we state the semiparametric model for clustered data, and in Section 3 we discuss estimation of $\theta(t)$ using kernel estimating equations previously proposed in the literature and estimation of $\beta$ using profile estimating equations. In Section 4 we study the asymptotic properties of the profile-kernel estimators of $\beta$ and $\theta(t)$. In Section 5 we derive the semiparametric efficient score of $\beta$ within a likelihood framework and show that the conventional profile-kernel estimating equations of $\beta$ often do not yield a score equation that is asymptotically equivalent to the semiparametric efficient score of $\beta$. In Section 6 we discuss practical implications of our results.
We illustrate the methods with a simulation study in Section 7 and an application to infectious disease data in Section 8. We conclude with a discussion in Section 9.

2. A SEMIPARAMETRIC MARGINAL MODEL

In this section we present the semiparametric regression model for clustered data. Suppose that the data consist of $n$ clusters, with the $i$th ($i = 1, \ldots, n$) cluster having $m_i$ observations. Let $Y_{ij}$ and $(X_{ij}, T_{ij})$ be the response variable and the covariates of the $j$th ($j = 1, \ldots, m_i$) observation in the $i$th cluster, where $X_{ij}$ is a $p \times 1$ vector and $T_{ij}$ is a scalar. Given the covariates $X_{ij}$ and $T_{ij}$, the mean and the variance of the outcome variable $Y_{ij}$ are $E(Y_{ij}) = \mu_{ij}$ and $\operatorname{var}(Y_{ij}) = \phi w_{ij}^{-1} V(\mu_{ij})$, where $\phi$ is a scale parameter, $w_{ij}$ is a known weight, and $V(\cdot)$ is a known variance function. The marginal mean $\mu_{ij}$ depends on $X_{ij}$ and $T_{ij}$ through a known monotonic and differentiable link function $g(\cdot)$:

$$ g(\mu_{ij}) = X_{ij}^T\beta + \theta(T_{ij}), \tag{1} $$

where $\beta$ is a $p \times 1$ vector and $\theta(\cdot)$ is an unknown smooth function. We model the effects of $X$ ($p \ge 1$) parametrically and the effects of $T$ nonparametrically, and we treat the within-cluster correlation parameters as nuisance parameters. In particular, it is important to note the assumption (Pepe and Couper 1997) that

$$ E(Y_{ij} \mid X_{ij}, T_{ij}) = E\{Y_{ij} \mid X_{ij}, T_{ij}, (X_{ik}, T_{ik})_{k \ne j}\}, \tag{2} $$

an assumption also made implicitly by Lin and Carroll (2000). In matrix notation, denoting $\mu_i = (\mu_{i1}, \ldots, \mu_{im_i})^T$, $g(\mu_i) = \{g(\mu_{i1}), \ldots, g(\mu_{im_i})\}^T$, $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$, and $X_i$ and $T_i$ similarly, we have $g(\mu_i) = X_i\beta + \theta(T_i)$. If model (1) does not include $\theta(T_{ij})$, then it reduces to the parametric generalized linear model considered by Liang and Zeger (1986). If model (1) does not include $X_{ij}^T\beta$, then it reduces to the nonparametric model considered by Lin and Carroll (2000). Severini and Staniswalis (1994) considered a model similar to (1) and (2). It is important to emphasize that we are considering a marginal model for the clustered data through specification of mean and variance functions. This is in the spirit of GEE-type models (Liang and Zeger 1986).
Except for Gaussian data, our marginal models need not correspond to a full semiparametric likelihood specification.

3. PROFILE-KERNEL ESTIMATING EQUATIONS

In this section we develop kernel estimating equations for $\theta(t)$ and profile estimating equations for $\beta$. The formulation of the profile estimating equation is similar to the score equation calculated using the conventional profile likelihood approach in parametric regression. We give the motivation for these estimating equations in Section 3.1 and describe their forms in Section 3.2.

3.1 Motivation of the Profile-Kernel Estimating Equations

To motivate the profile-kernel estimating equations for $\beta$ and $\theta(t)$ under the semiparametric model (1), we first consider the GEEs for the parametric model

$$ g(\mu_{ij}) = X_{ij}^T\beta. \tag{3} $$

Of course, (3) is a special case of (1) when $\theta(t) = 0$. Liang and Zeger (1986) proposed estimating $\beta$ using the estimating equations

$$ \sum_{i=1}^n \Bigl\{\frac{\partial\mu(X_i\beta)}{\partial\beta^T}\Bigr\}^T V_i^{-1}(Y_i - \mu_i) = \sum_{i=1}^n X_i^T\Delta_i V_i^{-1}(Y_i - \mu_i) = 0, \tag{4} $$

where $\mu_i = E(Y_i) = \mu(X_i\beta)$ with $j$th component $\mu_{ij} = \mu(X_{ij}^T\beta) = g^{-1}(X_{ij}^T\beta)$, $\Delta_i = \operatorname{diag}\{\mu_{ij}^{(1)}\}$, $\mu^{(1)}(\cdot)$ is the first derivative of $\mu(\cdot)$, $V_i = S_i^{1/2}R_iS_i^{1/2}$, $S_i = \operatorname{diag}\{\phi w_{ij}^{-1}V(\mu_{ij})\}$ contains the marginal variances of the $Y_{ij}$, and $R_i$ is an invertible working correlation matrix, possibly depending on a parameter vector that can be estimated using the method of moments. Liang and Zeger (1986) showed that the GEE estimator $\hat\beta$ is consistent if the mean function $\mu_{ij}$ is correctly specified, even when the working correlation matrix $R_i$ is misspecified. The most efficient estimator of $\beta$ is obtained by specifying $R_i$ as the true correlation matrix.

Now consider kernel estimating equations for the nonparametric model

$$ g(\mu_{ij}) = \theta(T_{ij}). \tag{5} $$

Lin and Carroll (2000) considered local polynomial kernel estimating equations of general order for $\theta(t)$. We consider here the local linear kernel estimator, that is, order 1. Let $h$ denote the bandwidth parameter, and let $K(\cdot)$ denote a symmetric kernel density function. Let $K_h(v) = h^{-1}K(v/h)$, and let $T_i(t)$ be an $m_i \times 2$ matrix with $j$th row $\{1, (T_{ij} - t)/h\}$. Lin and Carroll (2000) considered two kernel estimating equations for $\theta(t)$ at any $t$, one symmetric and one asymmetric:

$$ \sum_{i=1}^n T_i(t)^T\Delta_i(t)K_{ih}^{1/2}(t)V_i^{-1}(t)K_{ih}^{1/2}(t)\{Y_i - \mu_i(t)\} = 0 \tag{6} $$

and

$$ \sum_{i=1}^n T_i(t)^T\Delta_i(t)V_i^{-1}(t)K_{ih}(t)\{Y_i - \mu_i(t)\} = 0, \tag{7} $$

where $K_{ih}(t) = \operatorname{diag}\{K_h(T_{ij} - t)\}$, and $\mu_i(t)$, $\Delta_i(t)$, $V_i(t)$, and $S_i(t)$ are defined as in (4) except that they are evaluated at $\mu_{ij}(t) = \mu\{\alpha_0 + \alpha_1(T_{ij} - t)/h\}$, where $\alpha = (\alpha_0, \alpha_1)^T$ is a $2 \times 1$ vector of unknown parameters.
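The ingredients of the GEE in (4) can be made concrete for a single cluster. The Python sketch below assumes a logit link for a binary outcome (so $V(\mu) = \mu(1-\mu)$ and $\mu^{(1)} = \mu(1-\mu)$), scale $\phi = 1$, unit weights by default, and an exchangeable working correlation with a single illustrative parameter `rho`; the function names are hypothetical and are not taken from the paper.

```python
import math


def inv_logit(eta):
    """mu = g^{-1}(eta) for the logit link g(mu) = log{mu / (1 - mu)}."""
    return 1.0 / (1.0 + math.exp(-eta))


def exchangeable_corr(m, rho):
    """m x m exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(m)] for j in range(m)]


def gee_ingredients(x_rows, beta, rho, phi=1.0, weights=None):
    """Return (mu_i, diag(Delta_i), V_i) for one cluster under g(mu) = X^T beta.

    x_rows : covariate vectors X_ij, one per observation in the cluster
    beta   : regression coefficients
    rho    : exchangeable correlation parameter (illustrative working R_i)
    phi    : scale parameter; weights: known weights w_ij (default all 1)
    """
    m = len(x_rows)
    w = weights if weights is not None else [1.0] * m
    eta = [sum(xk * bk for xk, bk in zip(x, beta)) for x in x_rows]
    mu = [inv_logit(e) for e in eta]
    # Logit link: mu^{(1)}(eta) = mu (1 - mu), the diagonal of Delta_i.
    delta = [u * (1.0 - u) for u in mu]
    # Diagonal of S_i = diag{phi w_ij^{-1} V(mu_ij)} with V(mu) = mu (1 - mu).
    s = [phi / wj * u * (1.0 - u) for wj, u in zip(w, mu)]
    R = exchangeable_corr(m, rho)
    # V_i = S_i^{1/2} R_i S_i^{1/2}: entry (j, k) is sqrt(s_j) R_jk sqrt(s_k).
    V = [[math.sqrt(s[j]) * R[j][k] * math.sqrt(s[k]) for k in range(m)]
         for j in range(m)]
    return mu, delta, V


# One cluster of three observations with X_ij = (1, T_ij)^T.
mu, delta, V = gee_ingredients([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
                               beta=[-1.0, 0.5], rho=0.3)
```

Because the logit is the canonical link for binary data, $\Delta_i$ and $S_i$ coincide here; with a noncanonical link the two diagonals would differ.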
Equation (7) was also considered by Severini and Staniswalis (1994) using the local average kernel (order 0). Having estimated $\alpha$ at $t$ as $\hat\alpha$, the kernel estimator of $\theta(t)$ is $\hat\theta(t) = \hat\alpha_0$. The working correlation matrix $R_i$ in $V_i(t)$ may again depend on a parameter vector, which again can be estimated using the method of moments. The kernel estimators under (6) and (7) differ except when working independence is assumed, that is, $R_i = I$. Lin and Carroll (2000) showed that the two estimators under (6) and (7) have different asymptotic properties; the asymptotic properties of the kernel estimator under (7) are much harder to study. The most important result of Lin and Carroll (2000) is that, unlike for the parametric GEE estimator in (4), the asymptotically most efficient kernel estimator $\hat\theta(t)$ of the nonparametric function using (6) or (7) is typically obtained by entirely ignoring the within-cluster correlation and pretending that the observations within the same cluster are independent, that is, by assuming working independence $R_i = I$. Correctly specifying the correlation matrix in fact typically has adverse effects and results in an asymptotically less efficient estimator of $\theta(t)$.

In view of the opposite asymptotic behaviors of parametric and nonparametric regression, we are led to ask whether using the conventional kernel method to estimate $\theta(t)$ affects the $\sqrt{n}$ consistency and efficiency of the estimator of $\beta$. For example, is it still possible to specify an appropriate working correlation matrix in the estimating equations for the semiparametric model (1) to obtain consistent and efficient estimators of $\beta$ and $\theta(t)$? Various combinations of working independence and the true correlation structure can be entertained for the separate estimating equations for $\beta$ and $\theta(t)$. We pursue this question using profile likelihood ideas. We propose the profile-kernel estimating equations for the semiparametric model (1) in the next section, and we answer these questions in Section 4 by asymptotic analysis.

3.2 Profile-Kernel Estimating Equations for Semiparametric Model (1)

In this section we develop estimating equations for $\beta$ and $\theta(t)$ in the semiparametric model (1). A main feature of (1) is that $\beta$ is a finite-dimensional parameter vector and $\theta(t)$ is an infinite-dimensional parameter. For independent data in which the mean and variance functions determine a distribution (e.g., generalized linear models), if the kernel method is used to estimate $\theta(t)$, then the profile method yields a $\sqrt{n}$-consistent and semiparametric efficient estimator of $\beta$ (Carroll et al. 1997; Severini and Staniswalis 1994). We therefore use kernel estimating equations similar to (6) and (7) to estimate $\theta(t)$, and we use profile estimating equations obtained by modifying (4) to estimate $\beta$. We call the resulting estimating equations profile-kernel estimating equations. In light of the discussion at the end of Section 3.1, we allow the working correlation matrices to differ between the two sets of estimating equations. In the same spirit as parametric GEEs, our primary goal is to investigate whether we can construct a $\sqrt{n}$-consistent and semiparametric efficient estimator of $\beta$ by assuming the true correlation matrix. Our secondary goal is to investigate whether we can also construct a consistent and efficient estimator of $\theta(t)$ at the conventional nonparametric rate.

If $\beta$ is known, then we estimate $\theta(t)$ using one of the following estimating equations:

$$ \sum_{i=1}^n T_i(t)^T\Delta_i(X_i,t)K_{ih}^{1/2}(t)V_{2i}^{-1}(X_i,t)K_{ih}^{1/2}(t)\{Y_i - \mu_i(X_i,t)\} = 0 \tag{8} $$

or

$$ \sum_{i=1}^n T_i(t)^T\Delta_i(X_i,t)V_{2i}^{-1}(X_i,t)K_{ih}(t)\{Y_i - \mu_i(X_i,t)\} = 0, \tag{9} $$

where $K_{ih}(t)$, $\mu_i(X_i,t)$, $\Delta_i(X_i,t)$, and $V_{2i}(X_i,t) = S_i^{1/2}(X_i,t)R_{2i}S_i^{1/2}(X_i,t)$ are the same as those in (6) and (7) except that they are evaluated at $\mu_{ij}(X_{ij},t;\beta) = \mu\{X_{ij}^T\beta + \alpha_0 + \alpha_1(T_{ij} - t)/h\}$. Having estimated $\alpha$ at $t$ as $\hat\alpha(\beta)$, the kernel estimator of $\theta(t)$ is $\hat\theta(t;\beta) = \hat\alpha_0(\beta)$. The working correlation matrix $R_{2i}$ in $V_{2i}(X_i,t)$ may again depend on a parameter vector, which can be estimated using the method of moments (Liang and Zeger 1986). Estimation of $\beta$ proceeds by modifying the parametric GEEs (4) and solving the profile estimating equations

$$ \sum_{i=1}^n \Bigl[\frac{\partial\mu\{X_i\beta + \hat\theta(T_i;\beta)\}}{\partial\beta^T}\Bigr]^T V_{1i}^{-1}(X_i,T_i)\bigl[Y_i - \mu\{X_i\beta + \hat\theta(T_i;\beta)\}\bigr] = 0, \tag{10} $$

where $\hat\theta(T_i;\beta) = \{\hat\theta(T_{i1};\beta), \ldots, \hat\theta(T_{im_i};\beta)\}^T$, $V_{1i}(X_i,T_i) = S_i^{1/2}(X_i,T_i)R_{1i}S_i^{1/2}(X_i,T_i)$, $S_i(X_i,T_i) = \operatorname{diag}[\phi w_{ij}^{-1}V\{\mu(X_{ij}^T\beta + \hat\theta(T_{ij};\beta))\}]$, and $R_{1i}$ is a working correlation matrix depending on a parameter vector that can be estimated using the method of moments (Liang and Zeger 1986). For example, in panel data $R_{1i} \equiv R_1$ can be estimated by $n^{-1}\sum_{i=1}^n S_i^{-1/2}r_ir_i^TS_i^{-1/2}$, where $r_i = Y_i - \mu\{X_i\hat\beta + \hat\theta(T_i;\hat\beta)\}$ and $\hat\beta$ is computed under working independence. The estimators $\{\hat\beta, \hat\theta(t)\}$ jointly solving (8) or (9), together with (10), are termed profile-kernel estimators. Our asymptotics assume that the correlation parameters are known, but it can be shown that the results also apply when they are estimated.

Note that we allow the working correlation matrices $R_{2i}$ in (8) or (9) and $R_{1i}$ in (10) to be different. The estimator of Zeger and Diggle (1994) can be viewed as a special case of our profile-kernel estimators: they considered longitudinal Gaussian data and assumed working independence when estimating $\theta(t)$, that is, $R_{2i} = I$, and set $R_{1i}$ equal to the true correlation matrix when estimating $\beta$. Severini and Staniswalis (1994) used (8) and (10) with the same working correlation matrix in both, that is, $R_{1i} = R_{2i} = R_i$ or, equivalently, $V_{1i} = V_{2i} = V_i$.
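The moment estimator of $R_1$ given above is a plain average of outer products of standardized residuals. A minimal Python sketch, assuming a common cluster size $m$ and that the residuals $r_i$ and the diagonals of $S_i$ have already been computed from a working-independence fit (the helper name `moment_corr` is hypothetical):

```python
import math


def moment_corr(residuals, variances):
    """Moment estimate n^{-1} sum_i S_i^{-1/2} r_i r_i^T S_i^{-1/2} of R_1.

    residuals : n residual vectors r_i, all of the common length m
    variances : n vectors holding the diagonal of S_i (marginal variances)
    """
    n = len(residuals)
    m = len(residuals[0])
    R = [[0.0] * m for _ in range(m)]
    for r, s in zip(residuals, variances):
        # Standardize: z = S_i^{-1/2} r_i, which is elementwise because S_i is diagonal.
        z = [rj / math.sqrt(sj) for rj, sj in zip(r, s)]
        for j in range(m):
            for k in range(m):
                R[j][k] += z[j] * z[k] / n
    return R


R1_hat = moment_corr([[1.0, -1.0], [2.0, 2.0]], [[1.0, 1.0], [4.0, 4.0]])
```

As written in the text, the estimator does not force its diagonal to equal 1, so the sketch leaves it unnormalized.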
Note that Severini and Staniswalis (1994) and Zeger and Diggle (1994) used local average kernel estimation instead of the local linear kernel estimation in (9). We study the asymptotic properties of the general profile-kernel estimators and of these special cases in Section 4. Our results are unexpected. Specifically, the key conclusions from our asymptotic analyses are as follows:

1. If standard smoothing is used, $\hat\beta$ is $\sqrt{n}$ consistent only when $R_{1i} = R_{2i} = I$, that is, under working independence.
2. For other specifications of the working correlations $\{R_{1i}, R_{2i}\}$, including the case in which $R_{1i}$ is the true correlation matrix and $R_{2i}$ is arbitrary, $\hat\beta$ is, except in special cases, $\sqrt{n}$ inconsistent unless $\theta(t)$ is undersmoothed. When $\theta(t)$ is undersmoothed and the true correlation matrix is assumed, the resulting profile-kernel estimator $\hat\beta$ is not semiparametric efficient.
3. Calculation of the semiparametric efficient estimator of $\beta$ is complicated even in the multivariate Gaussian case: construction of the semiparametric efficient score requires solving a complicated Fredholm integral equation and estimating the multivariate joint distribution of $(X, T)$.

4. ASYMPTOTIC RESULTS

In this section we study the asymptotic properties of the profile-kernel estimators $\{\hat\beta, \hat\theta(t)\}$. We focus on the symmetric local linear kernel estimating equations (8) and the profile estimating equations (10). We focus on (8) rather than (9) because the asymptotic properties of the estimators under (9) are difficult to study, owing to the asymmetric nature of (9) (Lin and Carroll 2000). However, we show that if one uses the local average kernel in (9), which includes the existing estimators (Severini and Staniswalis 1994; Zeger and Diggle 1994) as special cases, then the resulting estimators have asymptotic properties qualitatively similar to those of $\{\hat\beta, \hat\theta(t)\}$. In what follows, let $m_i = m < \infty$ (i.e., assume finite cluster size), and let $T$ be a continuous observation-level covariate (e.g., a time-varying covariate in longitudinal studies).
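We allow the $m$ components of $X_i$ and $T_i$ to be correlated unless stated otherwise, and we assume the density of $T_{ij}$ to be continuous. We further assume that the $(Y_i, X_i, T_i)$ ($i = 1, \ldots, n$) are iid triplets and that both $V_{1i}(\mu_i) = V_1(\mu_i)$ and $V_{2i}(\mu_i) = V_2(\mu_i)$ are invertible. Let $d^{(r)}(\cdot)$ denote the $r$th derivative of any function $d(\cdot)$, let $v^{jk}$ denote the $(j,k)$th element of a matrix $V^{-1}$ (so that $v_1^{jk}$ and $v_2^{jk}$ are the elements of $V_1^{-1}$ and $V_2^{-1}$), and let $f_j(t)$ denote the marginal density of $T_j$. Suppose that the kernel density function $K(\cdot)$ has mean 0 and unit variance; that is, $\int sK(s)\,ds = 0$ and $\int s^2K(s)\,ds = 1$.

We first rewrite the profile estimating equations for $\beta$ in (10) as

$$ \sum_{i=1}^n \widetilde X_i^T\Delta(X_i,T_i)V_{1i}^{-1}(X_i,T_i)\bigl[Y_i - \mu\{X_i\beta + \hat\theta(T_i;\beta)\}\bigr] = 0, \tag{11} $$

where $\widetilde X_i = X_i + \partial\hat\theta(T_i;\beta)/\partial\beta^T$ and $\Delta(X_i,T_i) = \operatorname{diag}[\mu^{(1)}\{X_{ij}^T\beta + \hat\theta(T_{ij};\beta)\}]$. Calculations in Appendix A show that, asymptotically, $\partial\hat\theta(t;\beta)/\partial\beta = -W^{-1}(t)W_x(t) + o(1)$, where, suppressing the index $i$ and denoting $\mu_l^{(1)} = \mu^{(1)}\{X_l^T\beta + \theta(t)\}$ ($l = 1, \ldots, m$),

$$ W(t) = \sum_{l=1}^m E\bigl[\{\mu_l^{(1)}\}^2 v_2^{ll} \mid T_l = t\bigr] f_l(t), \qquad W_x(t) = \sum_{l=1}^m E\bigl[\{\mu_l^{(1)}\}^2 v_2^{ll} X_l \mid T_l = t\bigr] f_l(t). $$

It follows that $\widetilde X_i = (\widetilde X_{i1}, \ldots, \widetilde X_{im})^T$, where $\widetilde X_{ij} = X_{ij} - W^{-1}(T_{ij})W_x(T_{ij})$. Using these results, we study the asymptotic distributions of $\{\hat\theta(t), \hat\beta\}$ in Result 1. A sketch of its proof is given in Appendix A.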
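The working-independence case $R_{1i} = R_{2i} = I$ is easy to compute for a Gaussian outcome with identity link and a common variance. Under those assumptions (8) reduces to local linear least squares on $Y - X\beta$, so $\hat\theta(t;\beta) = S(Y - X\beta)$ for a linear smoother $S$, $\partial\hat\theta/\partial\beta = -SX$, and (11) has the closed-form solution $\hat\beta = \sum\widetilde X\widetilde Y/\sum\widetilde X^2$ with $\widetilde X = X - SX$ and $\widetilde Y = Y - SY$. The sketch below implements this for scalar $X$, assuming a Gaussian kernel (mean 0 and unit variance, as required above) and a user-supplied bandwidth; it is an illustration under those assumptions, not the Fisher scoring algorithm of Appendix C, and all names are hypothetical.

```python
import math


def gauss_kernel(v):
    """Standard normal density: symmetric, mean 0, unit variance."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def local_linear_weights(t, ts, h):
    """Weights w_l(t) with theta_hat(t) = sum_l w_l(t) z_l: the local linear
    least-squares fit at t of responses z_l observed at ts (working independence)."""
    k = [gauss_kernel((tl - t) / h) / h for tl in ts]  # K_h(T_l - t)
    u = [tl - t for tl in ts]
    s0 = sum(k)
    s1 = sum(kl * ul for kl, ul in zip(k, u))
    s2 = sum(kl * ul * ul for kl, ul in zip(k, u))
    den = s0 * s2 - s1 * s1
    return [kl * (s2 - s1 * ul) / den for kl, ul in zip(k, u)]


def smooth(values, ts, h):
    """Apply the local linear smoother S at every observed T."""
    out = []
    for t in ts:
        w = local_linear_weights(t, ts, h)
        out.append(sum(wl * vl for wl, vl in zip(w, values)))
    return out


def profile_kernel_wi(y, x, ts, h):
    """Profile-kernel estimate of a scalar beta with identity link, R_1 = R_2 = I.

    y, x, ts are pooled over all clusters: with working independence and a
    common variance, the cluster labels do not change the point estimate.
    Returns (beta_hat, x_tilde, theta_hat evaluated at ts).
    """
    x_tilde = [a - b for a, b in zip(x, smooth(x, ts, h))]  # X - S X
    y_tilde = [a - b for a, b in zip(y, smooth(y, ts, h))]  # Y - S Y
    beta = (sum(a * b for a, b in zip(x_tilde, y_tilde))
            / sum(a * a for a in x_tilde))
    theta = smooth([yi - beta * xi for yi, xi in zip(y, x)], ts, h)
    return beta, x_tilde, theta


# Toy data: 20 pooled observations, theta(t) = 1 + 0.5 t (linear), beta = 2.
ts = [i / 10.0 for i in range(20)]
x = [math.sin(3.0 * i) for i in range(20)]
y = [2.0 * xi + 1.0 + 0.5 * ti for xi, ti in zip(x, ts)]
beta_hat, x_tilde, theta_hat = profile_kernel_wi(y, x, ts, h=0.3)
```

Because local linear smoothing reproduces linear functions exactly, noise-free data generated with a linear $\theta(t)$ recover $\beta$ up to rounding error, which makes a convenient self-check of the implementation.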

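Result 1. Let $\{\hat\theta(t), \hat\beta\}$ denote the solution of the profile-kernel estimating equations (8) and (10), where $\hat\theta(t) = \hat\theta(t;\hat\beta)$. Suppose that $h \propto n^{-\nu}$ with $1/5 \le \nu \le 1/3$, and $n \to \infty$. We then have the following:

a. If $\hat\beta$ is $\sqrt{n}$ consistent [i.e., $n^{1/2}(\hat\beta - \beta) = O_p(1)$], then $\hat\theta(t)$ is asymptotically equivalent to a random variable whose bias and variance satisfy

$$ \operatorname{bias}\{\hat\theta(t)\} \approx h^2\theta^{(2)}(t)/2, \tag{12} $$

$$ \operatorname{var}\{\hat\theta(t)\} \approx \frac{1}{nh}\,\frac{\sum_{j=1}^m E\bigl[\{\mu_j^{(1)}\}^2(v_2^{jj})^2\sigma_{jj} \mid T_j = t\bigr]f_j(t)}{\Bigl(\sum_{j=1}^m E\bigl[\{\mu_j^{(1)}\}^2v_2^{jj} \mid T_j = t\bigr]f_j(t)\Bigr)^2}, \tag{13} $$

where $\sigma_{jj} = \operatorname{var}(Y_j \mid X_j, T_j) = \phi w_j^{-1}V(\mu_j)$. It follows that $\operatorname{var}\{\hat\theta(t)\}$ is minimized by assuming working independence, $R_2 = I$, in which case

$$ \operatorname{var}\{\hat\theta(t)\} \approx \frac{1}{nh}\Bigl(\sum_{j=1}^m E\bigl[\{\mu_j^{(1)}\}^2\sigma_{jj}^{-1} \mid T_j = t\bigr]f_j(t)\Bigr)^{-1}. \tag{14} $$

b. The estimator $\hat\beta$ satisfies $n^{1/2}\{\hat\beta - \beta - h^2b(\beta,\theta)/2\} \to N(0, V)$ in distribution, where, suppressing the subscript $i$ in each term inside the expectations,

$$ b(\beta,\theta) = \{E(\widetilde X^T\Delta V_1^{-1}\Delta\widetilde X)\}^{-1}E\{\widetilde X^T\Delta V_1^{-1}\Delta\theta^{(2)}(T)\}, $$

$$ V = \{E(\widetilde X^T\Delta V_1^{-1}\Delta\widetilde X)\}^{-1}E\{(Z_1 - Z_2)^T\Sigma(Z_1 - Z_2)\}\{E(\widetilde X^T\Delta V_1^{-1}\Delta\widetilde X)\}^{-1}, $$

$\Sigma = \operatorname{cov}(Y \mid X, T)$, $Z_1 = V_1^{-1}\Delta\widetilde X$, and the $j$th row of $Z_2$ is

$$ Z_{2j} = \mu_j^{(1)}v_2^{jj}W^{-1}(T_j)\sum_{k=1}^m\sum_{l=1}^m E\bigl\{\widetilde X_k^T\mu_k^{(1)}v_1^{kl}\mu_l^{(1)} \mid T_l = T_j\bigr\}f_l(T_j). $$

c. If working independence is assumed in both (8) and (10) (i.e., $R_{1i} = R_{2i} = I$) and the $(X_{ij}, T_{ij})$ have the same marginal density for all $j$ [i.e., $f(X_{ij}, T_{ij}) = f(X_{ik}, T_{ik})$], then $\hat\beta$ is $\sqrt{n}$ consistent; that is, the bias term $b(\beta,\theta) = 0$ and $n^{1/2}(\hat\beta - \beta) \to N(0, \widetilde V)$ in distribution, where, suppressing the subscript $i$ in each term inside the expectations,

$$ \widetilde V = \{E(\widetilde X^T\Delta\Sigma_d^{-1}\Delta\widetilde X)\}^{-1}E(\widetilde X^T\Delta\Sigma_d^{-1}\Sigma\Sigma_d^{-1}\Delta\widetilde X)\{E(\widetilde X^T\Delta\Sigma_d^{-1}\Delta\widetilde X)\}^{-1}, $$

and $\Sigma_d$ is the diagonal matrix with the diagonal elements of $\Sigma$ (i.e., the $\sigma_{jj}$) on its diagonal.

d. For other specifications of the working correlation matrices $R_{1i}$ and $R_{2i}$, including the true correlation matrix, $\hat\beta$ is often $\sqrt{n}$ inconsistent; that is, $n^{1/2}(\hat\beta - \beta)$ diverges. However, if $nh^4 \to 0$ [i.e., $\theta(t)$ is undersmoothed], then for any specification of the working correlation matrices $R_{1i}$ and $R_{2i}$, $\hat\beta$ is $\sqrt{n}$ consistent and $n^{1/2}(\hat\beta - \beta) \to N(0, V)$ in distribution.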
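Part c of Result 1 gives $\widetilde V$ in the familiar sandwich form. For a Gaussian outcome with identity link ($\Delta = I$), scalar $\beta$, and a common variance (so that $\Sigma_d$ is proportional to $I$ and cancels), replacing expectations by cluster sums and $\Sigma$ by $r_ir_i^T$ gives the estimate of $\operatorname{var}(\hat\beta) \approx \widetilde V/n$ sketched below. The inputs $\widetilde X_{ij}$ and residuals $r_{ij}$ are assumed to come from a working-independence fit; under other working correlations, part b shows that the variance also involves $Z_2$, which this plain sandwich omits. The function name is hypothetical.

```python
def sandwich_var_wi(x_tilde_by_cluster, resid_by_cluster):
    """Cluster-robust (sandwich) estimate of var(beta_hat) for scalar beta.

    Assumes identity link, working independence and a common variance, so that
    var(beta_hat), approximately V_tilde / n, is estimated by
        {sum_i x_i' x_i}^{-1} {sum_i (x_i' r_i)^2} {sum_i x_i' x_i}^{-1},
    where x_i and r_i hold cluster i's X-tilde values and residuals.
    """
    bread = sum(sum(xv * xv for xv in xs) for xs in x_tilde_by_cluster)
    meat = sum(sum(xv * rv for xv, rv in zip(xs, rs)) ** 2
               for xs, rs in zip(x_tilde_by_cluster, resid_by_cluster))
    return meat / bread ** 2


# Two clusters of size two.
v_hat = sandwich_var_wi([[1.0, 1.0], [1.0, -1.0]], [[1.0, 1.0], [2.0, 0.0]])
```

Summing the score contributions within each cluster before squaring is what makes the "meat" robust to arbitrary within-cluster correlation.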
In general, $V$ can be estimated by replacing the terms in its expression with estimates of those terms. We conjecture that the bootstrap can also be used. The results in part a of Result 1 are similar to those of Lin and Carroll (2000) when the covariate $X$ is absent from model (1), except that the variance of $\hat\theta(t)$ now involves conditional expectations over $X$ given $T$. These results suggest that if the profile estimator of $\beta$ is $\sqrt{n}$ consistent, then $\hat\theta(t)$ is consistent and asymptotically normal at the regular nonparametric rate, and the most efficient estimator $\hat\theta(t)$ is obtained by completely ignoring the within-cluster correlation.

To see why the bias term $b(\beta,\theta) \ne 0$ for nonidentity working correlation matrices, consider linear models for multivariate normal $Y_i$. Suppose that the marginal density of $(X_j, T_j)$ ($j = 1, \ldots, m$) is the same for all $j$. Then the $j$th component of $\widetilde X$ is $\widetilde X_j = X_j - E(X_j \mid T_j)$. It follows that the second term of $b(\beta,\theta)$ is

$$ E\{\widetilde X^TV_1^{-1}\theta^{(2)}(T)\} = \sum_{j=1}^m\sum_{k=1}^m E\{C_{jk}(T_k)v_1^{jk}\theta^{(2)}(T_k)\}, $$

where $C_{jk}(T_k) = E(X_j \mid T_k) - E\{E(X_j \mid T_j) \mid T_k\}$ is generally not equal to 0 except when $j = k$. This means that $b(\beta,\theta) \ne 0$ unless we assume working independence (i.e., $R_1 = I$) or $E(X_j \mid T_j, T_k) = E(X_j \mid T_j)$ for all $j$ and $k$ (e.g., when $X$ and $T$ are independent). Simple calculations show that for multivariate normal $Y$, if $X$ and $T$ are independent, then $\hat\beta$ is in fact $\sqrt{n}$ consistent for arbitrary working correlation matrices $R_1$ and $R_2$. Furthermore, as shown in Section 5, if one sets $R_{1i}$ equal to the true correlation matrix in (10) and assumes working independence, $R_{2i} = I$, in (8), then $\hat\beta$ is $\sqrt{n}$ consistent and semiparametric efficient, and $\hat\theta(t)$ is efficient as well. The foregoing assumption that $X$ and $T$ are independent is strong and difficult to satisfy in practice if both $X$ and $T$ are time-varying covariates. But if $X$ contains only one-time covariates and $T$ is time in a longitudinal study, then this condition is satisfied. Note that the outcome needs to be normally distributed for the foregoing results to hold.
For non-Gaussian data, if the true correlation matrix is used, then $\hat\beta$ is still $\sqrt{n}$ inconsistent, even when $X$ and $T$ are independent.

Result 1 assumes that $\theta(t)$ is estimated using the symmetric local linear kernel estimating equation (8). Severini and Staniswalis (1994) and Zeger and Diggle (1994) proposed slightly different estimators. They estimated $\theta(t)$ by replacing the symmetric local linear kernel estimating equation (8) with the asymmetric local average kernel estimating equation, which is obtained by letting $\mu_{ij}(X_{ij},t) = \mu(X_{ij}^T\beta + \alpha_0)$ and replacing $T_i(t)$ by $1_i$ (an $m_i \times 1$ vector of ones) in (9). We denote these estimators by $\{\hat\beta^*, \hat\theta^*(t)\}$. Specifically, Severini and Staniswalis (1994) assumed the same working correlation matrix in both the $\beta$ and the $\theta(t)$ estimating equations, that is, $R_{1i} = R_{2i} = R_i$. Zeger and Diggle (1994) considered Gaussian data and assumed $R_{1i}$ equal to the true correlation and $R_{2i} = I$ (working independence). It can be shown that the asymptotic properties of $\{\hat\beta^*, \hat\theta^*(t)\}$ are similar to those of $\{\hat\beta, \hat\theta(t)\}$ in Result 1 and that the conclusions are the same.

Computation. A Fisher scoring algorithm for computing the working-independence estimator is given in Appendix C.

5. SEMIPARAMETRIC EFFICIENT SCORE

It is of substantial interest to understand why the profile-kernel estimator $\hat\beta$ is $\sqrt{n}$ inconsistent when the true correlation matrix is used, unless $\theta(t)$ is undersmoothed. One way to address this question is to define a likelihood function for $Y_i$ and compare the profile-kernel estimating equation (10) with the semiparametric efficient score for $\beta$ (Bickel et al. 1993). The motivation for this investigation is as follows. For independent data (i.e., cluster size $m = 1$), suppose that the distribution of the outcome $Y$ belongs to the linear exponential family. If $\theta(t)$ is smoothed using standard kernel methods (e.g., with the bandwidth chosen by cross-validation), then the profile-kernel estimating equation of $\beta$ is asymptotically equivalent to the semiparametric efficient score of $\beta$ (Carroll et al. 1997; Severini and Staniswalis 1994). The resulting profile estimator $\hat\beta$ hence is $\sqrt{n}$ consistent and semiparametric efficient. If one uses an estimating equation for $\beta$ that is asymptotically different from the semiparametric efficient score [e.g., by simply replacing $\widetilde X_i$ in (11), simplified for $m = 1$, by $X_i$], then the resulting estimator $\hat\beta$ is $\sqrt{n}$ inconsistent unless $\theta(t)$ is undersmoothed (Rice 1986).

Our key findings in this section are as follows. First, the semiparametric efficient score of $\beta$ for multivariate Gaussian data is complicated and requires solving a Fredholm integral equation of the second kind and estimating the joint distribution of $X_i$ and $T_i$. Second, if regular smoothing is used for estimating $\theta(t)$, then the profile-kernel score of $\beta$ estimates the semiparametric efficient score with a nonzero bias. This explains why the profile-kernel estimator $\hat\beta$ is often $\sqrt{n}$ inconsistent.
Finally, when $\hat\theta(t)$ is undersmoothed, the profile-kernel estimator of $\beta$ is $\sqrt{n}$ consistent but is still not semiparametric efficient, except in special cases.

We first derive the semiparametric efficient score of $\beta$. We assume a constant cluster size $1 < m < \infty$ and suppress the index $i$. To understand the fundamental issues involved, we consider $Y$ to be multivariate normal $N\{X\beta + \theta(T), V\}$, where $\theta(T) = \{\theta(T_1), \ldots, \theta(T_m)\}^T$ and $V$ is assumed known. In Appendix B we show that the semiparametric efficient score of $\beta$ is

$$ \{X - \varphi^*(T)\}^TV^{-1}\{Y - X\beta - \theta(T)\}, \tag{15} $$

where $\varphi^*(T) = \{\varphi^*(T_1), \ldots, \varphi^*(T_m)\}^T$, $\varphi^*(T_j) = \{\varphi_1^*(T_j), \ldots, \varphi_p^*(T_j)\}^T$, and $p$ is the dimension of $\beta$. The semiparametric efficiency bound of $\beta$ is $\bigl[E\{(X - \varphi^*(T))^TV^{-1}(X - \varphi^*(T))\}\bigr]^{-1}$. The function $\varphi^*(t)$ solves

$$ \sum_{j=1}^m\sum_{k=1}^m v^{jk}E\bigl[\{X_j - \varphi^*(T_j)\} \mid T_k = t\bigr]f_k(t) = 0, \tag{16} $$

where $X = (X_1, \ldots, X_m)^T$, $X_j = (X_{j1}, \ldots, X_{jp})^T$, $v^{jk}$ is the $(j,k)$th element of $V^{-1}$, and $f_k(t)$ is the density of $T_k$. Simple calculations show that (16) can be written as the Fredholm integral equation of the second kind (Bronshtein and Semendyayev 1985, sec. 8.4)

$$ \varphi^*(t) + \int H(t,s)\varphi^*(s)\,ds = q(t), \tag{17} $$

where $H(t,s)$ and $q(t)$ are defined as

$$ H(t,s) = \frac{\sum\sum_{j \ne k} v^{jk}f(T_j = s, T_k = t)}{\sum_{j=1}^m v^{jj}f(T_j = t)}, \qquad q(t) = \frac{\sum_{j=1}^m\sum_{k=1}^m v^{jk}E(X_j \mid T_k = t)f(T_k = t)}{\sum_{j=1}^m v^{jj}f(T_j = t)}, $$

and $f(\cdot)$ denotes a density function. If $H(t,s)$ is square-integrable, then (17) has a unique solution unless $-1$ is an eigenvalue of the integral operator with kernel $H$, and the solution can be written as $\varphi^*(t) = q(t) - \int\Gamma(t,s)q(s)\,ds$, where $\Gamma(t,s)$ is called the resolvent kernel. The resolvent can be written as the Fredholm series $\Gamma(t,s) = \sum_{k=0}^\infty H_k(t,s)/\sum_{k=0}^\infty c_k$, with $c_0 = 1$, $H_0(t,s) = H(t,s)$, $c_k = k^{-1}\int H_{k-1}(t,t)\,dt$, and $H_k(t,s) = c_kH(t,s) - \int H(t,u)H_{k-1}(u,s)\,du$ (Bronshtein and Semendyayev 1985). An alternative expression for $\Gamma(t,s)$ is given by the Neumann series (Bronshtein and Semendyayev 1985).
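Because (17) is a linear Fredholm equation of the second kind, it can also be solved numerically by the Nyström method: replace the integral by a quadrature rule, solve the resulting linear system for $\varphi^*$ at the quadrature nodes, and use (17) itself to interpolate between nodes. The Python sketch below does this for one scalar component of $\varphi^*(t)$, assuming that $T$ has bounded support $[a, b]$ and that $H$ and $q$ are available as functions (in practice they involve unknown densities and conditional expectations that would have to be estimated). The composite trapezoidal rule and all names are illustrative choices, not the paper's implementation.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def nystrom(H, q, a, b, n_nodes=101):
    """Approximate phi solving phi(t) + int_a^b H(t, s) phi(s) ds = q(t).

    Composite trapezoidal rule on n_nodes equally spaced nodes; returns a
    function evaluating the Nystrom interpolant
        phi(t) = q(t) - sum_j w_j H(t, s_j) phi_j.
    """
    step = (b - a) / (n_nodes - 1)
    s = [a + j * step for j in range(n_nodes)]
    w = [step] * n_nodes
    w[0] = w[-1] = step / 2.0
    # (I + H W) phi = q at the nodes; entry (i, j) of H W is w_j H(s_i, s_j).
    A = [[(1.0 if i == j else 0.0) + w[j] * H(s[i], s[j])
          for j in range(n_nodes)] for i in range(n_nodes)]
    phi_nodes = solve_linear(A, [q(si) for si in s])

    def phi(t):
        return q(t) - sum(wj * H(t, sj) * pj
                          for wj, sj, pj in zip(w, s, phi_nodes))
    return phi


# Toy check: H(t, s) = t s and q(t) = t on [0, 1] have exact solution 3t/4.
phi = nystrom(lambda t, s: t * s, lambda t: t, 0.0, 1.0)
```

For this toy kernel the sketch reproduces $\varphi(t) = 3t/4$ to within about $10^{-5}$ with 101 nodes; the remaining error comes from the trapezoidal approximation of $\int_0^1 s^2\,ds$.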
The Fredholm series always converges but is of little use for numerically calculating $\varphi^*(t)$, because in most cases the approximation is inadequate for small values of $k$. More useful is the Nyström method (Bronshtein and Semendyayev 1985).

These considerations show that construction of the semiparametric efficient score of $\beta$ is complicated even in the multivariate normal case. One needs to solve the complicated integral equation (17), which requires estimating the pairwise joint densities of $(T_j, T_k)$ and the pairwise conditional expectations $E(X_j \mid T_k)$ in order to calculate $H(t,s)$ and $q(t)$. However, in the special case in which the marginal density of $(X_j, T_j)$ is the same for all $j$ and $E(X_j \mid T_j, T_k) = E(X_j \mid T_j)$ (e.g., when $X$ and $T$ are independent), simple calculations show that the solution of (16) has the closed form $\varphi^*(t) = E(X_j \mid T_j = t)$.

We now study, for multivariate Gaussian data, how the semiparametric efficient score (15) asymptotically differs from the profile-kernel estimating equation of $\beta$ in (10) when the working correlation matrix $R$ is the true correlation matrix. Using the results in Appendix A, one can show that the profile estimating equation for $\beta$ in (11) is asymptotically equivalent to

$$ (\widetilde X^TV^{-1} - Z_2^T)\{Y - X\beta - \theta(T)\} + \widetilde X^TV^{-1}\theta^{(2)}(T)h^2/2, \tag{18} $$

where the $j$th component of $\widetilde X$ is $\widetilde X_j = X_j - E(X_j \mid T_j)$ and $Z_2$ is defined in Result 1. A comparison of (15) and (18) shows that they often differ and that (18) is often subject to a nonzero bias. Even when $\theta(t)$ is undersmoothed [i.e., the second term in (18), the bias term, is negligible], some calculations show that the first term in (18) still generally differs from (15). In other words, the profile-kernel score (10) is often asymptotically different from the semiparametric efficient score (15). But when $X$ and $T$ are independent, the two are asymptotically the same, and the profile-kernel estimator of $\beta$ hence is $\sqrt{n}$ consistent and semiparametric efficient. Some calculations show that the same conclusion holds for the profile-kernel estimator $\hat\beta^*$ when $\hat\theta^*(t)$ is the local average kernel estimator obtained using the asymmetric kernel estimating equation (9); see Section 4.

It is difficult to construct the semiparametric efficient score directly using the complicated form of $\varphi^*(t)$ in (15), because this involves unknown density functions and expectations. This raises an open question of how to construct a practical semiparametric efficient estimator of $\beta$. It is a reasonable conjecture that if such a construction is pushed through, then undersmoothing will not be required.

6. PRACTICAL IMPLICATIONS OF THE THEORETICAL RESULTS AND COMPUTATION OF THE ESTIMATES

Cross-Validation. Conventional bandwidth selection techniques, such as cross-validation that deletes one cluster of data at a time, fail unless working independence is assumed. Because the bandwidth $h$ chosen by cross-validation satisfies $h = O(n^{-1/5})$, $\hat\beta$ will be $\sqrt{n}$ inconsistent unless working independence is assumed (Result 1). Unfortunately, there is no generally accepted data-driven way to choose $h$ to undersmooth $\theta(t)$, although ad hoc methods have been proposed (Brockmann, Gasser, and Herrmann 1993). In our experience, multiplying the bandwidth by $n^{-2/15}$, which makes $h \propto n^{-1/3}$, often works quite well in practice. Presumably, other methods (e.g., higher order kernels or twicing) can be used to eliminate the bias.

Sandwich Method. The sandwich method, which is commonly used to estimate the covariance of $\hat\beta$ in estimating equations (Liang and Zeger 1986), gives an inconsistent estimator of $\operatorname{cov}(\hat\beta)$ unless working independence is assumed, because it ignores the extra $Z_2$ term in $V$ in part b of Result 1. This is true even when one undersmooths $\theta(t)$. We conjecture that the bootstrap can be used instead.

Hypothesis Testing.
One is often interested in testing $H_0: \beta = 0$ or that part of $\beta$ is 0. If conventional smoothing techniques such as cross-validation are used, then the Wald test and the score test for $H_0$ will be inconsistent unless working independence is assumed or $\theta(t)$ is undersmoothed. For example, when the Wald test is used, $\hat\beta$ in fact estimates the true $\beta$ plus the bias term $b(\beta,\theta)h^2/2$.

Functional Data Analysis. The simplest functional regression model (Ramsay and Dalzell 1991) is $Y_i(t) = \theta(t) + \epsilon_i(t)$, where $i$ indexes the $i$th subject, $t$ indexes time, and $\epsilon_i(t)$ is an error whose distribution is a Gaussian process with mean 0 and covariance $\mathrm{cov}\{\epsilon(t), \epsilon(s)\} = \sigma(t,s)$. Rice and Silverman (1991) considered estimating $\theta(t)$ using a smoothing spline. The results of Lin and Carroll (2000) suggest that, when the kernel method is used, the most efficient estimator of $\theta(t)$ is obtained by entirely ignoring the correlation of the repeated measures of $Y_i(t)$ over time. In the presence of covariates $X_i(t) = \{X_{i1}(t), \ldots, X_{ip}(t)\}^T$, a semiparametric functional regression model could be considered,

$$Y_i(t) = X_i(t)^T\beta + \theta(t) + \epsilon_i(t). \qquad (19)$$

The semiparametric model (1) is a discrete version of (19). Suppose that the profile-kernel method is used to estimate $\{\beta, \theta(t)\}$. Our results suggest that (a) if $X_i(t)$ is a vector of one-time subject-level covariates (i.e., $X_i(t) = X_i$, free of $t$), then by specifying $R_1$ as the true correlation matrix and $R_2 = I$, $\hat\beta$ is $\sqrt{n}$ consistent and semiparametric efficient, and $\hat\theta(t)$ is asymptotically efficient as well; and (b) if $X_i(t)$ contains time-varying covariates (i.e., $X$ and $T$ are not independent), then one must assume working independence ($R_1 = R_2 = I$) or undersmooth $\theta(t)$ to obtain a $\sqrt{n}$ consistent (but inefficient) estimator $\hat\beta$ of $\beta$. It is important to emphasize that our results assume that the number of observations per subject, $m$, is finite, as is common in longitudinal studies.
With $T$ being time, our asymptotic analysis thus allows observations from different subjects to be taken at different time points, while the number of observations per subject remains bounded.

Computation. A Fisher scoring algorithm for computing the working independence estimator is given in Appendix C.

7. SIMULATION STUDY

We conducted a simulation study to evaluate the finite-sample performance of the profile-kernel method. Each dataset comprised $n = 100$ subjects with $m_i = 3$ observations per subject over time. The covariate vector was set at $X_{ij} = (X_{1ij}, X_{2i})^T$, where $X_{1ij}$ is a time-varying covariate and $X_{2i}$ is a subject-level covariate that takes value 1 for half of the subjects and 0 for the other half, mimicking a binary treatment indicator. We generated $X_{1ij}$ and $T_{ij}$ according to the model $X_{1ij} = b_i + e_{ij}$ and $T_{ij} = b_i + e'_{ij}$, where $b_i \sim \mathrm{uniform}(-1, 1)$, and $e_{ij}$ and $e'_{ij}$ are independent and follow uniform$(-1, 1)$. This setup allows $X_{1ij}$ and $T_{ij}$ to be correlated with each other, and each to be correlated over time across repeated measures with exchangeable correlation 0.5. Conditional on $X_{ij}$ and $T_{ij}$, we generated the outcome $Y_{ij}$ from a multivariate normal distribution with mean $\mu_{ij} = \beta_1 X_{1ij} + \beta_2 X_{2i} + \theta(T_{ij})$, where $\beta_1 = \beta_2 = 100$ and $\theta(t) = \sin(t)$; $Y_i$ has variance 1 and exchangeable correlation 0.5. We generated 200 datasets with $N = 300$ observations each and analyzed them using the profile-kernel methods.

For each simulated dataset, we first assumed working independence when calculating the profile-kernel estimate of $\theta(t)$, and estimated the bandwidth parameter $h$ needed for the kernel estimate of $\theta(t)$ using cross-validation that deletes one subject's data at a time. We next calculated the profile-kernel estimate of $\theta(t)$ accounting for the within-subject correlation. Specifically, we estimated the true covariance of $Y_i$ using the method of moments and calculated the bandwidth parameter $h$ by multiplying the cross-validation bandwidth estimate by $n^{-2/15}$. This undersmooths $\theta(t)$ and eliminates the bias term (Sec. 6), at least theoretically.
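The Python sketch below mimics this design and the working-independence profile-kernel estimator for the Gaussian (identity-link) case, together with the undersmoothing rule $h \to h\,n^{-2/15}$ from Section 6. Several pieces are assumptions rather than the authors' procedure: a local-constant Nadaraya-Watson smoother with a Gaussian kernel stands in for the kernel estimating equations, a pilot bandwidth $n^{-1/5}$ stands in for leave-one-subject-out cross-validation, the coefficients are set to 1 for illustration, and $n = 500$ subjects are used so that the check is stable. All function names are hypothetical.

```python
import numpy as np


def simulate_design(n, m=3, beta=(1.0, 1.0), rho=0.5, seed=None):
    """Clustered data loosely following the Section 7 design (values assumed).

    X1_ij = b_i + e_ij and T_ij = b_i + e'_ij with b_i, e_ij, e'_ij ~ U(-1, 1);
    X2_i is a 0/1 subject-level indicator; errors have variance 1 and
    exchangeable correlation rho; theta(t) = sin(t).
    """
    rng = np.random.default_rng(seed)
    b = rng.uniform(-1.0, 1.0, size=(n, 1))             # shared subject effect
    x1 = b + rng.uniform(-1.0, 1.0, size=(n, m))        # time-varying covariate
    t = b + rng.uniform(-1.0, 1.0, size=(n, m))         # T_ij, correlated with X1_ij
    x2 = np.repeat((np.arange(n) < n // 2).astype(float)[:, None], m, axis=1)
    cov = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
    eps = rng.multivariate_normal(np.zeros(m), cov, size=n)
    y = beta[0] * x1 + beta[1] * x2 + np.sin(t) + eps
    X = np.column_stack([x1.ravel(), x2.ravel()])
    return X, t.ravel(), y.ravel()


def nw_smoother(t, h):
    """Nadaraya-Watson smoother matrix S with a Gaussian kernel: (S v)_i ~ E(v | T = t_i)."""
    d = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)


def profile_kernel_wi(X, t, y, h):
    """Working-independence profile-kernel estimate of beta, identity link.

    theta_hat(t; beta) = S (y - X beta); profiling theta out gives
    beta_hat = (Xt' Xt)^{-1} Xt' yt with Xt = (I - S) X and yt = (I - S) y.
    """
    S = nw_smoother(t, h)
    Xt = X - S @ X                          # X-tilde = (I - S) X
    yt = y - S @ y
    beta_hat = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
    theta_hat = S @ (y - X @ beta_hat)      # theta_hat(T_ij; beta_hat)
    return beta_hat, theta_hat


n = 500
h_pilot = n ** (-1 / 5)              # stand-in for a cross-validated bandwidth, order n^(-1/5)
h_under = h_pilot * n ** (-2 / 15)   # undersmoothing rule: h is now proportional to n^(-1/3)
X, t, y = simulate_design(n, seed=2001)
beta_hat, theta_hat = profile_kernel_wi(X, t, y, h_under)
print(beta_hat)                      # should be close to the assumed (1.0, 1.0)
```

In this linear case $\hat\beta - \beta = (\tilde X^T\tilde X)^{-1}\tilde X^T\{(I - S)\theta(T) + (I - S)\epsilon\}$ does not depend on $\beta$, so the illustrative choice of coefficient values does not affect the estimation error. The sketch covers only the working-independence fit; the version that accounts for the estimated within-subject covariance is not reproduced here.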
Table 1 gives the averages of the estimated regression coefficients $\hat\beta_1$ and $\hat\beta_2$, along with their empirical and estimated standard errors (SEs), when working independence is assumed and when the true covariance of $Y_i$ is estimated. When assuming working independence, we estimated the SEs of $\hat\beta$ using the sandwich estimate given in Appendix C. When the true covariance is estimated, we estimated the SEs of $\hat\beta$ using a finite-sample estimate of $V_\beta$ given in part b of Result 1.

Journal of the American Statistical Association, September 2001

Table 1. Means and Standard Errors of Regression Coefficient Estimates Over 200 Replications
Parameter | Working independence: Mean, Empirical SE, Estimated SE | True covariance: Mean, Empirical SE, Estimated SE
NOTE: True values are $\beta_1 = \beta_2 = 100$.

Table 1 reports the averages of the estimated standard errors over 200 replications. The results show that the profile-kernel method performs well in finite samples and that the biases in the profile-kernel estimates of $\beta$ are minimal under both covariance assumptions. The estimate of $\beta_1$, the coefficient of the time-varying covariate $X_1$, is more efficient when the true covariance is estimated than when working independence is assumed. However, no gain in efficiency is realized in $\hat\beta_2$ by estimating the true covariance of $Y_i$. This is because $X_2$ is a subject-level covariate, is independent of $T$, and the design is balanced with respect to $X_2$. These simulation results are consistent with the theory. The estimated SEs of $\hat\beta$ also agree well with the empirical (simulated) SEs.

Figure 1 compares the true nonparametric function $\theta(t)$ with the kernel estimates of $\theta(t)$ obtained when assuming working independence and when the true covariance is estimated. Both kernel estimates are close to the true $\theta(t)$. Figure 2 compares the SEs of these two kernel estimates; it suggests that assuming working independence gives a more efficient kernel estimate of $\theta(t)$ than assuming the true covariance. These results agree well with the theory.

8. APPLICATION TO THE INFECTIOUS DISEASE DATA

In this section we apply the semiparametric model (1) to the longitudinal infectious disease data introduced in Section 1. A total of 1,200 binary indicators for the presence of respiratory infection (0 = no, 1 = yes) were collected on 275 preschool-age children examined every quarter for up to six consecutive quarters.
The primary interest was to study the association between respiratory infection and the exposure variable vitamin A deficiency, which was manifested by xerophthalmia status (0 = no; 1 = yes), while adjusting for several key confounders. These confounders include age in years, sex (0 = male; 1 = female), height for age, and stunting status (0 = no, 1 = yes). (For a detailed description of the covariates, see Zeger and Karim 1991.) Examination of the distribution of the vertical strokes in Figure 3 suggests that the age effect departs dramatically from linearity. To avoid possible confounding of the estimated effect of the key exposure, xerophthalmia, by misspecification of the age effect, we consider a semiparametric logistic model for the $j$th observation of the $i$th subject,

$$\mathrm{logit}\{\mathrm{pr}(y_{ij} = 1)\} = X_{ij}^T\beta + \theta(\mathrm{age}_{ij}), \qquad (20)$$

where $X_{ij}$ comprises xerophthalmia status, seasonal cosine and sine, sex, height for age, and stunting, and $\theta(\mathrm{age}_{ij})$ is a smooth function of age. Examination of the data suggested that the height-for-age effect was linear; hence we included it in $X$. We used the profile-kernel method assuming working independence, with the algorithm in Appendix C, and calculated the SEs using the sandwich method. We chose the bandwidth parameter $h$ using the empirical bias bandwidth selection (EBBS) method (Ruppert 1997).

Figure 3 shows the estimated nonparametric function of age and its 95% confidence interval. The risk of respiratory infection increased slightly during the first years of life and decreased thereafter. Table 2 gives the estimated regression coefficients. The data provide no evidence for an effect of vitamin A deficiency on respiratory infection, but strong evidence for an association between respiratory infection and both sex and season. To examine whether a simple parametric model can fit the data as well as the semiparametric model, we fit a parametric GEE model with a quadratic $\theta(\mathrm{age})$, assuming working independence. Figure 4 compares the semiparametric kernel estimate of $\theta(\mathrm{age})$ with its quadratic counterpart (Diggle et al. 1994, p. 161). The semiparametric kernel estimate suggests that some excess nonlinearity may be undetected by the quadratic age model, a conjecture confirmed by the fact that a cubic age model fit using GEE had a statistically significant cubic age term ($p$ value …). Table 2 compares the regression coefficients estimated using the semiparametric model and the parametric quadratic age model. The coefficient estimates of stunting were considerably different under the two methods, although the other coefficient estimates are similar. This difference was due mainly to misspecification of the quadratic age effect.

Figure 1. True and Estimated Nonparametric Functions $\hat\theta(t)$ Based on 200 Replications (legend: true; assuming working independence; assuming that the true covariance is estimated).
Figure 2. Empirical Pointwise SEs of the Estimated Nonparametric Functions $\hat\theta(t)$ Based on 200 Replications (legend: assuming working independence; assuming that the true covariance is estimated).
Figure 3. Kernel Estimate $\hat\theta(\mathrm{age})$ When Fitting the Semiparametric Model (20) to the Infectious Disease Data Assuming Working Independence, and Its 95% Pointwise Confidence Intervals (legend: $\hat\theta(\mathrm{age})$; 95% confidence interval). The vertical strokes at 0-6 indicate the occurrence of 1 and 0 in the response.
Figure 4. Comparison of the Kernel Estimate $\hat\theta(\mathrm{age})$ and the Quadratic Estimate of Age.

Table 2. Regression Coefficient Estimates in Analysis of the Infectious Disease Data Using the Semiparametric Model and the Quadratic Age Model
Parameter | Semiparametric model: Estimate, SE | Quadratic age model: Estimate, SE
Rows: Vitamin A; Seasonal cosine; Seasonal sine; Sex; Height; Stunting

9. DISCUSSION

We have considered a marginal semiparametric partially linear generalized linear model for clustered data, where the effects of some covariates $X$ are modeled parametrically as $X\beta$ and the effect of another covariate $T$ is modeled nonparametrically as $\theta(t)$. Our results apply to the case where the number of observations per cluster is finite and the number of clusters is large. The profile-kernel estimating equations in the literature are used for estimation. The results are unexpected. We show that for clustered data, this conventional profile-kernel method fails to yield a $\sqrt{n}$ consistent estimator of $\beta$ unless working independence is assumed or $\theta(t)$ is artificially undersmoothed. Under working independence, one may need to sacrifice efficiency greatly to achieve $\sqrt{n}$ consistency of $\beta$. When $\theta(t)$ is artificially undersmoothed, the profile-kernel estimator of $\beta$ is still not semiparametric efficient, except for special cases.

To explain why the profile-kernel method fails in clustered data, we have derived the semiparametric efficient score of $\beta$ for multivariate normal semiparametric models. We show that, unlike in the independent data case, the profile-kernel method fails to provide an estimated score equation that is asymptotically equivalent to the semiparametric efficient score of $\beta$. Even in this simple multivariate normal case, the semiparametric efficient score of $\beta$ is complicated: it requires solving the Fredholm integral equation and estimating the pairwise joint distributions of all observations $(X_j, X_k, T_j, T_k)$ in the same cluster. Direct estimation of such densities is complicated and could well be infeasible or cumbersome, especially when cluster sizes vary from one cluster to another. For example, in longitudinal data, different subjects could have different numbers of observations, and these observations might be taken at different time points. Estimation of the joint distribution of $X$ and $T$ is hence difficult. One strategy is to assume a parametric model for $X$ and $T$ to estimate their joint distribution, but this could lead to an inconsistent estimator of $\beta$ if such a parametric model for $X$ and $T$ is misspecified. This leaves open the question of how to construct a semiparametric efficient estimator of $\beta$ in practice for clustered data. Further research is needed.

We should note that the results in this article assume that $T$ varies within each cluster.
If $T$ is a cluster-level covariate (i.e., $T_{ij} = T_i$), then, in contrast to the results reported in this paper, Lin and Carroll (2001) showed that the profile-kernel method works as usual and yields a $\sqrt{n}$ consistent and semiparametric efficient estimate of $\beta$ if the true covariance is assumed and regular smoothing is used.

APPENDIX A: PROOF OF RESULT 1

A Note on Technical Conditions

It is possible to write down detailed technical conditions that would allow rigorous proofs of the results that follow for panel data. We have chosen not to do so, both in the interest of space and because similar details have been written down by other authors in similar situations, without any real impact on statistical practice. These authors include Carroll et al. (1997), Carroll, Knickerbocker, and Wang (1995), Carroll and Wand (1991), Severini and Staniswalis (1994), and Severini and Wong (1992).

However, there is one situation for which it is easy to write down technical conditions leading to precise proofs, namely, the Gaussian linear case with constant true and working covariance matrices independent of $\beta$. Happily, this is the problem of most interest, because all of our global conclusions have been illustrated using this problem. To do this, one must first assume that, as in Carroll et al. (1995) and Severini and Staniswalis (1994), the $(T_{ij})$ have a common compact support, over which their marginal and joint densities are bounded away from 0. We assume that $h \propto n^{-\nu}$, where $1/5 \le \nu \le 1/3$. Then, using the techniques of Mack and Silverman (1982) or Marron and Härdle (1986), one can show that (A.2) holds uniformly in $t$. In some cases (as in Carroll et al. 1995), it is easier to prove this by restricting attention to $(T_{ij})$ that fall within a proper compact subset of the common support, in which case statements of results must be modified appropriately. In either case, the Gaussian linear problem means that the nonparametric regressions are standard ones and do not involve solving nonlinear equations.

We now note the other key features of the Gaussian case. For the Gaussian case, (A.3) and (A.4) are exact, with $\tilde X_i$, defined just after (A.2), being independent of $\beta$. In particular, the term $o_p(1)$ in (A.3) equals 0. With the uniformity of (A.2), the calculations following (A.3) and (A.4) are then routine.

Sketch of the Proof

To prove part a, we first assume that $\beta$ is known and show that the asymptotic bias and variance of $\hat\theta(t;\beta)$ are given in (12) and (13). The proof is similar to Appendix A.4 of Lin and Carroll (2000) and is hence omitted. Following that work, a simple application of the Cauchy-Schwarz inequality shows that $\mathrm{var}\{\hat\theta(t;\beta)\}$ is minimized when $R_2 = I$ and is given in (14). We next study the distribution of $\hat\theta(t;\hat\beta)$ when $\hat\beta$ is $\sqrt{n}$ consistent; that is, $\sqrt{n}(\hat\beta - \beta) = O_p(1)$.
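We write

$$\sqrt{nh}\{\hat\theta(t;\hat\beta) - \theta(t)\} = \sqrt{nh}\{\hat\theta(t;\hat\beta) - \hat\theta(t;\beta)\} + \sqrt{nh}\{\hat\theta(t;\beta) - \theta(t)\}$$
$$= \sqrt{h}\,\frac{\partial\hat\theta(t;\beta)}{\partial\beta^T}\,\sqrt{n}(\hat\beta - \beta) + \sqrt{nh}\{\hat\theta(t;\beta) - \theta(t)\} + o_p(1), \qquad (A.1)$$

where $\partial\hat\theta(t;\beta)/\partial\beta^T = -W^{-1}(t)W_x(t) + o_p(1) = O_p(1)$, and $W(t)$ and $W_x(t)$ are defined in Section 4. Because $\sqrt{n}(\hat\beta - \beta) = O_p(1)$ and $h \to 0$, the first term in (A.1) is $o_p(1)$. Hence the asymptotic distribution of $\hat\theta(t;\hat\beta)$ is the same as that of $\hat\theta(t;\beta)$.

We now study the asymptotic distribution of $\hat\beta$. First, using part a of Result 1 and following Lin and Carroll (2000), we have

$$\hat\theta(t;\beta) - \theta(t) = W^{-1}(t)\,\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^m v_{2i}^{jj} K_h(T_{ij} - t)(Y_{ij} - \mu_{ij}) + \theta^{(2)}(t)\frac{h^2}{2} + o_p(n^{-1/2}). \qquad (A.2)$$

Define $\tilde X_i$ by $\tilde X_{ij}^T = X_{ij}^T + \partial\hat\theta(T_{ij};\beta)/\partial\beta^T = X_{ij}^T - W^{-1}(T_{ij})W_x(T_{ij})$. A linear Taylor expansion of (10) gives

$$\sqrt{n}(\hat\beta - \beta) = D_n^{-1}\{\sqrt{n}\,C_n\} + o_p(1), \qquad (A.3)$$

where $D_n = n^{-1}\sum_{i=1}^n \tilde X_i^T\Delta_i V_{1i}^{-1}\Delta_i\tilde X_i$ and

$$C_n = \frac{1}{n}\sum_{i=1}^n \tilde X_i^T\Delta_i V_{1i}^{-1}[Y_i - \mu\{X_i\beta + \hat\theta(T_i;\beta)\}]. \qquad (A.4)$$

Denote $D = \lim_{n\to\infty} D_n = E(\tilde X^T\Delta V_1^{-1}\Delta\tilde X)$. Simple calculations show that $C_n$ can be expressed as $C_n = C_{1n} - C_{2n} + o_p(n^{-1/2})$, where, denoting $\mu_i = \mu\{X_i\beta + \theta(T_i)\}$ and $Z_{1i}^T = \tilde X_i^T\Delta_i V_{1i}^{-1}$,

$$C_{1n} = \frac{1}{n}\sum_{i=1}^n \tilde X_i^T\Delta_i V_{1i}^{-1}(Y_i - \mu_i) = \frac{1}{n}\sum_{i=1}^n Z_{1i}^T(Y_i - \mu_i), \qquad C_{2n} = \frac{1}{n}\sum_{i=1}^n \tilde X_i^T\Delta_i V_{1i}^{-1}\Delta_i\{\hat\theta(T_i;\beta) - \theta(T_i)\}.$$

Obtaining the asymptotic distribution of $\sqrt{n}\,C_{1n}$ is simple. Now examine the distribution of $\sqrt{n}\,C_{2n}$. Using the expansion (A.2), interchanging the order of summation, and replacing kernel-weighted averages by their limits, we have

$$C_{2n} = \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^m \tilde X_{ij}\, v_{1i}^{jk}\mu^{(1)}_{ij}\mu^{(1)}_{ik}\{\hat\theta(T_{ik};\beta) - \theta(T_{ik})\}$$
$$= \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^m \tilde X_{ij}\, v_{1i}^{jk}\mu^{(1)}_{ij}\mu^{(1)}_{ik}\Big[W^{-1}(T_{ik})\,\frac{1}{n}\sum_{i'=1}^n\sum_{j'=1}^m v_{2i'}^{j'j'} K_h(T_{i'j'} - T_{ik})(Y_{i'j'} - \mu_{i'j'}) + \theta^{(2)}(T_{ik})\frac{h^2}{2}\Big] + o_p(n^{-1/2})$$
$$= \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^m Z_{2ij}(Y_{ij} - \mu_{ij}) + \frac{h^2}{2}\sum_{j=1}^m\sum_{k=1}^m E\{\tilde X_j\, v_1^{jk}\mu^{(1)}_j\mu^{(1)}_k\theta^{(2)}(T_k)\} + o_p(n^{-1/2})$$
$$= \frac{1}{n}\sum_{i=1}^n Z_{2i}^T(Y_i - \mu_i) + \frac{h^2}{2}E\{\tilde X^T\Delta V_1^{-1}\Delta\,\theta^{(2)}(T)\} + o_p(n^{-1/2}),$$

where $Z_{2i} = \{Z_{2i1}, \ldots, Z_{2im}\}^T$ and

$$Z_{2ij} = v_{2i}^{jj}\sum_{k=1}^m\sum_{l=1}^m E\{\tilde X_k\,\mu^{(1)}_k v_1^{kl}\mu^{(1)}_l \mid T_l = T_{ij}\}\,W^{-1}(T_{ij})\,f_l(T_{ij}).$$

It follows that

$$\sqrt{n}(\hat\beta - \beta) = D^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n (Z_{1i} - Z_{2i})^T(Y_i - \mu_i) + \sqrt{nh^4}\,b(\beta,\theta)/2 + o_p(1), \qquad (A.5)$$

where the bias term is $b(\beta,\theta) = -D^{-1}E\{\tilde X^T\Delta V_1^{-1}\Delta\,\theta^{(2)}(T)\}$. Equivalently,

$$\sqrt{n}\{\hat\beta - \beta - h^2 b(\beta,\theta)/2\} \to N(0, V_\beta),$$

where $V_\beta = D^{-1}E\{(Z_1 - Z_2)^T\Sigma(Z_1 - Z_2)\}D^{-1}$ with $\Sigma = \mathrm{cov}(Y \mid X, T)$. One can see easily that the bias term $b(\beta,\theta)$ in (A.5) is generally nonzero. Under conventional asymptotics, $n \to \infty$, $h \to 0$, $nh \to \infty$; to obtain a $\sqrt{n}$ consistent estimate of $\beta$, one must identify working correlation matrices $R_1$ and $R_2$ to make the bias term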


More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analysis of Variance and Design of Exeriment-I MODULE II LECTURE -4 GENERAL LINEAR HPOTHESIS AND ANALSIS OF VARIANCE Dr. Shalabh Deartment of Mathematics and Statistics Indian Institute of Technology Kanur

More information

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error Semiarametric Efficiency in GMM Models with Nonclassical Measurement Error Xiaohong Chen New York University Han Hong Duke University Alessandro Tarozzi Duke University August 2005 Abstract We study semiarametric

More information

Semiparametric Estimation of Markov Decision Processes with Continuous State Space

Semiparametric Estimation of Markov Decision Processes with Continuous State Space Semiarametric Estimation of Markov Decision Processes with Continuous State Sace Sorawoot Srisuma and Oliver Linton London School of Economics and Political Science he Suntory Centre Suntory and oyota

More information

Bias in Dynamic Panel Models under Time Series Misspeci cation

Bias in Dynamic Panel Models under Time Series Misspeci cation Bias in Dynamic Panel Models under Time Series Misseci cation Yoonseok Lee August 2 Abstract We consider within-grou estimation of higher-order autoregressive anel models with exogenous regressors and

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari Condence tubes for multile quantile lots via emirical likelihood John H.J. Einmahl Eindhoven University of Technology Ian W. McKeague Florida State University May 7, 998 Abstract The nonarametric emirical

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

arxiv: v2 [stat.me] 3 Nov 2014

arxiv: v2 [stat.me] 3 Nov 2014 onarametric Stein-tye Shrinkage Covariance Matrix Estimators in High-Dimensional Settings Anestis Touloumis Cancer Research UK Cambridge Institute University of Cambridge Cambridge CB2 0RE, U.K. Anestis.Touloumis@cruk.cam.ac.uk

More information

Testing Weak Cross-Sectional Dependence in Large Panels

Testing Weak Cross-Sectional Dependence in Large Panels esting Weak Cross-Sectional Deendence in Large Panels M. Hashem Pesaran University of Southern California, and rinity College, Cambridge January, 3 Abstract his aer considers testing the hyothesis that

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Robust Solutions to Markov Decision Problems

Robust Solutions to Markov Decision Problems Robust Solutions to Markov Decision Problems Arnab Nilim and Laurent El Ghaoui Deartment of Electrical Engineering and Comuter Sciences University of California, Berkeley, CA 94720 nilim@eecs.berkeley.edu,

More information

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Deartment of Biostatistics Working Paer Series ear 003 Paer 5 Robust Likelihood-based Analysis of Multivariate Data with Missing

More information

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE J Jaan Statist Soc Vol 34 No 2004 9 26 ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE Yasunori Fujikoshi*, Tetsuto Himeno

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin November 3, 1994 MAKING WALD TESTS WORK FOR COINTEGRATED VAR SYSTEMS Juan J. Dolado CEMFI Casado del Alisal, 5 28014 Madrid and Helmut Lutkeohl Humboldt Universitat zu Berlin Sandauer Strasse 1 10178 Berlin,

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

E cient Semiparametric Estimation of Dose-Response Functions

E cient Semiparametric Estimation of Dose-Response Functions E cient Semiarametric Estimation of Dose-Resonse Functions Matias D. Cattaneo y UC-Berkeley PRELIMIARY AD ICOMPLETE DRAFT COMMETS WELCOME Aril 7 Abstract. A large fraction of te literature on rogram evaluation

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

Spectral Analysis by Stationary Time Series Modeling

Spectral Analysis by Stationary Time Series Modeling Chater 6 Sectral Analysis by Stationary Time Series Modeling Choosing a arametric model among all the existing models is by itself a difficult roblem. Generally, this is a riori information about the signal

More information

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application BULGARIA ACADEMY OF SCIECES CYBEREICS AD IFORMAIO ECHOLOGIES Volume 9 o 3 Sofia 009 Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Alication Svetoslav Savov Institute of Information

More information

The Poisson Regression Model

The Poisson Regression Model The Poisson Regression Model The Poisson regression model aims at modeling a counting variable Y, counting the number of times that a certain event occurs during a given time eriod. We observe a samle

More information

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015)

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015) On-Line Aendix Matching on the Estimated Proensity Score Abadie and Imbens, 205 Alberto Abadie and Guido W. Imbens Current version: August 0, 205 The first art of this aendix contains additional roofs.

More information

On a Markov Game with Incomplete Information

On a Markov Game with Incomplete Information On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information

More information

Introduction Model secication tests are a central theme in the econometric literature. The majority of the aroaches fall into two categories. In the r

Introduction Model secication tests are a central theme in the econometric literature. The majority of the aroaches fall into two categories. In the r Reversed Score and Likelihood Ratio Tests Geert Dhaene Universiteit Gent and ORE Olivier Scaillet Universite atholique de Louvain January 2 Abstract Two extensions of a model in the resence of an alternative

More information

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling Journal of Modern Alied Statistical Methods Volume 15 Issue Article 14 11-1-016 An Imroved Generalized Estimation Procedure of Current Poulation Mean in Two-Occasion Successive Samling G. N. Singh Indian

More information

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Nonparametric estimation of Exact consumer surplus with endogeneity in price

Nonparametric estimation of Exact consumer surplus with endogeneity in price Nonarametric estimation of Exact consumer surlus with endogeneity in rice Anne Vanhems February 7, 2009 Abstract This aer deals with nonarametric estimation of variation of exact consumer surlus with endogenous

More information

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP Submitted to the Annals of Statistics arxiv: arxiv:1706.07237 CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP By Johannes Tewes, Dimitris N. Politis and Daniel J. Nordman Ruhr-Universität

More information

Positivity, local smoothing and Harnack inequalities for very fast diffusion equations

Positivity, local smoothing and Harnack inequalities for very fast diffusion equations Positivity, local smoothing and Harnack inequalities for very fast diffusion equations Dedicated to Luis Caffarelli for his ucoming 60 th birthday Matteo Bonforte a, b and Juan Luis Vázquez a, c Abstract

More information

A multiple testing approach to the regularisation of large sample correlation matrices

A multiple testing approach to the regularisation of large sample correlation matrices A multile testing aroach to the regularisation of large samle correlation matrices Natalia Bailey Queen Mary, University of London M. Hashem Pesaran University of Southern California, USA, and rinity College,

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

Generalized Coiflets: A New Family of Orthonormal Wavelets

Generalized Coiflets: A New Family of Orthonormal Wavelets Generalized Coiflets A New Family of Orthonormal Wavelets Dong Wei, Alan C Bovik, and Brian L Evans Laboratory for Image and Video Engineering Deartment of Electrical and Comuter Engineering The University

More information

p-adic Measures and Bernoulli Numbers

p-adic Measures and Bernoulli Numbers -Adic Measures and Bernoulli Numbers Adam Bowers Introduction The constants B k in the Taylor series exansion t e t = t k B k k! k=0 are known as the Bernoulli numbers. The first few are,, 6, 0, 30, 0,

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS *

SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS * Journal of Comutational Mathematics Vol.8, No.,, 48 48. htt://www.global-sci.org/jcm doi:.48/jcm.9.-m6 SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels oname manuscrit o. will be inserted by the editor) Quantitative estimates of roagation of chaos for stochastic systems with W, kernels Pierre-Emmanuel Jabin Zhenfu Wang Received: date / Acceted: date Abstract

More information

1 Extremum Estimators

1 Extremum Estimators FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective

More information

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

Nonparametric Estimation of a Polarization Measure

Nonparametric Estimation of a Polarization Measure Nonarametric Estimation of a Polarization Measure Gordon Anderson y University of Toronto Oliver Linton z The London School of Economics Yoon-Jae Whang x Seoul National University June 0, 009 Abstract

More information

2-D Analysis for Iterative Learning Controller for Discrete-Time Systems With Variable Initial Conditions Yong FANG 1, and Tommy W. S.

2-D Analysis for Iterative Learning Controller for Discrete-Time Systems With Variable Initial Conditions Yong FANG 1, and Tommy W. S. -D Analysis for Iterative Learning Controller for Discrete-ime Systems With Variable Initial Conditions Yong FANG, and ommy W. S. Chow Abstract In this aer, an iterative learning controller alying to linear

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

ECE 534 Information Theory - Midterm 2

ECE 534 Information Theory - Midterm 2 ECE 534 Information Theory - Midterm Nov.4, 009. 3:30-4:45 in LH03. You will be given the full class time: 75 minutes. Use it wisely! Many of the roblems have short answers; try to find shortcuts. You

More information

Multiplicative group law on the folium of Descartes

Multiplicative group law on the folium of Descartes Multilicative grou law on the folium of Descartes Steluţa Pricoie and Constantin Udrişte Abstract. The folium of Descartes is still studied and understood today. Not only did it rovide for the roof of

More information

Time Series Nonparametric Regression Using Asymmetric Kernels with an Application to Estimation of Scalar Diffusion Processes

Time Series Nonparametric Regression Using Asymmetric Kernels with an Application to Estimation of Scalar Diffusion Processes ime Series Nonarametric Regression Using Asymmetric Kernels with an Alication to Estimation of Scalar Diffusion Processes Nikolay Gosodinov y Concordia University and CIREQ Masayuki Hirukawa z Northern

More information

CERIAS Tech Report The period of the Bell numbers modulo a prime by Peter Montgomery, Sangil Nahm, Samuel Wagstaff Jr Center for Education

CERIAS Tech Report The period of the Bell numbers modulo a prime by Peter Montgomery, Sangil Nahm, Samuel Wagstaff Jr Center for Education CERIAS Tech Reort 2010-01 The eriod of the Bell numbers modulo a rime by Peter Montgomery, Sangil Nahm, Samuel Wagstaff Jr Center for Education and Research Information Assurance and Security Purdue University,

More information

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of..

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of.. IOSR Journal of Mathematics (IOSR-JM) e-issn: 78-578, -ISSN: 319-765X. Volume 1, Issue 4 Ver. III (Jul. - Aug.016), PP 53-60 www.iosrournals.org Asymtotic Proerties of the Markov Chain Model method of

More information

Bootstrap Inference for Impulse Response Functions in Factor-Augmented Vector Autoregressions

Bootstrap Inference for Impulse Response Functions in Factor-Augmented Vector Autoregressions Bootstra Inference for Imulse Resonse Functions in Factor-Augmented Vector Autoregressions Yohei Yamamoto y University of Alberta, School of Business February 2010 Abstract his aer investigates standard

More information

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material Robustness of classifiers to uniform l and Gaussian noise Sulementary material Jean-Yves Franceschi Ecole Normale Suérieure de Lyon LIP UMR 5668 Omar Fawzi Ecole Normale Suérieure de Lyon LIP UMR 5668

More information

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform Statistics II Logistic Regression Çağrı Çöltekin Exam date & time: June 21, 10:00 13:00 (The same day/time lanned at the beginning of the semester) University of Groningen, Det of Information Science May

More information

CHAPTER 2: SMOOTH MAPS. 1. Introduction In this chapter we introduce smooth maps between manifolds, and some important

CHAPTER 2: SMOOTH MAPS. 1. Introduction In this chapter we introduce smooth maps between manifolds, and some important CHAPTER 2: SMOOTH MAPS DAVID GLICKENSTEIN 1. Introduction In this chater we introduce smooth mas between manifolds, and some imortant concets. De nition 1. A function f : M! R k is a smooth function if

More information

Cambridge-INET Institute

Cambridge-INET Institute Faculty of Economics Cambridge-INET Institute Cambridge-INET Working Paer Series No: 4/3 Cambridge Working Paer in Economics: 45 THE CROSS-QUANTILOGRAM: MEASURING QUANTILE DEPENDENCE AND TESTING DIRECTIONAL

More information