Locally Ancillary Quasiscore Models for Errors-in-Covariates

Paul J. RATHOUZ and Kung-Yee LIANG

Paul J. Rathouz is Assistant Professor, Department of Health Studies, University of Chicago, Chicago, IL 60637 (e-mail: prathouz@health.bsd.uchicago.edu), and Kung-Yee Liang is Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205. The authors thank Theodore G. Karrison, an Associate Editor, and two referees for very constructive discussions, comments, and suggestions; Wanli Min for computing assistance; and Rolf Loeber for permission to use the cortisol data.

Corresponding address: Paul Rathouz, Dept. of Health Studies, 5841 S. Maryland Ave., MC 2007, Chicago, IL 60637; e-mail: prathouz@health.bsd.uchicago.edu

February 2001

Revised for Journal of the American Statistical Association

ABSTRACT

We use the notion of locally ancillary estimating functions to develop a quasiscore method for fitting regression models containing measurement error in the covariates. Suppose interest is in the model E(Y | u, w) for response Y, the observed data are (y, x, w), and X is a mismeasured surrogate for u. We take a functional modelling approach, treating the u as a fixed nuisance parameter. Beginning with quasiscores for the regression parameter and the unknown u, a bias-corrected quasiscore for the regression parameter is derived that is second-order locally ancillary for the nuisance u. The method used to accomplish this requires only the correct specification of the mean and variance functions for Y and X in terms of u, w, and the regression parameter. When an estimator for u is plugged into the corrected quasiscore, local approximations show that the bias is small. Simulations verifying this result and an example from child psychiatry are presented, both using log-linear regression models.

KEY WORDS: ancillarity, measurement error, nuisance parameter, quasilikelihood, semiparametric model.

1 Introduction

Let (u_i, w_i) be a sequence of covariates with arbitrary joint empirical distribution function G(·), and let (y_i, x_i, w_i) be a sequence of observations such that the random variables (Y_i | u_i, w_i) are independent conditional on the vector of (u_i, w_i)'s. Assume that y_i, u_i, and x_i are scalars and that w_i is a vector of dimension q. Interest is in the regression model

E(Y_i | u_i, w_i) = μ_y(β; u_i, w_i) = μ_{yi},   var(Y_i | u_i, w_i) = φ_y ṽ_y(β; u_i, w_i) = v_{yi},   (1)

where β is a p-dimensional regression parameter and φ_y is a dispersion parameter. Often, p = q + 1. Let x_i be a mismeasured version of u_i such that

E(X_i | u_i, w_i) = μ_x(α; u_i, w_i) = μ_{xi},   var(X_i | u_i, w_i) = φ_x ṽ_x(α; u_i, w_i) = v_{xi}.   (2)

Again, φ_x is a dispersion parameter and α is a measurement error parameter. We make the common surrogacy assumption that X_i is independent of the response Y_i, conditional on the covariates (u_i, w_i).

In this paper, we propose a new method for inference in the regression model (1), subject to stochastic measurement error in the covariates following model (2). The method extends in an approximate way the functional modelling approach of Stefanski and Carroll (1987), in which the mismeasured covariate u_i is viewed as a fixed nuisance parameter. Under their generalized linear models (McCullagh and Nelder 1989) framework, the conditional score function for β (Lindsay 1982) is unbiased even when the covariate u_i is not known exactly, but rather is estimated. This elegant method is semiparametric efficient in the sense that the conditional score is optimal for β in the absence of knowledge of u_i or of the distribution of (u_i | w_i) (Lindsay 1982, 1985).

However, the class of applicable models for the distributions of (Y_i | u_i, w_i) and (X_i | u_i, w_i) is limited to the canonical exponential family.

In separate research, Waterman and Lindsay (1996a,b) proposed a projected score method that approximates the conditional score when it exists and emulates it in terms of robustness to nuisance parameters when it does not. Robustness is operationalized in terms of local ancillarity (Small and McLeish 1994), which we define in the next section. While the Waterman-Lindsay method generates estimating functions that are locally ancillary to an arbitrary order, their work and that of Small and McLeish (1989) has also shown that second-order local ancillarity is a particularly important special case. Recently, Rathouz and Liang (1999) have extended the Waterman-Lindsay projected score method to a quasilikelihood setting, thereby obtaining a second-order locally ancillary quasiscore (SOLAQS).

The measurement error method proposed here is motivated by making three observations, which synthesize these prior works: (i) the Stefanski and Carroll (1987) method for measurement error problems corresponds to the conditional score method for general nuisance parameter problems; (ii) recent work shows that second-order locally ancillary estimating functions provide very good approximations to the behavior of the conditional score; and (iii) the general method of obtaining second-order locally ancillary estimating functions from quasilikelihood models, of which (1) and (2) are one example, can be exploited to develop a new method for inference in functional measurement error models. Such development is the object of this paper.
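As a concrete reference point for models (1) and (2), the following sketch simulates data from one simple special case: a log-linear mean for Y with unit dispersion (Poisson) and classical additive measurement error for X. The parameter values and the Poisson choice are illustrative assumptions, not prescriptions from the paper.

```python
# Minimal simulation sketch of models (1) and (2): Poisson log-linear response
# (so phi_y = 1) and classical additive error for the surrogate x.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=200, beta0=0.0, beta1=np.log(1.5), phi_x=0.5):
    w = np.ones(n)                                 # intercept-only design, w_i = 1
    u = rng.normal(0.0, 1.0, n)                    # true covariate, later treated as fixed
    mu_y = np.exp(beta0 * w + beta1 * u)           # model (1): E(Y | u, w)
    y = rng.poisson(mu_y)                          # var(Y | u, w) = mu_y here
    x = u + rng.normal(0.0, np.sqrt(phi_x), n)     # model (2): classical error, variance phi_x
    return y, x, u, w

y, x, u, w = simulate()
print(y[:5], x[:5])
```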

Model (1), which is the inferential target, includes linear, logistic, log-linear, and polynomial regression models as special cases. Model (2) encompasses the classical measurement error model E(X_i | u_i, w_i) = u_i, var(X_i | u_i, w_i) = φ_x, as well as the error calibration model E(X_i | u_i, w_i) = α_0 + α_1 u_i + α_2 w_i, var(X_i | u_i, w_i) = φ_x (Carroll, Ruppert and Stefanski 1995), as special cases. Additionally, in model (2), the mean and variance of x_i do not have to be specified on the same scale in which u_i appears in model (1). We could, for example, allow a multiplicative error model E{log(X_i) | u_i, w_i} = α_0 + α_1 log(u_i) + α_2 w_i and var{log(X_i) | u_i, w_i} = φ_x. Whatever the measurement error model, we assume throughout that the required internal or external replication or validation data (Carroll and Stefanski 1990) are available to provide consistent estimators of the measurement error parameters (α, φ_x).
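When external validation data are available (observations with u, w, and x all recorded), the calibration-model parameters can be estimated by a routine regression of x on (1, u, w). The sketch below is a minimal illustration under that assumption; the variable names and the ordinary-least-squares fit are conveniences of this sketch, not requirements of the method.

```python
# Sketch: estimating (alpha, phi_x) from external validation data under the linear
# calibration model E(X | u, w) = alpha0 + alpha1*u + alpha2*w with constant variance.
import numpy as np

def fit_calibration(x_val, u_val, w_val):
    Z = np.column_stack([np.ones_like(u_val), u_val, w_val])   # design (1, u, w)
    alpha, *_ = np.linalg.lstsq(Z, x_val, rcond=None)          # (alpha0, alpha1, alpha2)
    resid = x_val - Z @ alpha
    phi_x = resid @ resid / (len(x_val) - Z.shape[1])          # residual variance estimate
    return alpha, phi_x
```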

Besides the aforementioned conditional score method, other related methods include those of small measurement error asymptotics, which provide first-order bias corrections to the naive estimator (e.g., Stefanski 1985), and the approximate quasilikelihood-variance function (QVF) method of Carroll and Stefanski (1990), in which parametric functions are assumed only for the first two moments of (Y_i | u_i, w_i) and (X_i | u_i, w_i). Our approach is in the spirit of the conditional score, while making the weaker assumptions equivalent to the QVF models. We obtain a bias-corrected quasiscore function for which, using the method of Stefanski (1985), the resulting estimator would have zero first-order bias correction. Our approach is therefore semiparametric in two senses. First, it is a functional model in that it does not require a specification of the marginal distribution of (u_i | w_i), as the u_i's are treated as fixed nuisance parameters. Second, it only requires specification of the first two moments of the distributions of (Y_i | u_i, w_i) and (X_i | u_i, w_i). For a unified presentation of methods for errors-in-covariates, see Carroll, et al. (1995).

This paper has the following organization. In Section 2, we review second-order locally ancillary estimating functions and show that when the ancillarity applies to the mismeasured covariate u_i, the bias correction of Stefanski (1985) obtains "automatically". Section 3 contains the main development of the SOLAQS for measurement error problems. Theoretical and practical considerations for use of the SOLAQS for inferences on β are presented in Section 4. These include variance estimation and the use of small measurement error asymptotics to examine the behavior of the resulting SOLAQS with respect to bias. In Section 5, we study the log-linear regression model in more detail in order to illustrate some advantages of the SOLAQS over other methods. We include simulation results and a small example data analysis from child psychiatry. We close with a brief discussion in Section 6.

2 Locally ancillary estimating functions in measurement error problems

Suppose that the regression parameter β is a vector of dimension p and that the (p × 1) estimating function

g(β) = Σ_{i=1}^n g_i(β; y_i, u_i, w_i) = Σ_i g_i,   (3)

such that E{g_i(β; Y_i, u_i, w_i); u_i, w_i} = 0, is available for inferences on β. Now, treating u_i as an unknown nuisance parameter and operating only on the ith

summand in (3), define the (p × 1) functional operators

b^i_k(g_i) = [∂^k/∂u*^k E{g_i(β; Y_i, u_i, w_i); u*, w_i}]_{u* = u_i},   (4)

for k = 1, 2, and the (p × 2) concatenated functional operator b^i_{(2)}(g_i) = {b^i_1(g_i), b^i_2(g_i)}. The idea in (4) is that the expectation is taken conditionally on (u*, β), where u* ≠ u_i. If b^i_1(g_i) = 0, then g_i is said to be "first-order locally ancillary" for u_i, while if b^i_{(2)}(g_i) = 0, g_i is "second-order locally ancillary" for u_i (Small and McLeish 1994). One interpretation of kth-order local ancillarity is that, under regularity, it is equivalent to E{g_i(u_i); u*} = o{(u* − u_i)^k}. In addition, under standard regularity conditions for estimating functions (Godambe 1960; Godambe and Thompson 1974),

E(∂g_i/∂u_i) = −b^i_1(g_i)   and   E(∂²g_i/∂u_i²) = b^i_2(g_i) − 2 ∂b^i_1(g_i)/∂u_i   (5)

(Rathouz and Liang 1999). Higher orders of local ancillarity are defined similarly, the order providing a measure of the degree of robustness of g_i to u_i. The second order is most important in practice, however, because it provides a large degree of the bias correction obtained through second and successively higher orders (Waterman and Lindsay 1996a,b; Small and McLeish 1989).

This bias correction phenomenon arises in the measurement error literature, as follows. Let u_i be measured with error by x_i, where E(X_i | u_i, w_i) = u_i and var(X_i | u_i, w_i) = φ_x ṽ_x(u_i, w_i). Consider the plug-in estimating function ĝ_i(β) = g_i(β; y_i, x_i, w_i), and let β̂ be the solution to Σ_i ĝ_i = 0. Then for fixed φ_x > 0, β̂ converges in probability to β + O(φ_x) as n → ∞. The remainder O(φ_x) refers to the measurement error bias in the limiting value of β̂ as φ_x → 0. The order of operations for this argument is that n → ∞ first, then φ_x → 0.
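To make the operators in (4) concrete, the sketch below evaluates b^i_1 numerically for one observation in the Poisson log-linear case, where the conditional mean E{g_i(β; Y_i, u_i, w_i); u*, w_i} is available in closed form. The naive quasiscore used here is purely an illustration; its nonzero b^i_1 is exactly the failure of local ancillarity that the construction in Section 3 is designed to remove.

```python
# Finite-difference check of b_1 for the naive Poisson quasiscore
# g_i = (w_i, u_i)^T {y_i - exp(beta0*w_i + beta1*u_i)} at one observation.
import numpy as np

def cond_mean_g(beta, u, w, u_star):
    """E{ g(beta; Y, u, w) ; u* } when Y | u*, w ~ Poisson(exp(beta0*w + beta1*u*))."""
    b0, b1 = beta
    ey = np.exp(b0 * w + b1 * u_star)     # mean of Y under the 'wrong' value u*
    mu = np.exp(b0 * w + b1 * u)          # mean appearing inside g, at the working u
    return np.array([w, u]) * (ey - mu)

def b1(beta, u, w, h=1e-5):
    """First derivative in u* of the conditional mean of g, evaluated at u* = u."""
    return (cond_mean_g(beta, u, w, u + h) - cond_mean_g(beta, u, w, u - h)) / (2 * h)

print(b1(beta=(0.0, np.log(1.5)), u=0.3, w=1.0))   # nonzero: not first-order locally ancillary
```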

A first-order bias-corrected estimator of β using such small measurement error asymptotics would be

β̂_c = β̂ + (1/2) φ_x { lim_{n→∞} Σ_i ∂g_i/∂β^T }^{-1} [ lim_{n→∞} Σ_i (∂²g_i/∂u_i²) ṽ_x(u_i, w_i) ],   (6)

where for fixed φ_x > 0, β̂_c converges to β + o(φ_x) as n → ∞ (Stefanski 1985). If g_i is second-order locally ancillary for u_i, then by (5), E{(∂²g_i/∂u_i²) ṽ_x(u_i, w_i)} = 0, and the bias correction given by (6) obtains automatically. Therefore, estimation with a second-order locally ancillary estimating function produces an estimator that is approximately consistent to order o(φ_x). In the next section, we propose a quasiscore method for obtaining second-order locally ancillary estimating functions for β in the functional measurement error modelling problem, eliminating the need for any correction term such as that in (6).

3 SOLAQS for Measurement Error Models

Rathouz and Liang (1999) recently proposed a method for constructing second-order locally ancillary estimating functions. The method differs from those previously proposed in that it does not rely on projection, and thereby avoids the need to specify a likelihood function. In this section, their idea is applied to the problem of inference in functional measurement error models.

Assume that models (1) and (2) hold, that μ_{yi}, v_{yi}, μ_{xi}, v_{xi} are finite and admit continuous first and second derivatives with respect to (β, α, u_i), and that v_{yi} > 0 and v_{xi} > 0 for all (β, α, u_i). Additionally, assume that μ_x(u_i, w_i) is strictly monotone (in u_i) in a neighborhood of the true u_i. These assumptions are not restrictive; for many models in practical usage, moments and their derivatives to several orders exist.

The monotonicity assumption is quite reasonable, given that x_i is a mismeasured surrogate for the true u_i.

For the ith observation, we now construct estimating functions for β, φ_y, and u_i which will act as building blocks in the development that follows. First, if u_i were known without error, one might consider the quasiscore

S = Σ_i S_i = Σ_i (∂μ_{yi}/∂β)^T (y_i − μ_{yi}) / v_{yi}

for consistent inferences on β (Wedderburn 1974; McCullagh 1983). In the traditional generalized linear models case (McCullagh and Nelder 1989), h_y(·) is a link function, η_{yi} is the linear predictor, and V(·) is the variance function. Then h_y(μ_{yi}) = η_{yi} = β_0^T w_i + β_1 u_i, and S_i takes the well-known form

S_i = (w_i^T, u_i)^T {h′_y(μ_{yi})}^{-1} (y_i − μ_{yi}) / {φ_y V(μ_{yi})}.

Additionally, if φ_y is not known, the estimating function

R = Σ_i R_i = Σ_i (φ_y v_{yi})^{-1} {(y_i − μ_{yi})² − φ_y ṽ_{yi}}

would yield consistent inferences on φ_y. R = 0 can be solved after β has been estimated via S = 0, since S is ancillary for φ_y. Note that (S^T, R)^T is the quasilikelihood analogue to the score equations given in Stefanski and Carroll (1987), from which the conditional score function was constructed.

Now, similarly to S_i, using the data y_i and x_i, define the u_i-quasiscore

T_{i1} = (∂μ_{yi}/∂u_i)(y_i − μ_{yi})/v_{yi} + (∂μ_{xi}/∂u_i)(x_i − μ_{xi})/v_{xi} = T_{i1y} + T_{i1x}.   (7)

In the aforementioned case where h_y(μ_{yi}) = η_{yi}, and where μ_{xi} = u_i,

T_{i1} = β_1 {h′_y(μ_{yi})}^{-1} (y_i − μ_{yi}) / {φ_y V(μ_{yi})} + (x_i − u_i)/v_{xi}.

The quasiscore T_{i1} will be used to estimate the mismeasured covariate u_i. Further, it will be used as a basis for correcting the bias in S_i due to measurement error. To that end, note that T_{i1} is optimal for u_i in the class of linear estimating functions and is thereby information unbiased (Crowder 1987). In particular, T_{i2} = ∂T_{i1}/∂u_i + T_{i1}² is a second unbiased estimating function for u_i. Indeed, T_{i2} is the quasiscore analogue of the second Bhattacharyya score for the nuisance u_i (Rathouz and Liang 1999). Letting prime (′) denote differentiation with respect to u_i, T_{i2} can be re-expressed as

T_{i2} = T′_{i1} + T_{i1}² = T_{i2y} + T_{i2x} + 2 T_{i1y} T_{i1x},

where T_{i2y} = T′_{i1y} + T²_{i1y} and similarly for T_{i2x}. From (7), T_{i2y} takes the form

T_{i2y} = (∂²μ_{yi}/∂u_i²)(y_i − μ_{yi})/v_{yi} + (∂μ_{yi}/∂u_i)² v_{yi}^{-2} {(y_i − μ_{yi})² − v_{yi}} − (∂μ_{yi}/∂u_i)(∂v_{yi}/∂u_i)(y_i − μ_{yi}) v_{yi}^{-2},

with an analogous form for T_{i2x}.

We now obtain a second-order locally ancillary quasiscore, S*_i, as a linear combination of S_i, T_{i1}, and T_{i2}. Operating on the ith observation (y_i, x_i, w_i), define the matrix

D_i = [ E{−∂(S_i^T, T_{i1}, T_{i2})^T/∂β},  b^i_{(2)}{(S_i^T, T_{i1}, T_{i2})^T} ]
    = [ D_{i0}   D_{i1}   D_{i2} ;  D_{i10}  D_{i11}  D_{i12} ;  D_{i20}  D_{i21}  D_{i22} ],

where the second and third block columns of D_i are the maps of (S_i^T, T_{i1}, T_{i2})^T via b^i_1(·) and b^i_2(·), respectively. Interestingly, D_i is symmetric. Then define

S*_i = S_i − a_i (T_{i1}, T_{i2})^T,

where a_i is the (p × 2) matrix given by

a_i = b^i_{(2)}(S_i) [ b^i_{(2)}{(T_{i1}, T_{i2})^T} ]^{-1} = (D_{i1}, D_{i2}) [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ]^{-1}.
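The correction step above is a small linear-algebra operation once the blocks of D_i are available. The sketch below assembles a_i and the corrected summand S*_i from given blocks; the numerical values are placeholders, and computing the blocks themselves (Appendix A) is model-specific.

```python
# Assemble a_i and S*_i = S_i - a_i (T_i1, T_i2)^T from precomputed blocks of D_i.
import numpy as np

def solaqs_summand(S_i, T_i1, T_i2, D_i1, D_i2, D_i11, D_i12, D_i21, D_i22):
    B_S = np.column_stack([D_i1, D_i2])                 # (b1(S_i), b2(S_i)), a p x 2 block
    B_T = np.array([[D_i11, D_i12], [D_i21, D_i22]])    # b-maps of (T_i1, T_i2), 2 x 2
    a_i = B_S @ np.linalg.inv(B_T)
    return S_i - a_i @ np.array([T_i1, T_i2])

# toy numbers (p = 2), purely to show the shapes involved
print(solaqs_summand(S_i=np.array([0.3, -0.1]), T_i1=0.4, T_i2=-0.2,
                     D_i1=np.array([1.0, 0.5]), D_i2=np.array([0.2, 0.1]),
                     D_i11=2.0, D_i12=0.3, D_i21=0.3, D_i22=1.5))
```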

In order to compute S*_i, we must evaluate the functionals b^i_{(2)}(S_i), b^i_{(2)}(T_{i1}), and b^i_{(2)}(T_{i2}). This is easily accomplished via the derivatives of the mean and variance models (1) and (2) (see Appendix A).

We claim that S*_i is second-order locally ancillary for u_i. To see this, first note that b^i_{(2)}(·) is a linear operator in the sense that b^i_{(2)}(a_1 g_1 + a_2 g_2) = a_1 b^i_{(2)}(g_1) + a_2 b^i_{(2)}(g_2), where a_k = a_k(β, u_i, w_i) does not contain the data (y_i, x_i), k = 1, 2. Then write

b^i_{(2)}(S*_i) = b^i_{(2)}(S_i) − a_i b^i_{(2)}{(T_{i1}, T_{i2})^T}
             = (D_{i1}, D_{i2}) − (D_{i1}, D_{i2}) [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ]^{-1} [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ] = 0.

We refer to S*_i as a second-order locally ancillary quasiscore (SOLAQS). It is a robust version of S_i, compensating for the bias introduced by the measurement error in x_i. In contrast to previous methods (Waterman and Lindsay 1996a), S*_i is obtained without the use of projection, and consequently depends only upon correct specification of models (1) and (2).

Summing over all observations, inferences on β can be based on the SOLAQS

S* = Σ_i S*_i = Σ_i { S_i − a_i (T_{i1}, T_{i2})^T },

which is an unbiased estimating function. However, while S* is second-order locally ancillary for the vector (u_1, …, u_n)^T, the u_i's still appear in S*_i and therefore must be estimated. This is accomplished for each i by solving T_{i1}(β, u_i) = 0 in u_i for û_{iβ}, giving rise to the plug-in quasiscore

Ŝ*(β) = Σ_i Ŝ*_i(β) = Σ_i S*_i(β, û_{iβ}),

which is used for inferences on β.
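A minimal sketch of this plug-in step, in the Poisson log-linear and classical-error special case used earlier: for fixed β, each û_{iβ} solves the scalar equation T_{i1}(β, u_i) = 0, which a bracketing root finder handles directly. The bracket width and parameter values below are illustrative assumptions.

```python
# Estimate u_i for fixed beta by solving T_i1(beta, u_i) = 0 with a bracketing root finder.
import numpy as np
from scipy.optimize import brentq

def T_i1(u, y, x, w, beta, phi_y=1.0, phi_x=0.5):
    mu_y = np.exp(beta[0] * w + beta[1] * u)
    # y-part: beta1*(y - mu_y)/phi_y for the canonical log link; x-part: (x - u)/phi_x
    return beta[1] * (y - mu_y) / phi_y + (x - u) / phi_x

def estimate_u(y, x, w, beta, span=10.0):
    # bracket the root around the observed x, a natural starting guess for u
    return brentq(T_i1, x - span, x + span, args=(y, x, w, beta))

print(estimate_u(y=3, x=0.8, w=1.0, beta=(0.0, np.log(1.5))))
```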

In Section 4, we examine theoretical implications of and practical considerations for using Ŝ* for inferences on β.

In the case where φ_y is unknown, an analogous procedure to that for deriving S*_i is implemented. Substituting R_i for S_i and φ_y for β, a linear combination R*_i of (R_i, T_{i1}, T_{i2}) that is a second-order locally u_i-ancillary estimating function for φ_y-inferences is obtained. Estimation of (β^T, φ_y)^T is accomplished via solution of Ŝ* = 0 and R̂* = Σ_i R̂*_i = 0. Simultaneous solution of (Ŝ*^T, R̂*)^T = 0 is required, however, as S* is not ancillary for φ_y.

4 Inferences with the plug-in SOLAQS Ŝ*

4.1 Introduction

In this section, we further develop the use of Ŝ* for inferences on the regression parameter β. In the following subsection, we consider the bias in Ŝ*_i as the measurement error variance φ_x → 0. Having shown that the bias is small, in Section 4.3 we consider the asymptotic distribution of β̂ as n → ∞ for fixed φ_x, where β̂ is the solution to Ŝ* = 0. Finally, we address some computational issues in solving Ŝ* = 0. The Fisher scoring computational technique for solving Ŝ* = 0 is sketched in Appendix C.

4.2 Small measurement error asymptotic bias in Ŝ*

Here, we study the behavior of Ŝ*_i under small measurement error asymptotics (Carroll and Stefanski 1990). Formally, we hold n fixed and consider a series of experiments in which the measurement error dispersion φ_x → 0. That φ_x → 0 need not reflect a true limiting operation in practice. Rather, since exact bias

analysis can be quite difficult, it serves as an analytic tool, yielding order-of-magnitude approximations that provide some insight into the performance of the method with respect to bias correction. We establish that the asymptotic bias in Ŝ*_i is of smaller order than that of the naive plug-in score Ŝ_i wherein x_i replaces u_i. We suppress the subscript i and operate on one observation at a time. A proof is in Appendix B.

Letting prime denote differentiation with respect to u, we have

Ŝ* − S* = (û_β − u) S*′ + (1/2)(û_β − u)² S*″ + (1/6)(û_β − u)³ S*‴(u*_β),   (8)

where |u*_β − u| ≤ |û_β − u|. We show in Appendix B that (û_β − u) = D_{11}^{-1} T_{1x} + O_p(φ_x), so that the first term of (8) can be written

(û_β − u) S*′ = S*′ T_{1x} D_{11}^{-1} + S*′ {(û_β − u) − D_{11}^{-1} T_{1x}}.   (9)

Regarding (8) and (9), we note here two interesting facts on which the proof of the following result is founded, and which are direct results of the construction of S*. First, E(S*′) = E(S*″) = 0, due to second-order local ancillarity. Second, due to the joint optimality of the quasiscores (S, T_1) for (β, u), E(S*′ T_{1x}) ≈ 0; that is, S*′ and T_{1x} are approximately orthogonal.

Theorem 1. Let û_β be the solution of T_1 = 0 for fixed β. Then for the true β, Ŝ* = S*(û_β) = S* + Z + O_p(φ_x), where Z = O_p(φ_x^{1/2}) and is unbiased. Furthermore, under uniform integrability, E(Ŝ* − S*) = O(φ_x^{3/2}).

We note as a point of comparison that, using x to estimate u, (Ŝ − S) is also O_p(φ_x^{1/2}), but with bias of order O(φ_x).
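For intuition about the O(φ_x) naive bias that Theorem 1 improves on, the following Monte Carlo sketch fits a Poisson log-linear model with u replaced by x and shows the familiar attenuation of the slope. The simple Newton solver and all parameter values are illustrative assumptions.

```python
# Monte Carlo sketch of naive plug-in bias: the estimator that substitutes x for u
# is visibly attenuated relative to the estimator that uses the true u.
import numpy as np

rng = np.random.default_rng(1)

def fit_poisson(y, Z, n_iter=25):
    """Plain Newton-Raphson for a Poisson log-linear model with design Z."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)
        score = Z.T @ (y - mu)
        info = Z.T @ (Z * mu[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta

n, beta1, phi_x = 5000, np.log(2.0), 0.5
u = rng.normal(size=n)
y = rng.poisson(np.exp(beta1 * u))
x = u + rng.normal(scale=np.sqrt(phi_x), size=n)

b_true = fit_poisson(y, np.column_stack([np.ones(n), u]))
b_naive = fit_poisson(y, np.column_stack([np.ones(n), x]))
print(b_true[1], b_naive[1])   # the naive slope is attenuated toward zero
```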

4.3 Asymptotic distribution of β̂

We now examine the asymptotic behavior of β̂ for fixed measurement error variance φ_x. The following discussion applies equally to the parameter (β^T, φ_y)^T when φ_y is being estimated simultaneously with β; simply replace Ŝ*_i throughout with the vector (Ŝ*_i^T, R̂*_i)^T. By the standard theory of estimating functions (e.g., Carroll, et al. 1995, Appendix A.3), there exists β̂, a sequence of solutions to Ŝ* = 0, such that β̂ → β* in probability, where β* is the solution to the limiting equation lim_{n→∞} (1/n) Ŝ* = 0 that is closest to the true β. By Theorem 1, β* is close to the true β; this result is similar to those of other methods (e.g., Stefanski 1985; Carroll and Stefanski 1990) in that the bias is not completely eliminated, except in very specialized cases. Of course, there is no guarantee that there is a unique solution to Ŝ* = 0, even as n → ∞, so β̂ must be carefully defined in practice. Our approach to this problem is given in the next section, but a general solution may not exist without further assumptions.

By estimating function theory, √n(β̂ − β*) →d N{0, A^{-1} B (A^{-1})^T}, where

A = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂Ŝ*_i/∂β)   and   B = lim_{n→∞} n^{-1} Σ_{i=1}^n E(Ŝ*_i Ŝ*_i^T),

and all quantities are evaluated at β*. The variance factor B can be consistently estimated by replacing β* with β̂ and using the empirical expected value of Ŝ*_i Ŝ*_i^T. To estimate A, we employ a numerical derivative matrix, as follows. At the estimated β̂, the kth column of (∂Ŝ*/∂β) is estimated by

{Ŝ*(β̂ + D_k) − Ŝ*(β̂ − D_k)} / (2 d_k),

where d_k is a perturbation and D_k = (0, …, 0, d_k, 0, …, 0)^T is a p-vector of zeros, with d_k in the kth position.
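The following sketch implements the sandwich calculation described above: A estimated by central differences of the summed estimating function, B by the empirical outer product of per-observation contributions. Here `S_hat_i` is a hypothetical callable returning the n × p matrix of summands at a given β; it stands in for model-specific code and is an assumption of this sketch.

```python
# Sandwich variance estimate with a numerical (central-difference) derivative for A.
import numpy as np

def sandwich(beta_hat, S_hat_i, d=0.1):
    n, p = S_hat_i(beta_hat).shape
    A = np.zeros((p, p))
    for k in range(p):
        e = np.zeros(p); e[k] = d
        A[:, k] = (S_hat_i(beta_hat + e).sum(0) - S_hat_i(beta_hat - e).sum(0)) / (2 * d)
    A /= n                                 # sign convention is immaterial: A enters twice
    Si = S_hat_i(beta_hat)
    B = Si.T @ Si / n
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv.T / n           # estimated covariance matrix of beta_hat
```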

Wald-type confidence intervals for β* can then be constructed from Â^{-1} B̂ (Â^{-1})^T in the standard fashion.

In the case where the measurement error parameters γ = (α^T, φ_x)^T are estimated as well by internal replication and/or validation data (Carroll, et al. 1995), a modified standard error estimator applies. Assume that γ is estimated via solution to some estimating equation U = Σ_i U_i = 0. Assume further that U_i does not depend on β. This is not unreasonable, since in most settings U_i will be a function of (x_i, u_i, w_i), but not y_i. Define the quantities

C = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂Ŝ*_i/∂γ)   and   D = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂U_i/∂γ).

Then, we show in Appendix D that √n(β̂ − β*) →d N{0, A^{-1} B*(A^{-1})^T}, where

B* = lim_{n→∞} n^{-1} Σ_{i=1}^n E{(Ŝ*_i − C D^{-1} U_i)(Ŝ*_i − C D^{-1} U_i)^T},

and all quantities are evaluated at (β*^T, φ_y, γ^T)^T. As with A, C can be estimated using a numerical derivative, D can be estimated in the usual way from U, and B* can be estimated using the empirical variance of Ŝ*_i − C D^{-1} U_i.

When γ is estimated using external replication and/or validation data (Carroll, et al. 1995), an augmented data set consisting of the concatenation of the primary data and the external data is employed. By setting U_i = 0 for the primary data and S*_i = 0 for the external data, the augmented data can be analyzed as for internal replication and/or validation data.

4.4 Computational issues

Our experience thus far with the proposed method has suggested three computational techniques that provide for numerical stabilization of the estimation of β.

First, before solving Ŝ* = 0, we transform the design matrix containing the vectors w_i to form an orthonormal basis. Also, using φ_x and the sample variance of x, we transform x such that the empirical distribution function of u has mean zero and variance one. These transformations have the additional advantage of permitting the perturbations d_k to be set to a fixed constant for all k and sample sizes. We use d_k = 0.1. Second, for smaller sample sizes, there is some instability in the simultaneous solution of S* = 0, T_{11} = 0, …, T_{n1} = 0. This can be largely alleviated by estimating u_i as the solution to T*_{i1,n} = 0 for each i, where T*_{i1,n} is the same as T_{i1}, replacing φ_x by a smaller quantity, φ*_n, thereby weighting the estimate û_{iβ} towards x_i. We used φ*_n = (1 − 10p/n)φ_x in our simulation work. Third, the uniform integrability assumption made at several points in the proof of Theorem 1 has implications for the estimation of u_i. In some settings, it may be necessary to bound the permissible values of û_{iβ} by quantities that are scientifically reasonable for the application at hand. As such bounds can be relatively wide, we see no way in which the need to specify them would restrict the applicability of the proposed method.

Additionally, two issues arise when φ_x is large: lack of convergence and multiple solutions. To choose among possible multiple solutions, we take the naive estimator of (β^T, φ_y)^T as a starting value for the Fisher scoring procedure (Appendix C). Lack of convergence can occur if the algorithm diverges to a point with one or more singular matrices. Alternatively, a solution may not be reached after the maximum number of iterations (we use 100). Interestingly, in our experience with simulated data, these problems are more frequent when φ_y is known rather than estimated simultaneously with β.
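A minimal sketch of the two stabilizing transformations described above, assuming a classical additive error model: the w-design is replaced by an orthonormal basis via QR, and x is rescaled so that the implied u has roughly mean zero and variance one. The names and the small variance floor are conveniences of this sketch.

```python
# Stabilizing transformations: orthonormalize the design and standardize x to the u scale.
import numpy as np

def stabilize(W, x, phi_x):
    Q, _ = np.linalg.qr(W)                        # orthonormal basis for the columns of W
    var_u = max(x.var(ddof=1) - phi_x, 1e-8)      # implied var(u) when var(x) = var(u) + phi_x
    x_std = (x - x.mean()) / np.sqrt(var_u)       # standardized surrogate on the u scale
    phi_x_std = phi_x / var_u                     # error variance on the new scale
    return Q, x_std, phi_x_std
```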

However, when φ_y is being estimated and φ_x is large (e.g., greater than var(u_i)), multiple solutions to Ŝ* = 0 appear to exist, and the one obtained by starting from the naive estimator may not be consistent for β*.

5 Example: Log-linear regression

We now illustrate the applicability and performance of the SOLAQS by comparing it to other methods in the context of a special case of models (1) and (2), the log-linear regression model (McCullagh and Nelder 1989, Ch. 6)

log(μ_{yi}) = β_0^T w_i + β_1 u_i   and   v_{yi} = φ_y μ_{yi},   (10)

with additive measurement error

x_i = u_i + δ_i,   where E(δ_i | u_i) = 0 and var(δ_i | u_i) = φ_x.   (11)

We consider assumptions underlying other approaches and present simulation results comparing the SOLAQS to these competitors.

5.1 Other approaches

Were (Y_i | w_i, u_i) a true Poisson random variable and δ_i ~ N(0, φ_x), the conditional score method of Stefanski and Carroll (1987) would apply and give rise to the semiparametric efficient estimator for β in the presence of unknown distribution for (U_i | w_i). However, implementation is hindered by two concerns. First, the conditional score does not take a closed form; rather, iterative computations are required to compute it. Second, it is not known to what degree the conditional score is robust to misspecifications of the distributional form

of (Y_i, X_i | u_i, w_i) (Carroll and Wand 1991). We note that the first problem can be addressed approximately via projection (Waterman and Lindsay 1996a,b). Indeed, due to the equivalence of quasiscores and likelihood scores in exponential family distributions, S* with φ_y = 1 is the second-order projected score for the Poisson-Gaussian model. Furthermore, in many problems, (Y_i | u_i, w_i) is overdispersed relative to a true Poisson random variable, rendering a likelihood difficult to specify.

Alternatives not requiring a likelihood include regression calibration (RC; Carroll, et al. 1995, Ch. 3) and the SIMEX estimator (Cook and Stefanski 1994). Treating u_i as a random variable, the RC method replaces u_i with an estimate of E(U_i | w_i, x_i), thereby exploiting a distributional assumption on (U_i | w_i), which may or may not be valid. However, if the distribution of (U_i | w_i, x_i) is Gaussian, then in the log-linear model the RC method is particularly applicable and relatively statistically efficient for β̂_1. The fully functional SIMEX estimator is easy to implement and quite general. It does, however, require a distributional form for (X_i | u_i, w_i). In contrast, the SOLAQS method only requires the mean and variance of (X_i | u_i, w_i).

5.2 Simulation study

We now compare the empirical performance of SOLAQS to that of RC in a simulation study of models (10) and (11). In each simulation, we compare the naive estimator (the solution to Ŝ = 0, with u_i replaced by x_i), the RC estimator with the linear approximation calibration estimator as described in Carroll, et al. (1995, Section 3.4.2), and the SOLAQS estimator using (Ŝ*^T, R̂*)^T to estimate β and φ_y.

19 mate fi and ffi y. We include the naive estimator to provide an indication of the degree of bias correction needed. The RC method, in cases where its assumptions are satisfied, will permit an assessment of the efficiency loss in SOLAQS by not eploiting the distribution of (U i jw i ). Computational procedures in Section 4.3, 4.4 and Appendi C were used for estimation and confidence interval construction with ( ^S T ; ^R ) T. The error variance ffi, which in practice is easily-estimated with replication data, was assumed known for both the RC and the SOLAQS methods. The RC model parameters of the distribution of (u i ;w i ), which are not needed for SOLAQS, were estimated for each replicate. We concentrate on the coefficient fi 1 of u i. Overdispersed count data Y i were generated as a (3 : 7) Bernoulli miture of two Poisson random variables such that the mean and variance were μ yi and ffi y μ yi, respectively. The u i 's were standardized to have mean zero and variance one and the set of (w i ;u i )'s was fied over all replicates. The errors ffi i were mean-zero Gaussian with variance ffi. We considered relative rate values ep(fi 1 ) = (1:5; 3:), measurement error variance ffi = (:3; :7), and overdispersion ffi y =(1:5; 3:), Each simulation contains 5 replicates. Model 1. Let w i = 1 so that fi is the intercept. Set fi =. Let the u i 's be a random Gaussian sample. Set sample size n =. Results are in Table 1. Model is identical to Model 1, ecept that the u i 's are uniformly distributed, then standardized. Results are also in Table 1. 18

Model 3. Let w*_{i1} be Bernoulli with probability 0.3. To generate the u_i's, let ε_i be uniformly distributed on (0, 1). Then let u_i be the standardized version of (1 + c w*_{i1}) ε_i, c > 0. Let w_i = (1, w_{i1})^T, where w_{i1} is the standardized version of w*_{i1}. Fix β_0 = (0, log(1.5))^T and n = 250. Setting c = (0.5, 1.5) allows for different values of ρ_{wu} = corr(w_{i1}, u_i). Results are in Table 2.

Results. Of sixteen thousand replicates across the three models, in all but one replicate the SOLAQS converged in fewer than 100 iterations (result included), while 10 others took more than 50 iterations. These instances all fell under the last case of Model 2 (Table 1). The bias in the naive estimator reflects a substantial degree of measurement error in all cases.

For Model 1, the assumptions of RC are met, and therefore, by exploiting the distribution of (U_i | w_i), the RC method surpassed the SOLAQS in terms of bias and efficiency. With the exception of one case, however, use of the SOLAQS resulted in less than 20 percent precision loss, as measured by the MSE. For Models 2 and 3, where the RC assumptions are violated, SOLAQS yielded lower bias than the RC method in 19 of 24 cases. While the RC was notably biased in some cases, in no case was the bias in SOLAQS more than ten percent, and it was usually less than five percent. Compared to RC, the SOLAQS never resulted in more than 30 percent loss in precision. By contrast, the gains in precision in SOLAQS over RC were at times substantial.

SOLAQS coverage probabilities of Wald-type confidence intervals using the sandwich estimator were satisfactory for β_1 = log(1.5), but were anti-conservative at β_1 = log(3.0). This suggests that, in practice, for larger values of β_1, another method of variance estimation, such as the BCa bootstrap (Efron and Tibshirani 1993, Section 14.3) as suggested by Carroll, et al. (1995, Sections A.6.5-A.6.6), would be more appropriate.

5.3 Cortisol data

We illustrate our method with a data set from a study examining the relationship between salivary cortisol and symptoms of conduct disorder (CD) (McBurnett, Lahey, Rathouz and Loeber 2000). One hypothesis about the psychopathology of CD is that symptomatic behaviors occur with increased frequency due to subjects' suppressed fear response to threatening stimuli, such as punishment for disruptive behaviors. Because fear response is reflected in cortisol levels, we expect symptoms to be inversely related to cortisol. This hypothesis was examined in a clinic-referred sample of n = 38 boys with CD. Responses Y_i are the cumulative counts over four years of reported symptoms of aggressive CD and of covert CD. We treat the two sets of symptoms separately. Symptom counts each ranged from 0 to 13, with a median of 2. The 75th percentiles were 4 aggressive symptoms and 5 covert symptoms.

The covariate u_i of interest is the logarithm of salivary cortisol in Year 2 of the study. Since cortisol was measured in Years 2 and 4, we let X_{ij} = log(measured cortisol), where j denotes Year. Note that corr(x_{i2}, x_{i4}) = 0.20, suggesting substantial within-subject variation, in this case reflecting laboratory error and temporal fluctuations. Age in the first study year was also obtained, presumably without error. Let w_i = (1, age_i)^T.

We model the error in the log-cortisol measurements as E(X_{ij} | u_i) = u_i + α·1(j = 4) and var(X_{ij} | u_i) = φ̃_x. Consequently, define X̄_i = (X_{i2} + X_{i4} − α)/2, so that E(X̄_i | u_i) = u_i and var(X̄_i | u_i) = φ̃_x/2 = φ_x. Rescaling X̄_i so that sd(u_i) ≈ 1, we first estimated the error model as α̂ = x̄_4 − x̄_2 = 0.84 and φ̂_x = 0.25 times the sample variance of (x_{i2} − x_{i4} + α̂), which gives φ̂_x = 1.99.
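A sketch of the replicate-based error-model estimates for this setup, under the stated model with a year-4 shift α and equal error variances in the two years; the exact scaling conventions below are reconstructed assumptions rather than a transcription of the authors' code.

```python
# Moment estimates of (alpha, phi_x) from two replicate log-cortisol measurements.
import numpy as np

def error_model_from_replicates(x2, x4):
    alpha_hat = x4.mean() - x2.mean()            # estimated year-4 shift
    xbar = (x2 + x4 - alpha_hat) / 2.0           # averaged surrogate with E(xbar | u) = u
    # var(x2 - x4 + alpha) = 2 * var(error) and var(xbar | u) = var(error) / 2,
    # so the error variance of xbar is one quarter of the adjusted-difference variance
    phi_xbar = 0.25 * np.var(x2 - x4 + alpha_hat, ddof=1)
    return alpha_hat, xbar, phi_xbar
```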

22 ffi(j = 4) and var(x ij j u i ) = ffi ~. Consequently, define X i = (X i + X i4 ff)=, so that E(X i j u i ) = u i and var(x i j u i ) = ffi ~ = = ffi. Rescaling X i so that sd(u i ) ß 1, we first estimated the error model as ^ff = μ 4 μ = :84 and ^ffi = :5 times the sample variance of ( i i4 + ^ff) = 1:99. We then estimated fi and ffi y simultaneously using the naive and SOLAQS methods (Table 3); the correction ffi Λ n (Section 4.4) was not used. Due to the small sample size and to account for variability in estimation of ff and ffi, bootstrap BCa confidence intervals were generated instead of Wald-type confidence intervals. As epected, correcting for the measurement error makes moderate difference in the parameter estimates for the age coefficients, while use of ^S provides a correction for attenuation of about 7 percentin^fi 1 for each outcome. Nevertheless, the effect is not significant for the Covert CD outcome. Confidence intervals are considerably wider for ^S, reflecting the variability induced in correcting for the bias due to measurement error. Furthermore, the results suggest that, due to the error in u i, the dispersion ffi y is substantially overestimated in the naive analysis. 6 Concluding Remarks We have outlined a quasilikelihood-based method for obtaining second order locally ancillary estimating functions for regression problems subject to errors in covariates. Prior to Rathouz and Liang (1999), local ancillarity had been achieved using the method of L projection (Waterman and Lindsay 1996a), generally requiring a likelihood specification. By replacing the projection operator proposed by Waterman and Lindsay with the solution to a simple linear 1

system, our method achieves local ancillarity assuming only that the first two moments of the response and the surrogate covariate are correctly specified. Furthermore, through a functional modeling approach, we avoid any assumptions on the baseline distribution of u. Simulations in the context of log-linear regression show that the SOLAQS estimator is generally less biased than the RC estimator when the RC assumptions were violated, and often results in a substantial increase in precision. Corresponding results in larger sample sizes are expected to be more dramatic, due to the more important role of bias. In the following, we briefly remark on a few additional aspects of the method.

Asymptotic approximations focusing on the limiting behavior of Ŝ*_i as φ_x → 0 were discussed in Section 4.2. Of more direct interest is the estimator β̂ that solves Ŝ* = 0; such approximations are considerably more difficult to study. Nevertheless, the work of Stefanski (1985) described in Section 2 (equation 6) and its relationship to second-order local ancillarity suggests the following: For fixed φ_x, as n → ∞, β̂ converges to β*, which differs from β by a quantity of order o(φ_x). Also, as with some existing estimating functions for measurement error (e.g., Stefanski and Carroll 1987), Ŝ* may not admit a unique root even in its limit as n → ∞. This may be especially true for larger values of φ_x. Given that Ŝ*_i → S_i as φ_x → 0, this is not expected to pose practical problems for small measurement error variance.

There are several reasons to believe that Ŝ* is reasonably efficient. First, it is based on quasiscores for β and u, which are the efficient estimating functions that are linear in the data (Crowder 1987). Second, the conditional score function gives rise to the semiparametric efficient score for β when the

distribution of (u | w) is unspecified. Third, the second-order locally ancillary projected score of Waterman and Lindsay (1996a) emulates the conditional score in terms of bias and efficiency when it exists. Finally, our method is the quasiscore analogue of the projected score. In-depth efficiency studies are a subject for further research.

Finally, extension of the SOLAQS to multiple mismeasured covariates is straightforward in the case where the variance-covariance matrix of (X_i | u_i, w_i) is reliably estimable. If the measurement errors for the components of u_i are independent, or if the errors are additive, this will generally not pose a problem.

APPENDIX A: Components of matrix D

Straightforward calculations using derivatives, expected values, and the definition of the operators b_k(·), k = 1, 2, lead to the following expressions for the components of D_i for the ith observation. For details, see Rathouz and Liang (1999) and the technical report referred to there. With derivatives taken with respect to u_i,

D_{i0} = (∂μ_{yi}/∂β)^T (∂μ_{yi}/∂β) / v_{yi},
D_{i1} = (∂μ_{yi}/∂β)^T (∂μ_{yi}/∂u_i) / v_{yi},
D_{i2} = (∂μ_{yi}/∂β)^T (∂²μ_{yi}/∂u_i²) / v_{yi},
D_{i11} = (∂μ_{yi}/∂u_i)² / v_{yi} + (∂μ_{xi}/∂u_i)² / v_{xi} = D_{i11y} + D_{i11x},
D_{i12} = (∂μ_{yi}/∂u_i)(∂²μ_{yi}/∂u_i²) / v_{yi} + (∂μ_{xi}/∂u_i)(∂²μ_{xi}/∂u_i²) / v_{xi} = D_{i21}.

The remaining component D_{i22} = b^i_2(T_{i2}) decomposes analogously into y- and x-parts, D_{i22y} and D_{i22x}, plus a cross term in D_{i11y} and D_{i11x}; these involve the first and second u_i-derivatives of (μ_{yi}, v_{yi}) and of (μ_{xi}, v_{xi}).

In the case where φ_y is estimated via R̂* simultaneously with β, additional components of D_i corresponding to R_i are required. They follow from the same calculations applied to R_i and involve (φ_y v_{yi})^{-1} together with the derivatives of μ_{yi} and v_{yi} with respect to β and u_i.

APPENDIX B: Sketch Proof of Theorem 1

To prove Theorem 1, we study (8) and (9) in four steps. In Step 1 (Lemmas 2 and 3), we examine the distribution of û_β via decomposition into terms of different orders. Step 2 involves deriving expressions for the first term in (9), from which the stochastic order and bias are determined. To accomplish this, we show S*′ = S̃′ + O_p(φ_x) (Lemma 5). Then, via important information equalities (Lemma 6), Corollary 7 establishes that S̃′ is orthogonal to T_{1x}. In Steps 3 and 4, we derive expressions for the second term in (9) (Lemma 8) and the last two terms of (8) (Lemma 9), respectively, from which stochastic order and bias are determined. Throughout, we hold β and w fixed at the true

values and take the model assumptions in Section 3 as given. More detailed proofs are in a technical report available from the first author.

Lemma 2. û_β →p u as φ_x → 0, and (û − u) = O_p(φ_x^{1/2}).

Proof. Let u_0 be the true value of u. For u ∈ R, φ_x T_1(u) = φ_x T_{1y}(u) + φ_x T_{1x}(u) = φ_x O_p(1) + (∂μ_x/∂u)(u, w) ṽ_x(u, w)^{-1} {x − μ_x(u, w)}. Since var{X − μ_x(u_0, w)} = φ_x ṽ_x(u_0, w) → 0, X →L μ_x(u_0, w), which implies that X →p μ_x(u_0, w). Also, φ_x O_p(1) →p 0, so φ_x T_1(u) →p T*_1(u), where T*_1(u) = (∂μ_x/∂u)(u, w) ṽ_x(u, w)^{-1} {μ_x(u_0, w) − μ_x(u, w)}. By monotonicity of μ_x(u), T*_1(u) > 0 if u < u_0, and T*_1(u) < 0 if u > u_0. Following arguments in Serfling (1980, Section 7.2.1), pr{|û_β − u_0| < ε} → 1 as φ_x → 0, completing the consistency proof. The order O_p(φ_x^{1/2}) is shown via Taylor series expansion.

For the remainder of the proofs, let u be the true value, let prime (′) denote differentiation with respect to u, and assume all functions S and T and their derivatives S′, T′, etc., are evaluated at the true u. Let û = û_β.

Lemma 3. Under the conditions of Lemma 2 and mild smoothness conditions on T_1(u),

(û_β − u) = D_{11}^{-1} T_{1x} + Z_1 + Z_2 = D_{11}^{-1} T_1 + Z_3 = O_p(φ_x^{1/2}),

where D_{11}^{-1} T_{1x} does not depend on y, Z_1 = Z_1(x) = O_p(φ_x), Z_2 = Z_2(y, x) = O_p(φ_x^{3/2}), and Z_3 = Z_3(y, x) = O_p(φ_x).

Proof. Recall that D_{11} = E(−T′_1), and define D_{11y} = E(−T′_{1y}) and D_{11x} = E(−T′_{1x}). Straightforward inspection provides the following orders of stochastic or fixed magnitude as φ_x → 0: T_{1y} = O_p(1), T_{1x} = O_p(φ_x^{-1/2}), D_{11y} =

O(1), D_{11x} = O(φ_x^{-1}), T′_{1y} = O_p(1), T′_{1x} = O_p(φ_x^{-1}), T′_1 = O_p(φ_x^{-1}), T′_1 + D_{11} = O_p(φ_x^{-1/2}), T′_{1x} + D_{11x} = O_p(φ_x^{-1/2}), T″_{1x} = O_p(φ_x^{-1}), T″_1 = O_p(φ_x^{-1}), T″_1 − E(T″_1) = O_p(φ_x^{-1/2}), T‴_{1y} = O_p(1), T‴_{1x} = O_p(φ_x^{-1}). Also, T″_1(u*) and T‴_1(u*) are both O_p(φ_x^{-1}) for u* in a neighborhood of u, by smoothness of T_1. Finally, (û − u) = o_p(1), by Lemma 2. The remainder of the proof involves third-order Taylor series expansions of T_1 in u, setting T_1 = T_{1x} + T_{1y}.

Lemma 4. Let the matrix a = (a_1, a_2), where a_k is p × 1, k = 1, 2. Then a_1 = ã_1 + O(φ_x²) = O(φ_x), where ã_1 = D_1 D_{11}^{-1}. And, a_2 = ã_2 + O(φ_x³) = O(φ_x²), where ã_2 = (D_2 D_{11} − D_1 D_{21}) / (2 D³_{11}). Further, a′_1 = ã′_1 + O(φ_x²) = O(φ_x), a″_1 = O(φ_x), a′_2 = O(φ_x²), a″_2 = O(φ_x²), and ã′_1 = (D′_1 D_{11} − D_1 D′_{11}) / D²_{11}.

Proof. By the expressions in Appendix A, D_1 = O(1), D_2 = O(1), D_{11} = O(φ_x^{-1}), D_{21} = O(φ_x^{-1}), and D_{22} = 2D²_{11} + O(φ_x^{-1}) = O(φ_x^{-2}). Taylor-series expansions and order-of-magnitude bookkeeping complete the proof.

Lemma 5. The u-derivative S*′ of S* is S*′ = S̃′ + O_p(φ_x), where

S̃′ = (S′ + D_1) − ã_1 (T′_1 + D_{11}) − (ã′_1 − 2 ã_2 D_{11}) T_1;

(S′ + D_1) = O_p(1) and the other two terms are O_p(φ_x^{1/2}).

Proof. Recall that D_1 = E(−S′), D_{11} = E(−T′_1), and D_{21} = E(−T′_2). In addition to the orders of magnitude in the proof of Lemma 3, S′ + D_1 = O_p(1), T_1 = O_p(φ_x^{-1/2}), T′_1 + D_{11} = O_p(φ_x^{-1/2}), T_2 = O_p(φ_x^{-1}), and T′_2 + D_{21} = O_p(φ_x^{-3/2}). Write S*′ = S′ − a_1 T′_1 − a′_1 T_1 − a_2 T′_2 − a′_2 T_2. By second-order local ancillarity and equation (5), S*′ is unbiased, so we may write

S*′ = (S′ + D_1) − a_1 (T′_1 + D_{11}) − a′_1 (T_1) − a_2 (T′_2 + D_{21}) − a′_2 (T_2),

for which each term in parentheses is unbiased. Working term-by-term, the first is O_p(1). Using Lemma 4, the second is a_1(T′_1 + D_{11}) = ã_1(T′_1 + D_{11}) + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). Similarly, the third term is a′_1 T_1 = ã′_1 T_1 + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). For the fourth term, use Lemma 6 and equation (5) to decompose T′_2 + D_{21} into centered components, and thereby show that a_2(T′_2 + D_{21}) = −2 ã_2 T_1 D_{11} + O_p(φ_x) = O_p(φ_x^{1/2}). The last term is a′_2 T_2 = O_p(φ_x), completing the proof.

Lemma 6. The unbiased estimating functions (S′ + D_1) and (T′_1 + D_{11}) are information unbiased with respect to T_1. That is, E{−(S′ + D_1)′} = E{(S′ + D_1) T_1} and E{−(T′_1 + D_{11})′} = E{(T′_1 + D_{11}) T_1}.

Proof. The results are shown through application of equation (5), manipulations of the expressions in Appendix A, and the surrogacy assumption.

Corollary 7. S*′ T_{1x} D_{11}^{-1} = S̃′ T_{1x} D_{11}^{-1} + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}), where E(S̃′ T_{1x}) = 0. So, under uniform integrability, E(S*′ T_{1x} D_{11}^{-1}) = O(φ_x^{3/2}).

Proof. T_{1x} D_{11}^{-1} = O_p(φ_x^{1/2}), so S*′ T_{1x} D_{11}^{-1} = S̃′ T_{1x} D_{11}^{-1} + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). E(S̃′ T_{1x}) = 0 is shown using the expressions in Lemmas 4, 5, and 6.

Lemma 8. The quantity S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} is of order O_p(φ_x), but its bias under uniform integrability is of order O(φ_x^{3/2}).

Proof. By Lemmas 3 and 5, S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} = O_p(φ_x). By the same lemmas, we may write

S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} = (S′ + D_1) Z_1(x) + O_p(φ_x^{3/2}) + O_p(φ_x^{3/2}).

The expectation of the first term in this expression is 0, by independence.

Lemma 9. The quantity (û_β − u)² S*″ is of order O_p(φ_x), but its bias under uniform integrability is of order O(φ_x^{3/2}). The quantity (û_β − u)³ S*‴(u*_β) is of order O_p(φ_x^{3/2}), and its bias under uniform integrability is of order O(φ_x^{3/2}).

Proof. First,

S*″ = S″ − a_1 T″_1 − 2 a′_1 T′_1 − a″_1 T_1 − a_2 T″_2 − 2 a′_2 T′_2 − a″_2 T_2.

Then, since S*″ is unbiased, we may replace each term with its centered version, i.e., S″ − E(S″), a_1{T″_1 − E(T″_1)}, and so on. Then S″ − E(S″) = O_p(1), T^{(k)}_1 − E(T^{(k)}_1) = O_p(φ_x^{-1/2}), and T^{(k)}_2 − E(T^{(k)}_2) = O_p(φ_x^{-3/2}), k = 1, 2. Write S*″ = {S″ − E(S″)} + {S*″ − S″ + E(S″)}. Order-of-magnitude bookkeeping shows that S*″ − S″ + E(S″) = O_p(φ_x^{1/2}). Therefore, (û_β − u)² S*″ = O_p(φ_x). Applying Lemma 3, write

(û_β − u)² S*″ = D_{11}^{-2} T²_{1x} {S″ − E(S″)} + O_p(φ_x^{3/2}).

The expected value of the first term in the foregoing expression is 0, by independence. For the second result, straightforward computations show S*‴ = O_p(1). Also, (û_β − u)³ = O_p(φ_x^{3/2}); assuming S*‴(u) is sufficiently smooth in u such that S*‴(u*_β) = O_p(1), then (û_β − u)³ S*‴(u*_β) = O_p(φ_x^{3/2}), completing the proof.

Proof of Theorem 1. Let Z = D_{11}^{-1} S̃′ T_{1x}. Using expansions (8) and (9), and applying Corollary 7 and Lemmas 8 and 9, the proof is immediate.

APPENDIX C: Algorithm for solving Ŝ* = 0

The equation Ŝ*(β) = 0 can be solved by iterating two steps: (1) For fixed û_i's, take one step in the solution of Σ_i S*_i(β, û_i) = 0 for β. (2) For fixed β̂ and for each i, take one step in the solution of T*_{i1}(β̂, u_i) = 0 for u_i. The first step is implemented using a Fisher-scoring algorithm, exploiting the components of the matrix D_i. The precision matrix for β-inferences is built from the first set of columns, D_{i+} = (D_{i0}^T, D_{i10}^T, D_{i20}^T)^T, of D_i. Defining the matrix L_i = (I_p, −a_i), we have S*_i = L_i (S_i^T, T_{i1}, T_{i2})^T, and hence E{−(∂S*_i/∂β)} = L_i D_{i+}. Estimating this matrix by plugging in β̂^{(o)} and û_i^{(o)}, β̂^{(o)} is updated to β̂^{(n)} with

β̂^{(n)} = β̂^{(o)} + { Σ_i L_i D_{i+}(β̂^{(o)}) }^{-1} Ŝ*(β̂^{(o)}).

In the second step, we use a Newton-Raphson scheme to update û_i^{(o)} to û_i^{(n)}, using the observed rather than the expected derivatives of T*_{i1} with respect to u_i. We set the maximum number of iterations between (1) and (2) at 100.
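A skeleton of the two-step iteration in Appendix C, written at the level of the text: alternate one Fisher-scoring step in β (holding the û_i's fixed) with one Newton-Raphson step in each u_i (holding β fixed). The callables `S_step` and `u_step` stand in for model-specific code and are assumptions of this sketch.

```python
# Alternating Fisher-scoring (in beta) and Newton (in each u_i) iteration skeleton.
import numpy as np

def solve_solaqs(beta0, u0, S_step, u_step, max_iter=100, tol=1e-8):
    beta, u = np.asarray(beta0, float), np.asarray(u0, float)
    for _ in range(max_iter):
        beta_new = S_step(beta, u)      # one Fisher-scoring update of beta given the u_i's
        u_new = u_step(beta_new, u)     # one Newton update of each u_i given beta
        if np.max(np.abs(beta_new - beta)) < tol and np.max(np.abs(u_new - u)) < tol:
            return beta_new, u_new
        beta, u = beta_new, u_new
    return beta, u                      # return the last iterate if not converged
```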

APPENDIX D: Proof of Section 4.3 result

By standard estimating function theory, (γ̂ − γ) = −(∂U/∂γ)^{-1} U + o_p(n^{-1/2}). By Taylor-series expansion,

Ŝ*(β, γ̂) = Ŝ*(β, γ) + (∂Ŝ*/∂γ)(γ̂ − γ) + o_p(√n) = Ŝ*(β, γ) − C D^{-1} U + o_p(√n).

Since E(U) = 0 and (∂U/∂β) = 0, it is immediate that E{−(∂Ŝ*(β, γ̂)/∂β)} = E{−(∂Ŝ*(β, γ)/∂β)} + o(n), whose limit (scaled by n^{-1}) is A. Also,

E{Ŝ*(β, γ̂) Ŝ*(β, γ̂)^T} = E[{Ŝ*(β, γ) − C D^{-1} U}{Ŝ*(β, γ) − C D^{-1} U}^T] + o(n),

whose limit (scaled by n^{-1}) is B*. The result that √n(β̂ − β*) →d N{0, A^{-1} B*(A^{-1})^T} then follows from standard estimating function theory.

References

Carroll, R.J., Ruppert, D., and Stefanski, L.A. (1995), Measurement Error in Nonlinear Models, London: Chapman and Hall.

Carroll, R.J., and Stefanski, L.A. (1990), "Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors," Journal of the American Statistical Association, 85.

Carroll, R.J., and Wand, M.P. (1991), "Semiparametric Estimation in Logistic Measurement Error Models," Journal of the Royal Statistical Society, Ser. B, 53.

Cook, J., and Stefanski, L.A. (1994), "A Simulation Extrapolation Method for Parametric Measurement Error Models," Journal of the American Statistical Association, 89.

Crowder, M. (1987), "On Linear and Quadratic Estimating Functions," Biometrika, 74.

Efron, B., and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.

Godambe, V.P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," Annals of Mathematical Statistics, 31.

Godambe, V.P., and Thompson, M.E. (1974), "Estimating Equations in the Presence of a Nuisance Parameter," Annals of Statistics, 2.

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69.

Lindsay, B.G. (1985), "Using Empirical Partially Bayes Inference for Increased Efficiency," The Annals of Statistics, 13.

McBurnett, K., Lahey, B.B., Rathouz, P.J., and Loeber, R. (2000), "Low Salivary Cortisol and Persistent Aggression in Boys Referred for Disruptive Behavior," Archives of General Psychiatry, 57.

McCullagh, P. (1983), "Quasi-likelihood Functions," Annals of Statistics, 11.

McCullagh, P., and Nelder, J.A. (1989), Generalized Linear Models (2nd ed.), London: Chapman and Hall.

Rathouz, P.J., and Liang, K-Y. (1999), "Reducing Sensitivity to Nuisance Parameters in Semiparametric Models: A Quasiscore Method," Biometrika, 86.

Small, C.G., and McLeish, D.L. (1989), "Projection as a Method for Increasing Sensitivity and Eliminating Nuisance Parameters," Biometrika, 76.

Small, C.G., and McLeish, D.L. (1994), Hilbert Space Methods in Probability and Statistical Inference, New York: John Wiley and Sons.

Stefanski, L.A. (1985), "The Effects of Measurement Error on Parameter Estimation," Biometrika, 72.

Stefanski, L.A., and Carroll, R.J. (1987), "Conditional Scores and Optimal Scores for Generalized Linear Measurement Error Models," Biometrika, 74.

Waterman, R.P., and Lindsay, B.G. (1996a), "Projected Score Methods for Approximating Conditional Scores," Biometrika, 83.

Waterman, R.P., and Lindsay, B.G. (1996b), "A Simple and Accurate Method for Approximate Conditional Inference Applied to Exponential Family Models," Journal of the Royal Statistical Society, Ser. B, 58.

Wedderburn, R.W.M. (1974), "Quasi-likelihood Functions, Generalized Linear Models and the Gauss-Newton Method," Biometrika, 61.

Table 1. Simulation Study of Overdispersed Poisson Model with Intercept and Mismeasured Covariate. 500 Replicates.

[Table body not reproduced in this transcription. Columns: exp(β_1), φ_y, φ_x; % bias in β̂_1 (Ŝ*, RC); % CV²(β̂_1) (Ŝ*, RC); MSE Ratio; Coverage %; with separate panels for u_i ~ normal and u_i ~ uniform.]

NOTE: The full model is log(μ_y) = β_0 + β_1 u, with β_0 = 0. Variable u has standard deviation one; its distribution is given in the text. CV²(β̂_1) is the squared coefficient of variation, relative to the true β_1. MSE Ratio is the mean squared error of the RC estimator relative to the Ŝ* estimator. Coverage percent is for nominal 95% Wald-type confidence intervals for β_1 using Ŝ* = 0 and the variance estimator in Section 4.3.


More information

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u Bayesian generalized additive mied models. study A simulation Stefan Lang and Ludwig Fahrmeir University of Munich, Ludwigstr. 33, 8539 Munich email:lang@stat.uni-muenchen.de and fahrmeir@stat.uni-muenchen.de

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University i Applications of Basu's TheorelTI by '. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University January 1997 Institute of Statistics ii-limeo Series

More information

6. Vector Random Variables

6. Vector Random Variables 6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following

More information

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes?

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Song-Hee Kim and Ward Whitt Industrial Engineering and Operations Research Columbia University

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance

More information

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods

More information

Partitioning variation in multilevel models.

Partitioning variation in multilevel models. Partitioning variation in multilevel models. by Harvey Goldstein, William Browne and Jon Rasbash Institute of Education, London, UK. Summary. In multilevel modelling, the residual variation in a response

More information

Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements

Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements Erning Li, Daowen Zhang, and Marie Davidian Department

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be one -hour paper consisting of 4 questions..

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis Lecture Recalls of probability theory Massimo Piccardi University of Technology, Sydney,

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Some Remarks on Overdispersion Author(s): D. R. Cox Source: Biometrika, Vol. 70, No. 1 (Apr., 1983), pp. 269-274 Published by: Oxford University Press on behalf of Biometrika Trust Stable

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic

More information

Exact and Approximate Numbers:

Exact and Approximate Numbers: Eact and Approimate Numbers: The numbers that arise in technical applications are better described as eact numbers because there is not the sort of uncertainty in their values that was described above.

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Economics 205 Exercises

Economics 205 Exercises Economics 05 Eercises Prof. Watson, Fall 006 (Includes eaminations through Fall 003) Part 1: Basic Analysis 1. Using ε and δ, write in formal terms the meaning of lim a f() = c, where f : R R.. Write the

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K.

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K. An Introduction to GAMs based on penalied regression splines Simon Wood Mathematical Sciences, University of Bath, U.K. Generalied Additive Models (GAM) A GAM has a form something like: g{e(y i )} = η

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu

More information

Numerical Methods. Root Finding

Numerical Methods. Root Finding Numerical Methods Solving Non Linear 1-Dimensional Equations Root Finding Given a real valued function f of one variable (say ), the idea is to find an such that: f() 0 1 Root Finding Eamples Find real

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 1 The General Bootstrap This is a computer-intensive resampling algorithm for estimating the empirical

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Estimators as Random Variables

Estimators as Random Variables Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until

More information

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675.

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675. McGill University Department of Epidemiology and Biostatistics Bayesian Analysis for the Health Sciences Course EPIB-675 Lawrence Joseph Bayesian Analysis for the Health Sciences EPIB-675 3 credits Instructor:

More information

A review of some semiparametric regression models with application to scoring

A review of some semiparametric regression models with application to scoring A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France

More information

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011 .S. Project Report Efficient Failure Rate Prediction for SRA Cells via Gibbs Sampling Yamei Feng /5/ Committee embers: Prof. Xin Li Prof. Ken ai Table of Contents CHAPTER INTRODUCTION...3 CHAPTER BACKGROUND...5

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models

A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models Laurence S. Freedman 1,, Vitaly Fainberg 1, Victor Kipnis 2, Douglas Midthune 2, and Raymond J. Carroll 3 1

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR

POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR Stephen J. Iturria and Raymond J. Carroll 1 Texas A&M University, USA David Firth University of Oxford,

More information

arxiv: v1 [stat.co] 26 May 2009

arxiv: v1 [stat.co] 26 May 2009 MAXIMUM LIKELIHOOD ESTIMATION FOR MARKOV CHAINS arxiv:0905.4131v1 [stat.co] 6 May 009 IULIANA TEODORESCU Abstract. A new approach for optimal estimation of Markov chains with sparse transition matrices

More information

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION INTRODUCTION Statistical disclosure control part of preparations for disseminating microdata. Data perturbation techniques: Methods assuring

More information

CSCI-6971 Lecture Notes: Probability theory

CSCI-6971 Lecture Notes: Probability theory CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

SIMEX and TLS: An equivalence result

SIMEX and TLS: An equivalence result SIMEX and TLS: An equivalence result Polzehl, Jörg Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstr. 39, 10117 Berlin, Germany polzehl@wias-berlin.de Zwanzig, Silvelyn Uppsala University,

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

Tolerance limits for a ratio of normal random variables

Tolerance limits for a ratio of normal random variables Tolerance limits for a ratio of normal random variables Lanju Zhang 1, Thomas Mathew 2, Harry Yang 1, K. Krishnamoorthy 3 and Iksung Cho 1 1 Department of Biostatistics MedImmune, Inc. One MedImmune Way,

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Introduction to Probability Theory for Graduate Economics Fall 2008

Introduction to Probability Theory for Graduate Economics Fall 2008 Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function

More information

NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich

NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS Balaji Lakshminarayanan and Raviv Raich School of EECS, Oregon State University, Corvallis, OR 97331-551 {lakshmba,raich@eecs.oregonstate.edu}

More information

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University, 326 Classroom Building, University Park, PA 16802,

More information

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes Biometrics 000, 000 000 DOI: 000 000 0000 Web-based Supplementary Materials for A Robust Method for Estimating Optimal Treatment Regimes Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information