Covariate Adjusted Correlation Analysis via Varying Coefficient Models


DAMLA ŞENTÜRK, Department of Statistics, Pennsylvania State University

HANS-GEORG MÜLLER, Department of Statistics, University of California, Davis

ABSTRACT. We propose covariate adjusted correlation (Cadcor) analysis to target the correlation between two hidden variables that are observed after being multiplied by an unknown function of a common observable confounding variable. The distorting effects of this confounding may alter the correlation relation between the hidden variables. Covariate adjusted correlation analysis enables consistent estimation of this correlation, by targeting the definition of correlation through the slopes of the regressions of the hidden variables on each other and by establishing a connection to varying-coefficient regression. The asymptotic distribution of the resulting adjusted correlation estimate is established. These distribution results, when combined with proposed consistent estimates of the asymptotic variance, lead to the construction of approximate confidence intervals and inference for adjusted correlations. We illustrate our approach through an application to the Boston house price data. Finite sample properties of the proposed procedures are investigated through a simulation study.

Key words: additive distortion model, asymptotic normality, confidence interval, confounding, linear regression, martingale CLT, multiplicative distortion model, smoothing, varying coefficient model.

Running Title: Covariate Adjusted Correlation

1. Introduction

We address the problem of estimating the correlation between two variables which are distorted by the effect of a confounding variable. For identifiability, we assume that the mean distorting effect of the confounder corresponds to no distortion. We consider the multiplicative effects model where the actual variables $(Y, X)$ are observed after being multiplied by a smooth unknown function of the confounder, leading to the observations
$$\tilde{Y}_i = \psi(U_i)\,Y_i \qquad \text{and} \qquad \tilde{X}_i = \phi(U_i)\,X_i. \quad (1)$$
Here $(Y_i, X_i)$ represent the unobserved realizations of the actual variables, $\psi(\cdot)$, $\phi(\cdot)$ are unknown smooth functions of the confounder with observed values $U_i$, and $(\tilde{Y}_i, \tilde{X}_i)$ are the distorted observations of the original variables for a sample of size $n$. The nature of the relationship of the confounding variable $U$ with the underlying variables will often be unknown, implying that $\psi(\cdot)$ and $\phi(\cdot)$ in (1) are unknown functions. The identifiability condition of no average distortion can be expressed as
$$E\{\psi(U)\} = 1, \qquad E\{\phi(U)\} = 1. \quad (2)$$
We also assume that $(Y_i, X_i, U_i)$ are independent and identically distributed for $i = 1, \ldots, n$, where $U$ is independent of $Y$ and $X$. One example is the Boston house price data of Harrison & Rubinfeld (1978), where finding the correlation relation between crime rate and house prices is of interest. However, a confounder affecting both variables is the proportion of population of lower educational status. For such data, model (1) provides a general and sensible way to describe this confounding, as we shall demonstrate. The simultaneous dependence of the original variables on the same confounder may lead to artificial correlation relations which do not exist between the actual hidden variables whose correlation we want to infer.
To illustrate how drastically the multiplicative distorting effects of the confounder may change the correlation between the underlying variables, even under the identifiability condition of no average distortion, consider the following example. The underlying variables $Y$ vs. $X$, simulated from a bivariate normal distribution with $\rho_{(Y,X)} = 0.5$, and the distorted versions

$\tilde{Y}$ vs. $\tilde{X}$, where the distortion is multiplicative as given in (1) through smooth, almost linear functions of a uniformly distributed confounder, have been plotted in Figure 1. Detailed explanations are given in the simulation study in Section 5.2. For these data, the sample correlation between the underlying variables is positive, whereas for the distorted variables the sample correlation is negative.

FIGURE 1 ABOUT HERE

A central goal of this paper is consistent estimation and inference for $\rho_{(Y,X)}$, the correlation coefficient of the hidden variables $(Y, X)$, given the observations of the confounding variable $U_i$ and the distorted observations $(\tilde{Y}_i, \tilde{X}_i)$ in (1). We refer to model (1), (2) as the multiplicative distortion model. Adjustment for confounding variables per se is a classical problem. We start by investigating a sequence of nested models, for all of which standard adjustment methods to obtain consistent estimates of the correlation already exist. First, consider an additive distortion model, $\tilde{Y} = Y + \psi_a(U)$ and $\tilde{X} = X + \phi_a(U)$, with identifiability constraints $E\{\psi_a(U)\} = E\{\phi_a(U)\} = 0$, for the distorting effects of $U$ to average out to 0. A simple adjustment method for the consistent estimation of $\rho_{(Y,X)}$ in the additive distortion model is to use an estimate of $\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$, where $\tilde{e}_{W_1W_2}$ is the set of errors from the nonparametric regression of $W_1$ on $W_2$ (referred to as nonparametric partial correlation). However, as we show in Appendix D, under (1), (2), the estimate of $\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$ is targeting the value
$$\xi_1 = \lambda\,\rho_{(Y,X)}, \quad (3)$$
where $\lambda = E\{\psi(U)\phi(U)\}/[\sqrt{E\{\psi^2(U)\}}\,\sqrt{E\{\phi^2(U)\}}]$. Noting that $\lambda$ can assume any real value in the interval $(0, 1]$, this simple adjustment, while working for the special case of an additive distortion model, fails for the multiplicative distortion model. The second model considered is a special case of the additive distortion model, where the distorting functions $\psi_a(\cdot)$ and $\phi_a(\cdot)$ are linear functions of $U$.
In this case, a consistent estimate of $\rho_{(e_{\tilde{Y}U}, e_{\tilde{X}U})}$ will also be consistent for $\rho_{(Y,X)}$, where $e_{W_1W_2}$ is the set of errors from the least squares regression of $W_1$ on $W_2$. This simple adjustment method is also known as the partial correlation of $\tilde{Y}$ and $\tilde{X}$, adjusted for $U$ (Pearson, 1896). This popular adjustment, however, fails for the multiplicative distortion model, since under (1), (2), the

target value $\xi_2$ of the estimate of $\rho_{(e_{\tilde{Y}U}, e_{\tilde{X}U})}$ will generally not satisfy $\xi_2 = \rho_{(Y,X)}$. Indeed it holds that $\xi_2 = \xi_1$, where $\xi_1$ is as given in (3) (see Appendix E). This adjustment method is therefore fraught with the same bias problem as the nonparametric partial correlation adjustment discussed above. What if we ignore the distorting effects of the confounder $U$ on $(\tilde{Y}, \tilde{X})$? In this case we would simply use the regular correlation $\rho_{(\tilde{Y},\tilde{X})}$ in order to target $\rho_{(Y,X)}$. As we show in Appendix F, under (1), (2), this correlation estimate is targeting the value
$$\xi_3 = \frac{E\{\phi(U)\psi(U)\}E(XY) - E(X)E(Y)}{\sqrt{\mathrm{var}\{\phi(U)\}E(X^2) + \mathrm{var}(X)}\;\sqrt{\mathrm{var}\{\psi(U)\}E(Y^2) + \mathrm{var}(Y)}}. \quad (4)$$
Now if $\psi(\cdot) \equiv \phi(\cdot) \equiv 1$, i.e., there is no confounding, then we immediately see that $\rho_{(\tilde{Y},\tilde{X})} = \rho_{(Y,X)}$, so that in this case of no confounding this estimate is on target. However, if $\psi(\cdot)$ and $\phi(\cdot)$ do not equal one, then we find that $\xi_3$ can assume any real value in the interval $[-1, 1]$. Therefore, arbitrarily large biases can result if one estimates a correlation while ignoring the confounding. Another straightforward approach would be to apply logarithmic transformations to $\tilde{Y}$ and $\tilde{X}$, so as to change the effect of the distorting functions $\psi(\cdot)$ and $\phi(\cdot)$ from multiplicative to additive. We would then use the nonparametric partial correlation adjustment method to estimate $\rho_{(\log(Y),\log(X))}$ consistently. However, this ad hoc solution might fall short, since the observed $\tilde{Y}$ and $\tilde{X}$ might not necessarily be positive. Furthermore, one may be interested in the correlation between the untransformed variables, which is not easy to recover from that of the transformed variables. Since the available adjustment methods fail to adjust properly for the distorting effects of the confounder $U$ in the multiplicative distortion model, a new adjustment method for correlations needs to be developed, a problem that we address in this paper.
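To see the size of the attenuation in (3) concretely, the multiplier of $\rho_{(Y,X)}$ can be evaluated numerically. The sketch below is illustrative only: it uses the distorting functions later adopted in the simulation study of Section 5.2, and approximates the expectations by Monte Carlo averages. By the Cauchy–Schwarz inequality the multiplier always lies in $(0, 1]$ for positive $\psi$, $\phi$, so the nonparametric partial correlation systematically shrinks the target toward zero under multiplicative distortion.

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.uniform(1, 7, 1_000_000)       # confounder distribution of Section 5.2
psi = (U + 1) / 5                      # satisfies E{psi(U)} = 1
phi = (-2 * U**2 + 4 * U + 90) / 68    # satisfies E{phi(U)} = 1

# Multiplier of rho_(Y,X) in (3): E{psi phi} / sqrt(E{psi^2} E{phi^2})
lam = (psi * phi).mean() / np.sqrt((psi**2).mean() * (phi**2).mean())
print(round(lam, 2))   # about 0.81: the partial-correlation target is shrunk toward 0
```

For this design the estimate of $\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$ would thus target roughly 80% of the true correlation, illustrating the bias described above.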
Our starting point is the equality
$$\rho_{(Y,X)} = \mathrm{sign}(\gamma_1)\sqrt{\gamma_1\eta_1} = \mathrm{sign}(\eta_1)\sqrt{\gamma_1\eta_1}, \quad (5)$$
where $\gamma_1$ and $\eta_1$ are the slopes from the linear regressions of $Y$ on $X$ and of $X$ on $Y$, respectively, and $\rho_{(Y,X)}$ is the underlying targeted correlation. We then propose an estimate of $\rho_{(Y,X)}$ based on pilot estimates of $\gamma_1$ and $\eta_1$. Assuming that the linear models given in (6)

below hold between the underlying variables $Y$ and $X$, the proposed general estimation method provides a consistent estimate for $\rho_{(Y,X)}$ not only under a multiplicative distortion model, but under all three distortion models outlined above (Şentürk & Müller, 2005a). This is one of the major attractions of the proposed adjustment, since in most applications the specific nature of the distortion will be unknown. The asymptotic distribution of the resulting covariate adjusted correlation (Cadcor) estimates is established. This main result, combined with proposed consistent estimates for the asymptotic variance, is then applied to the construction of approximate confidence intervals for the correlation coefficient.

The paper is organized as follows. In Section 2 we describe the model in more detail and explore its relationship with varying coefficient models. In Section 3 we introduce the covariate adjusted correlation (Cadcor) estimates and present results on asymptotic inference. Consistent estimates for the asymptotic variance, as needed for the implementation of inference procedures, are derived in Section 4. Applications of the proposed method to the Boston housing data are discussed in Section 5, where we also present some simulation results. The proofs of the main results are assembled in Section 6, followed by an Appendix with technical conditions and auxiliary results.

2. Covariate adjustment via varying coefficient regression

We assume the following regression relations between the unobserved variables $Y$ and $X$:
$$Y_i = \gamma_0 + \gamma_1 X_i + e_{YX,i}, \qquad X_i = \eta_0 + \eta_1 Y_i + e_{XY,i}, \quad (6)$$
where $e_{YX,i}$ is an error term with $E(e_{YX,i}) = 0$ and constant variance $\sigma^2_{YX}$, and $e_{XY,i}$ satisfies $E(e_{XY,i}) = 0$ with constant variance $\sigma^2_{XY}$. Our goal is to use estimates of $\gamma_1$ and $\eta_1$ (Şentürk & Müller, 2005a) to arrive at an estimate for $\rho_{(Y,X)}$ via (5).
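Equality (5) can be checked directly from the population least squares slopes in (6):

```latex
\gamma_1 = \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)} = \rho_{(Y,X)}\,\frac{\sigma_Y}{\sigma_X},
\qquad
\eta_1 = \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(Y)} = \rho_{(Y,X)}\,\frac{\sigma_X}{\sigma_Y},
\qquad\text{so that}\qquad
\gamma_1\eta_1 = \rho_{(Y,X)}^2 .
```

Since $\gamma_1$, $\eta_1$ and $\rho_{(Y,X)}$ all carry the sign of $\operatorname{cov}(X,Y)$, taking the square root and restoring the sign yields (5).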
The regression relation for the observed variables leads to
$$E(\tilde{Y}_i \mid \tilde{X}_i, U_i) = E\{Y_i\psi(U_i) \mid \phi(U_i)X_i, U_i\} = \psi(U_i)\,E\{\gamma_0 + \gamma_1 X_i + e_{YX,i} \mid \phi(U_i)X_i, U_i\}.$$

Assuming that $E(e_{YX,i}) = 0$ and that $(e_{YX}, U, X)$ are mutually independent, the model reduces to
$$E(\tilde{Y}_i \mid \tilde{X}_i, U_i) = \psi(U_i)\gamma_0 + \frac{\psi(U_i)}{\phi(U_i)}\gamma_1\tilde{X}_i = \beta_0(U_i) + \beta_1(U_i)\tilde{X}_i,$$
defining the functions
$$\beta_0(u) = \gamma_0\,\psi(u), \qquad \beta_1(u) = \gamma_1\,\frac{\psi(u)}{\phi(u)}. \quad (7)$$
This leads to $\tilde{Y}_i = \beta_0(U_i) + \beta_1(U_i)\tilde{X}_i + \psi(U_i)e_{YX,i}$, corresponding to a multiple varying coefficient model, i.e. an extension of regression and generalized regression models where the coefficients are allowed to vary as a smooth function of a third variable (Hastie & Tibshirani, 1993). For varying coefficient models, Hoover et al. (1998) have proposed smoothing methods based on local least squares and smoothing splines, and recent approaches include a componentwise kernel method (Wu & Chiang, 2000), a componentwise spline method (Chiang et al., 2001) and a method based on local maximum likelihood estimates (Cai et al., 2000). We derive asymptotic distributions for an estimation method that is tailored to this special model. Since the assumptions on $\tilde{Y}$ and $\tilde{X}$ are symmetric, a similar argument as above, regressing $\tilde{X}$ on $\tilde{Y}$ and $U$, leads to a second varying coefficient model $\tilde{X}_i = \alpha_0(U_i) + \alpha_1(U_i)\tilde{Y}_i + \phi(U_i)e_{XY,i}$, where
$$\alpha_0(u) = \eta_0\,\phi(u), \qquad \alpha_1(u) = \eta_1\,\frac{\phi(u)}{\psi(u)}. \quad (8)$$

3. Estimation of covariate adjusted correlation and asymptotic distribution

The proposed Cadcor estimate for $\rho_{(Y,X)}$ is
$$r = \mathrm{sign}(\hat\gamma_1)\sqrt{\hat\gamma_1\hat\eta_1}\;1_{\{\mathrm{sign}(\hat\gamma_1) = \mathrm{sign}(\hat\eta_1)\}}, \quad (9)$$

where the estimates $\hat\gamma_1$ and $\hat\eta_1$ are obtained after an initial binning step and $1_{\{\mathrm{sign}(\hat\gamma_1)=\mathrm{sign}(\hat\eta_1)\}}$ denotes the indicator for $\mathrm{sign}(\hat\gamma_1) = \mathrm{sign}(\hat\eta_1)$. We assume that the covariate $U$ is bounded below and above, $-\infty < a \le U \le b < \infty$ for real numbers $a < b$, and divide the interval $[a, b]$ into $m$ equidistant intervals denoted by $B_1, \ldots, B_m$, referred to as bins. Given $m$, the $B_j$, $j = 1, \ldots, m$, are fixed, but the number of $U_i$'s falling into $B_j$ is random and is denoted by $L_j$. For every $U_i$ falling into the $j$th bin, i.e., $U_i \in B_j$, the corresponding observed variables are $\tilde{X}_i$ and $\tilde{Y}_i$. After binning the data, we fit linear regressions of $\tilde{Y}_i$ on $\tilde{X}_i$ and of $\tilde{X}_i$ on $\tilde{Y}_i$, using the data falling within each bin $B_j$, $j = 1, \ldots, m$. The least squares estimates of the resulting linear regressions for the data in the $j$th bin are denoted by $\hat\beta_j = (\hat\beta_{0j}, \hat\beta_{1j})^T$ and $\hat\alpha_j = (\hat\alpha_{0j}, \hat\alpha_{1j})^T$, corresponding to the linear regressions of $\tilde{Y}_i$ on $\tilde{X}_i$ and of $\tilde{X}_i$ on $\tilde{Y}_i$. The estimators of $\gamma_0$, $\gamma_1$, $\eta_0$ and $\eta_1$ are then obtained as weighted averages of the $\hat\beta_j$'s and $\hat\alpha_j$'s, weighted according to the number of data $L_j$ in the $j$th bin,
$$\hat\gamma_0 = \sum_{j=1}^{m}\frac{L_j}{n}\hat\beta_{0j}, \qquad \hat\gamma_1 = \frac{1}{\bar{\tilde{X}}}\sum_{j=1}^{m}\frac{L_j}{n}\hat\beta_{1j}\bar{\tilde{X}}_j, \quad (10)$$
and
$$\hat\eta_0 = \sum_{j=1}^{m}\frac{L_j}{n}\hat\alpha_{0j}, \qquad \hat\eta_1 = \frac{1}{\bar{\tilde{Y}}}\sum_{j=1}^{m}\frac{L_j}{n}\hat\alpha_{1j}\bar{\tilde{Y}}_j, \quad (11)$$
where $\bar{\tilde{X}} = n^{-1}\sum_{i=1}^{n}\tilde{X}_i$, $\bar{\tilde{Y}} = n^{-1}\sum_{i=1}^{n}\tilde{Y}_i$, and $\bar{\tilde{X}}_j$, $\bar{\tilde{Y}}_j$ are the averages of the $\tilde{X}_i$ and $\tilde{Y}_i$ falling into bin $B_j$, respectively, i.e. $\bar{\tilde{X}}_j = L_j^{-1}\sum_{i=1}^{n}\tilde{X}_i 1_{\{U_i \in B_j\}}$ and $\bar{\tilde{Y}}_j = L_j^{-1}\sum_{i=1}^{n}\tilde{Y}_i 1_{\{U_i \in B_j\}}$ (Şentürk & Müller, 2005a). These estimates are motivated by $E\{\beta_0(U)\} = \gamma_0$, $E\{\beta_1(U)\tilde{X}\} = \gamma_1 E(\tilde{X})$, $E\{\alpha_0(U)\} = \eta_0$ and $E\{\alpha_1(U)\tilde{Y}\} = \eta_1 E(\tilde{Y})$ (see (7), (8) and the identifiability conditions).

Remark 1: An alternative to the equidistant binning adopted in this paper would be nearest neighbor binning, where the number of points falling into each bin is fixed, but the bin boundaries are random. We have chosen equidistant binning, where bin boundaries are fixed but the number of points falling into each bin is random, as we found it to work quite well in applications.
One might expect equidistant binning to be superior in terms of controlling the bias, whereas the nearest neighbor approach might yield smaller variance, depending on the underlying assumptions. This trade-off could be

further investigated in future research.

Remark 2: Yet another alternative to the proposed estimation procedure targeting the regression coefficients $\gamma_0$, $\gamma_1$, $\eta_0$ and $\eta_1$ would be to use one of the above cited smoothing techniques, such as kernel smoothing based on local least squares or smoothing splines, to estimate the varying coefficient functions evaluated at the original observation points $U_i$, $i = 1, \ldots, n$, and then to apply an averaging method to estimate the targeted regression coefficients. In the special case of local polynomial fitting based on local least squares, Zhang et al. (2002) were able to derive the asymptotic conditional mean squared error of such an estimator. However, asymptotic distributional results as we provide in this paper have yet to be established for this class of varying coefficient estimators.

Remark 3: A possible extension of the proposed confounding setting would be to consider a vector-valued confounding variable $U$. The proposed binning and smoothing techniques could easily be adapted to two- or three-dimensional cases, but one will encounter the curse of dimensionality for higher dimensions, as the bins quickly become sparse as the dimension increases. To overcome this problem, a topic for future research would be to adopt a single index approach, where the confounding effects are functions of a linear combination of the confounding vector $U$, say $\nu^T U$, where the single index vector $\nu$ needs to be estimated.

We derive the asymptotic distribution of $r$, the Cadcor estimate (9), as the number of subjects $n$ tends to infinity. As in typical smoothing applications, the number of bins $m = m(n)$ is required to satisfy $m \to \infty$, $n/(m \log n) \to \infty$ and $m/\sqrt{n} \to \infty$ as $n \to \infty$. We denote convergence in distribution by $\to_D$ and convergence in probability by $\to_p$.

Theorem 1.
Under the technical conditions (C1)–(C6) given in Appendix A, on events $E_n$ (defined in (16)–(18) below) with $P(E_n) \to 1$ as $n \to \infty$, it holds that $\sqrt{n}\,(r - \rho_{(Y,X)}) \to_D N(0, \sigma_r^2)$ for $\rho_{(Y,X)} \ne 0$, where the explicit form of $\sigma_r^2$ is given in Appendix B.

Remark 4: It is of interest to test the hypothesis $H_0: \rho_{(Y,X)} = 0$, since this case is

excluded by the assumptions of Theorem 1. Equation (5) implies that testing the hypothesis $H_0: \rho_{(Y,X)} = 0$ is equivalent to testing $H_0: \gamma_1 = 0$ or testing $H_0: \eta_1 = 0$. Thus, we propose to test the hypothesis $H_0: \rho_{(Y,X)} = 0$ via testing $H_0: \gamma_1 = 0$. For this testing problem, the bootstrap test proposed in Şentürk & Müller (2005a) is available. This test has been shown to attain desirable coverage levels in simulation studies.

4. Estimating the asymptotic variance

From this point on, we will use subscripts $n$ to denote variables that form a triangular array scheme. The observable data are of the form $(U_{ni}, \tilde{X}_{ni}, \tilde{Y}_{ni})$, $i = 1, \ldots, n$, for a sample of size $n$. Correspondingly, the underlying unobservable variables and errors are $(X_{ni}, Y_{ni}, e_{YX,ni}, e_{XY,ni})$, $i = 1, \ldots, n$. Let
$$\{(U_{njk}, \tilde{X}_{njk}, \tilde{Y}_{njk}, X_{njk}, Y_{njk}, e_{YX,njk}, e_{XY,njk}),\ k = 1, \ldots, L_{nj}\} = \{(U_{ni}, \tilde{X}_{ni}, \tilde{Y}_{ni}, X_{ni}, Y_{ni}, e_{YX,ni}, e_{XY,ni}),\ i = 1, \ldots, n : U_{ni} \in B_{nj}\}$$
denote the data for which $U_{ni} \in B_{nj}$, where we refer to $(U_{njk}, \tilde{X}_{njk}, \tilde{Y}_{njk}, X_{njk}, Y_{njk}, e_{YX,njk}, e_{XY,njk})$ as the $k$th element in bin $B_{nj}$. Then we can express the least squares estimates of the linear regressions of the observable data falling into the $j$th bin $B_{nj}$ as
$$\hat\beta_{nj1} = \frac{\sum_{k=1}^{L_{nj}}(\tilde{X}_{njk} - \bar{\tilde{X}}_{nj})\tilde{Y}_{njk}}{\sum_{k=1}^{L_{nj}}(\tilde{X}_{njk} - \bar{\tilde{X}}_{nj})^2}, \qquad \hat\beta_{nj0} = \bar{\tilde{Y}}_{nj} - \hat\beta_{nj1}\bar{\tilde{X}}_{nj}, \quad (12)$$
for the regression of $\tilde{Y}$ on $\tilde{X}$, leading to the parameter estimates $\hat\gamma_{n0}$ and $\hat\gamma_{n1}$ given in (10), and
$$\hat\alpha_{nj1} = \frac{\sum_{k=1}^{L_{nj}}(\tilde{Y}_{njk} - \bar{\tilde{Y}}_{nj})\tilde{X}_{njk}}{\sum_{k=1}^{L_{nj}}(\tilde{Y}_{njk} - \bar{\tilde{Y}}_{nj})^2}, \qquad \hat\alpha_{nj0} = \bar{\tilde{X}}_{nj} - \hat\alpha_{nj1}\bar{\tilde{Y}}_{nj}, \quad (13)$$
for the regression of $\tilde{X}$ on $\tilde{Y}$, leading to the parameter estimates $\hat\eta_{n0}$ and $\hat\eta_{n1}$ given in (11), respectively. In the above expressions, $\bar{\tilde{X}}_{nj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\tilde{X}_{njk}$ and $\bar{\tilde{Y}}_{nj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\tilde{Y}_{njk}$. Next we introduce the least squares estimates of the linear regressions of the unobservable data falling into $B_{nj}$, i.e.,
$$\zeta_{nj1} = \frac{\sum_{k=1}^{L_{nj}}(X_{njk} - \bar{X}_{nj})Y_{njk}}{\sum_{k=1}^{L_{nj}}(X_{njk} - \bar{X}_{nj})^2}, \qquad \zeta_{nj0} = \bar{Y}_{nj} - \zeta_{nj1}\bar{X}_{nj}, \quad (14)$$

for the regression of $Y$ on $X$, and
$$\omega_{nj1} = \frac{\sum_{k=1}^{L_{nj}}(Y_{njk} - \bar{Y}_{nj})X_{njk}}{\sum_{k=1}^{L_{nj}}(Y_{njk} - \bar{Y}_{nj})^2}, \qquad \omega_{nj0} = \bar{X}_{nj} - \omega_{nj1}\bar{Y}_{nj}, \quad (15)$$
for the regression of $X$ on $Y$. These quantities are not estimable, but will be used in the proofs of the main results. For $\hat\gamma_{n0}$, $\hat\gamma_{n1}$, $\hat\eta_{n0}$ and $\hat\eta_{n1}$ given in (10) and (11) to be well defined, the least squares estimates $\hat\beta_{nj0}$, $\hat\beta_{nj1}$, $\hat\alpha_{nj0}$ and $\hat\alpha_{nj1}$ given in (12), (13) must exist for each bin $B_{nj}$. This requires that $s^2_{\tilde{X}_{nj}} = L_{nj}^{-1}\sum_{k}\tilde{X}^2_{njk} - (L_{nj}^{-1}\sum_{k}\tilde{X}_{njk})^2$ and $s^2_{\tilde{Y}_{nj}} = L_{nj}^{-1}\sum_{k}\tilde{Y}^2_{njk} - (L_{nj}^{-1}\sum_{k}\tilde{Y}_{njk})^2$ are strictly positive for each $B_{nj}$. Correspondingly, $\zeta_{nj0}$, $\zeta_{nj1}$, $\omega_{nj0}$ and $\omega_{nj1}$ given in (14), (15) will exist under the condition that $s^2_{X_{nj}}, s^2_{Y_{nj}} > 0$, where $s^2_{X_{nj}}$ and $s^2_{Y_{nj}}$ are defined similarly to $s^2_{\tilde{X}_{nj}}$ and $s^2_{\tilde{Y}_{nj}}$. Define the events
$$\tilde{A}_n = \{\omega \in \Omega : \inf_j s^2_{\tilde{X}_{nj}} > \kappa\ \text{and}\ \min_j L_{nj} > 1\}, \qquad A_n = \{\omega \in \Omega : \inf_j s^2_{X_{nj}} > \kappa\ \text{and}\ \min_j L_{nj} > 1\}, \quad (16)$$
$$\tilde{C}_n = \{\omega \in \Omega : \inf_j s^2_{\tilde{Y}_{nj}} > \kappa\ \text{and}\ \min_j L_{nj} > 1\}, \qquad C_n = \{\omega \in \Omega : \inf_j s^2_{Y_{nj}} > \kappa\ \text{and}\ \min_j L_{nj} > 1\}, \quad (17)$$
$$E_n = \tilde{A}_n \cap A_n \cap \tilde{C}_n \cap C_n, \quad (18)$$
where $\kappa = \min\{\varrho_x/2,\ \inf_j \phi^2(\bar{U}_{nj})\varrho_x/2,\ \varrho_y/2,\ \inf_j \psi^2(\bar{U}_{nj})\varrho_y/2\}$, $\varrho_x$ and $\varrho_y$ are as defined in (C5), $\bar{U}_{nj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}}U_{njk}$ is the average of the $U$'s in $B_{nj}$, and $(\Omega, \mathcal{F}, P)$ is the underlying probability space. The estimates $\hat\gamma_{n0}$, $\hat\gamma_{n1}$, $\zeta_{nj0}$, $\zeta_{nj1}$, $\hat\eta_{n0}$, $\hat\eta_{n1}$, $\omega_{nj0}$ and $\omega_{nj1}$ are well defined on the events $\tilde{A}_n$, $A_n$, $\tilde{C}_n$ and $C_n$. Generalizing a result in Appendix A.3 of Şentürk & Müller (2005b), we have that $P(E_n) \to 1$ as $n \to \infty$, due to the symmetry between $\tilde{Y}$ and $\tilde{X}$.

Theorem 2. Under the technical conditions (C1)–(C6) given in Appendix A, on the events $E_n$ (defined in (16)–(18)) with $P(E_n) \to 1$ as $n \to \infty$, it holds that $\hat\sigma^2_{nr} \to_p \sigma^2_r$, where the explicit form of $\hat\sigma^2_{nr}$ is given in Appendix B.

5. Applications and Monte Carlo study

Under the technical conditions given in Appendix A, if $\rho_{(Y,X)} \ne 0$, the asymptotic distribution of the Cadcor estimate $r$ according to Theorem 1 is $\sqrt{n}\,(r - \rho_{(Y,X)})/\sigma_r \to_D N(0, 1)$ as $n \to \infty$, where $\sigma^2_r$ is as in Appendix B. Using the consistent estimate $\hat\sigma^2_{nr}$ of $\sigma^2_r$ proposed in Theorem 2, it follows from Slutsky's theorem that $\sqrt{n}\,(r - \rho_{(Y,X)})/\hat\sigma_{nr} \to_D N(0, 1)$, so that an approximate $(1 - \varphi)$ asymptotic confidence interval for $\rho_{(Y,X)}$ has the endpoints
$$r \pm z_{\varphi/2}\,\frac{\hat\sigma_{nr}}{\sqrt{n}}. \quad (19)$$
Here $z_{\varphi/2}$ is the $(1 - \varphi/2)$th quantile of the standard Gaussian distribution.

5.1. Application to the Boston house price data

We analyze a subset of the Boston house price data of Harrison & Rubinfeld (1978). These data include the following variables: proportion of population of lower educational status (i.e. proportion of adults without high school education and proportion of male workers classified as laborers) (%LS), per capita crime rate by town (crime rate, CR) and median value of owner-occupied homes in $1000's (house price, HP) for 506 towns around Boston. A goal is to identify the factors affecting house prices in Boston. However, it is hard to separate the effects of different factors on HP, since there are confounding variables present. Of interest is the correlation between HP and CR. While we can simply compute the standard correlation between these variables, a more meaningful question is whether these variables are still correlated after adjusting for %LS, since we may reasonably expect %LS to influence the relationship between HP and CR. We therefore estimate the Cadcor using %LS as the confounding variable $U$. The correlation $\rho_{(Y,X)}$ was estimated by the Cadcor method and the results were compared to estimators obtained without adjustment and with the standard correlation adjustment methods discussed in Section 1, namely using estimates of $\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$ and $\rho_{(e_{\tilde{Y}U}, e_{\tilde{X}U})}$.
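The endpoints of the interval (19) are straightforward to compute once $r$, $\hat\sigma_{nr}$ and $n$ are available. A minimal sketch follows; the numeric inputs are illustrative placeholders, not values from the data analysis.

```python
from statistics import NormalDist

def cadcor_ci(r, sigma_nr, n, phi_level=0.05):
    """Endpoints r +/- z_{phi/2} * sigma_nr / sqrt(n) of the interval (19)."""
    z = NormalDist().inv_cdf(1 - phi_level / 2)  # (1 - phi/2) standard normal quantile
    half = z * sigma_nr / n**0.5
    return r - half, r + half

# Illustrative inputs only (hypothetical r and variance estimate)
lo, hi = cadcor_ci(r=0.5, sigma_nr=1.2, n=400)
print(round(lo, 3), round(hi, 3))   # 0.382 0.618
```

Before reporting such an interval, the significance test of Remark 4 should be applied, since (19) is only valid for $\rho_{(Y,X)} \ne 0$.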
The estimators and approximate 95% asymptotic confidence intervals for $\rho_{(Y,X)}$ for these four methods are displayed in Table 1. For the two standard adjustment methods and the case of no adjustment, approximate confidence intervals

were obtained using Fisher's z-transformation. Before forming confidence intervals for the proposed covariate adjusted correlation (5), its significance was tested (see Remark 4 after Theorem 1). It was found to be significant at the 5% level (p-value = 0.048). The approximate confidence interval for the Cadcor (19) was obtained using the variance estimate $\hat\sigma^2_{nr}$ (see Theorem 2). The scatter-plots of the raw estimates $(\hat\beta_{nr1}, \ldots, \hat\beta_{nrm})$ (12) and $(\hat\alpha_{nr1}, \ldots, \hat\alpha_{nrm})$ (13) vs. the midpoints of the bins $(B_{n1}, \ldots, B_{nm})$ are shown in Figure 2 for intercepts ($r = 0$) and slopes ($r = 1$). The scatter-plots of the raw correlations within each bin $(\hat{r}_{n1}, \ldots, \hat{r}_{nm})$ vs. the midpoints of the bins, where $\hat{r}_{nj}$ is defined as
$$\hat{r}_{nj} = \mathrm{sign}(\hat\beta_{nj1})\sqrt{\hat\beta_{nj1}\hat\alpha_{nj1}}\;1_{\{\mathrm{sign}(\hat\beta_{nj1}) = \mathrm{sign}(\hat\alpha_{nj1})\}}, \quad (20)$$
are shown in Figure 3, along with scatter-plots of the variables (%LS, HP, CR).

FIGURE 2 ABOUT HERE

The implementation of the binning includes the merging of sparsely populated bins. Bin widths were chosen aiming at a number of at least three points in each bin, and bins with fewer than three points were merged with neighboring bins. For this example with $n = 506$, the average number of points per bin was 18, yielding a total of 24 bins after merging.

TABLE 1 ABOUT HERE

The unadjusted correlation $\rho_{(\tilde{Y},\tilde{X})}$ between HP and CR and the Cadcor estimate adjusted for %LS are displayed in Table 1. The comparison implies that the proportion of population of lower educational status explains a significant amount of the negative correlation between median house price and crime rate. The estimate of the nonparametric partial correlation $\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$ came closest to the Cadcor estimate, even though the approximate 95% confidence interval based on this estimate did not contain the Cadcor value. When adjusting for %LS with partial correlation ($\hat\rho_{(e_{\tilde{Y}U}, e_{\tilde{X}U})}$), HP and CR were found to be no longer correlated.
The reason for the relatively poor performance of the partial correlation estimate might be its inability to reflect the nonlinear nature of the relationship between HP and %LS, as demonstrated in Figure 3.

FIGURE 3 ABOUT HERE

From the bottom right panel of Figure 3, the correlation between HP and CR is seen to be positive when %LS is between 0 and 10, and to become negative for larger %LS values. The same phenomenon can also be observed from the positive slopes of HP and CR in Figure 2, top and bottom right panels. This positive correlation between house prices and crime rate for relatively high status neighborhoods seems counter-intuitive at first. However, the reason for this positive correlation is the existence of high educational status neighborhoods close to central Boston, where high house prices and crime rates occur simultaneously. The expected effect of increasing crime rate on declining house prices seems to be observed only for lower educational status neighborhoods in Boston.

5.2. Monte Carlo simulation

The confounding covariate $U$ was simulated from Uniform(1, 7). The underlying unobserved variables $(X, Y)^T$ were simulated from a bivariate normal distribution with mean vector $(2, 3)^T$, $\sigma^2_X = 0.3$, $\sigma^2_Y = 0.4$ and $\rho_{(X,Y)} = 0.5$. The distorting functions were chosen as $\psi(u) = (u + 1)/5$ and $\phi(u) = (-2u^2 + 4u + 90)/68$, satisfying the identifiability conditions. We conducted 1000 Monte Carlo runs with sample sizes 100, 400 and 1600. For each run, 95% asymptotic confidence intervals were constructed by plugging the estimate $\hat\sigma^2_{nr}$ given in Theorem 2 into (19). The estimated non-coverage rates in percent and mean interval lengths for these confidence intervals were (8.06, 5.50, 5.30) and (0.44, 0.16, 0.08), respectively, for sample sizes $n = (100, 400, 1600)$. The estimated non-coverage level gets very close to the target value of 5% as the sample size increases, and the estimated interval lengths decrease.
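The simulation design above also serves to illustrate the whole estimation procedure. The following sketch is an illustrative implementation, not the authors' code; the seed and the bin count are arbitrary choices, and sparse bins are simply skipped rather than merged. It simulates the Section 5.2 design, computes the binned slope estimates feeding into (10) and (11), forms the Cadcor estimate (9), and compares it with the unadjusted correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 50   # sample size and number of equidistant bins (arbitrary choices)

# Section 5.2 design: U ~ Uniform(1, 7), (X, Y) bivariate normal with rho = 0.5
U = rng.uniform(1, 7, n)
cov_xy = 0.5 * np.sqrt(0.3 * 0.4)
X, Y = rng.multivariate_normal([2, 3], [[0.3, cov_xy], [cov_xy, 0.4]], n).T
Xt = (-2 * U**2 + 4 * U + 90) / 68 * X   # phi(U) X, with E{phi(U)} = 1
Yt = (U + 1) / 5 * Y                     # psi(U) Y, with E{psi(U)} = 1

def cadcor(U, Xt, Yt, m):
    """Binning-based Cadcor estimate r of rho_(Y,X), following (9)-(11)."""
    edges = np.linspace(U.min(), U.max(), m + 1)
    idx = np.clip(np.digitize(U, edges) - 1, 0, m - 1)
    n = len(U)
    g1 = e1 = 0.0   # accumulators for the weighted slope averages in (10), (11)
    for j in range(m):
        x, y = Xt[idx == j], Yt[idx == j]
        L = len(x)
        if L < 3:   # sparse bins: the paper merges them; skipped in this sketch
            continue
        sxy = ((x - x.mean()) * (y - y.mean())).mean()
        g1 += (L / n) * (sxy / x.var()) * x.mean()   # slope of Ytilde on Xtilde, weighted
        e1 += (L / n) * (sxy / y.var()) * y.mean()   # slope of Xtilde on Ytilde, weighted
    g1, e1 = g1 / Xt.mean(), e1 / Yt.mean()
    if np.sign(g1) != np.sign(e1):   # indicator in (9)
        return 0.0
    return np.sign(g1) * np.sqrt(g1 * e1)

r = cadcor(U, Xt, Yt, m)
naive = np.corrcoef(Xt, Yt)[0, 1]
print(r, naive)   # r recovers a value near 0.5; the unadjusted correlation is negative
```

With this design the unadjusted correlation of the distorted variables comes out clearly negative, reproducing the sign reversal illustrated in Figure 1, while the Cadcor estimate is close to the true $\rho_{(Y,X)} = 0.5$.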
For the sample size of 400, biases for non-adjustment ($\rho_{(\tilde{Y},\tilde{X})}$) and the two standard correlation adjustment methods, nonparametric partial correlation ($\rho_{(\tilde{e}_{\tilde{Y}U}, \tilde{e}_{\tilde{X}U})}$) and partial correlation ($\rho_{(e_{\tilde{Y}U}, e_{\tilde{X}U})}$), were also estimated. The bias of the Cadcor algorithm was negligible in comparison to the biases of the other three methods. We have also carried out simulations to study the effects of different choices of $m$, the total number of bins, on the mean squared error of the Cadcor estimates. Under the rate conditions on $m$ given in Section 3, the estimates are found to be quite robust regarding

different choices of $m$; we advocate a choice such that all or the vast majority of bins include at least three observations. If there are a few bins with fewer than this minimum number of observations, they are merged with neighboring bins in a second bin-merging step. The average number of points per bin was 5, 16 and 32 for sample sizes $n$ = 100, 400 and 1600, respectively.

6. Proofs of the main results

Defining $\delta_{njk0} = \psi(U_{njk}) - \psi(\bar{U}_{nj})$ and $\delta_{njk1} = \phi(U_{njk}) - \phi(\bar{U}_{nj})$ for $1 \le k \le L_{nj}$, where $\bar{U}_{nj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}}U_{njk}$ is the average of the $U$'s in $B_{nj}$, we obtain the following results by Taylor expansions and boundedness considerations: (a) $\sup_{j,k}|U_{njk} - \bar{U}_{nj}| \le (b - a)/m$; (b) $\sup_{j,k}|\delta_{njkr}| = O(m^{-1})$ for $r = 0, 1$.

Proof of Theorem 1. We show
$$\sqrt{n}\begin{pmatrix} \sum_{j=1}^{m}\frac{L_{nj}}{n}\hat\beta_{nj1}\bar{\tilde{X}}_{nj} - \gamma_1 E(X) \\ \sum_{j=1}^{m}\frac{L_{nj}}{n}\bar{\tilde{X}}_{nj} - E(X) \\ \sum_{j=1}^{m}\frac{L_{nj}}{n}\hat\alpha_{nj1}\bar{\tilde{Y}}_{nj} - \eta_1 E(Y) \\ \sum_{j=1}^{m}\frac{L_{nj}}{n}\bar{\tilde{Y}}_{nj} - E(Y) \end{pmatrix} \to_D N_4(0, \Sigma), \quad (21)$$
where $\Sigma = (\sigma_{ij})$ is a $4 \times 4$ matrix whose elements are given in Appendix B. The asymptotic normality of $\sqrt{n}(\sqrt{\hat\gamma_{n1}\hat\eta_{n1}} - \rho_{(Y,X)})$ will follow from this by a simple application of the $\delta$-method when $\gamma_1$ and $\eta_1$, or equivalently $\rho_{(Y,X)}$, are different from zero, using the transformation $g(x_1, x_2, x_3, x_4)^T = \sqrt{(x_1x_3)/(x_2x_4)}$. The asymptotic normality of $\sqrt{n}(r - \rho_{(Y,X)})$, and thus Theorem 1, will then follow by Slutsky's theorem, since the consistency of $\mathrm{sign}(\hat\gamma_{n1})$ and $1_{\{\mathrm{sign}(\hat\gamma_{n1}) = \mathrm{sign}(\hat\eta_{n1})\}}$ for $\mathrm{sign}(\rho_{(Y,X)})$ and 1, respectively, follows from the consistency of $\hat\gamma_{n1}$ and $\hat\eta_{n1}$ for $\gamma_1$ and $\eta_1$ when $\gamma_1$ and $\eta_1$ are different from zero, shown in Şentürk & Müller (2005a). By the Cramér–Wold device it is enough to show the asymptotic normality of
$$\sqrt{n}\Big[a\Big\{\sum_{j=1}^{m}\frac{L_{nj}}{n}\hat\beta_{nj1}\bar{\tilde{X}}_{nj} - \gamma_1 E(X)\Big\} + b\Big\{\sum_{j=1}^{m}\frac{L_{nj}}{n}\bar{\tilde{X}}_{nj} - E(X)\Big\} + c\Big\{\sum_{j=1}^{m}\frac{L_{nj}}{n}\hat\alpha_{nj1}\bar{\tilde{Y}}_{nj} - \eta_1 E(Y)\Big\} + d\Big\{\sum_{j=1}^{m}\frac{L_{nj}}{n}\bar{\tilde{Y}}_{nj} - E(Y)\Big\}\Big] \quad (22)$$

for real $a, b, c, d$; then (21) will follow. Let
$$S_{nt} = \sum_{i=1}^{t}Z_{ni} = \sum_{i=1}^{t}\Big[\frac{a\gamma_1}{\sqrt{n}}\{\psi(U_{ni})X_{ni} - E(X)\} + \frac{a\,\bar{X}_{n(i)}\,\psi(U_{ni})}{\sqrt{n}\,s^2_{X_{n(i)}}}(X_{ni} - \bar{X}_{n(i)})\,e_{YX,ni} + \frac{b}{\sqrt{n}}\{\phi(U_{ni})X_{ni} - E(X)\} + \frac{c\eta_1}{\sqrt{n}}\{\phi(U_{ni})Y_{ni} - E(Y)\} + \frac{c\,\bar{Y}_{n(i)}\,\phi(U_{ni})}{\sqrt{n}\,s^2_{Y_{n(i)}}}(Y_{ni} - \bar{Y}_{n(i)})\,e_{XY,ni} + \frac{d}{\sqrt{n}}\{\psi(U_{ni})Y_{ni} - E(Y)\}\Big],$$
where $\bar{X}_{n(i)} = L_{n(i)}^{-1}\sum_{k=1}^{L_{n(i)}}X_{n(i)k}$ and $s^2_{X_{n(i)}} = L_{n(i)}^{-1}\sum_{k=1}^{L_{n(i)}}X^2_{n(i)k} - \bar{X}^2_{n(i)}$ represent the sample mean and variance of the $X$'s falling in the same bin as $X_{ni}$; $\bar{Y}_{n(i)}$ and $s^2_{Y_{n(i)}}$ are defined similarly. It is shown in Şentürk & Müller (2005b) (equation (17)) that
$$\sup_j \left|\begin{pmatrix} \hat\beta_{nj0} - \psi(\bar{U}_{nj})\,\zeta_{nj0} \\ \hat\beta_{nj1} - \{\psi(\bar{U}_{nj})/\phi(\bar{U}_{nj})\}\,\zeta_{nj1} \end{pmatrix}\right| = O_p(m^{-1})\,1_{2\times1}, \quad (23)$$
where $\zeta_{nj0}$ and $\zeta_{nj1}$ are as defined in (14). This result implies that
$$\sup_j \left|\begin{pmatrix} \hat\alpha_{nj0} - \phi(\bar{U}_{nj})\,\omega_{nj0} \\ \hat\alpha_{nj1} - \{\phi(\bar{U}_{nj})/\psi(\bar{U}_{nj})\}\,\omega_{nj1} \end{pmatrix}\right| = O_p(m^{-1})\,1_{2\times1}, \quad (24)$$
where $\omega_{nj0}$ and $\omega_{nj1}$ are as defined in (15) and $1_{2\times1}$ is a $2 \times 1$ vector of ones, since all assumptions for $\tilde{Y}$ and $\tilde{X}$ are symmetric under the correlation setting. It also follows from Lemma 4 (a, b) of Şentürk & Müller (2005b) that $\sup_j|\bar{X}_{nj} - E(X)| = o_p(1)$, $\sup_j|s^2_{X_{nj}} - \mathrm{var}(X)| = o_p(1)$, $\sup_j|\bar{e}_{YX,nj}| = o_p(1)$, and $\sup_j|L_{nj}^{-1}\sum_k X_{njk}e_{YX,njk}| = o_p(1)$. This result extends analogously to $\sup_j|\bar{Y}_{nj} - E(Y)| = o_p(1)$, $\sup_j|s^2_{Y_{nj}} - \mathrm{var}(Y)| = o_p(1)$, $\sup_j|\bar{e}_{XY,nj}| = o_p(1)$, and $\sup_j|L_{nj}^{-1}\sum_k Y_{njk}e_{XY,njk}| = o_p(1)$. It follows from (23), (24), Lemma 4 (a, b) of Şentürk & Müller (2005b) and property (b) that the expression in (22) equals $S_{nn} + O_p(m^{-1}\sqrt{n})$. Since $O_p(m^{-1}\sqrt{n})$ is asymptotically negligible, the expression in (22) is asymptotically equivalent to $S_{nn}$. Let $\mathcal{F}_{nt}$ be the $\sigma$-field generated by $\{e_{YX,n1}, \ldots, e_{YX,nt}, e_{XY,n1}, \ldots, e_{XY,nt}, U_{n1}, \ldots, U_{nt}, L_{n(1)}, \ldots, L_{n(t)}, X_{n(1)}, \ldots, X_{n(t)}, Y_{n(1)}, \ldots, Y_{n(t)}\}$. Then it is easy to check that $\{S_{nt} = \sum_{i=1}^{t}Z_{ni}, \mathcal{F}_{nt}, 1 \le t \le n\}$ is a mean zero martingale for $n \ge 1$. Since the $\sigma$-fields

are nested, that is, $\mathcal{F}_{nt} \subseteq \mathcal{F}_{n,t+1}$ for all $t \le n$, Lemma 1 given in Appendix C yields $S_{nn} \to_D N(0, \sigma^2_Z)$ with $\sigma^2_Z = (a, b, c, d)\,\Sigma\,(a, b, c, d)^T$ (McLeish, 1974, Theorem 2.3 and subsequent discussion), and Theorem 1 follows.

Proof of Theorem 2. The consistency of $\hat\sigma_{n11}$, $\hat\sigma_{n12}$ and $\hat\sigma_{n22}$ for $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{22}$ is given in Theorem 2 of Şentürk & Müller (2005b). The consistency of $\hat\sigma_{n33}$, $\hat\sigma_{n34}$ and $\hat\sigma_{n44}$ for $\sigma_{33}$, $\sigma_{34}$ and $\sigma_{44}$ follows from this result, since the assumptions regarding $\tilde{Y}$ and $\tilde{X}$ are symmetric. The symmetry of $\tilde{Y}$ and $\tilde{X}$ also allows extending equation (22) of Şentürk & Müller (2005b) to
$$\sup_j \left|\begin{pmatrix} \hat\alpha_{nj0} - \phi(\bar{U}_{nj})\,\eta_0 \\ \hat\alpha_{nj1} - \{\phi(\bar{U}_{nj})/\psi(\bar{U}_{nj})\}\,\eta_1 \end{pmatrix}\right| = o_p(1)\,1_{2\times1}. \quad (25)$$
By Lemma 4 (a, b) and equation (22) of Şentürk & Müller (2005b), property (b), the law of large numbers and boundedness considerations, it can be shown that $\hat\sigma_{n14} = \sigma_{14} + o_p(1)$. With similar arguments, using Lemma 4 (a, b) of Şentürk & Müller (2005b), (25), property (b), the law of large numbers and boundedness considerations, it can be shown that $\hat\sigma_{n23} = \sigma_{23} + o_p(1)$. It also holds by the law of large numbers that $\hat\sigma_{n24} = \sigma_{24} + o_p(1)$. Combining these results with the same tools yields $\hat\sigma_{n13} = \sigma_{13} + o_p(1)$, which completes the proof.

Acknowledgements

The authors wish to express their gratitude to the two referees, the associate editor and the editor, whose remarks greatly improved the paper. This research was supported in part by NSF grants.

References

Cai, Z., Fan, J. & Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc. 95.

Chiang, C., Rice, J. A. & Wu, C. O. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J. Amer. Statist. Assoc. 96.

Harrison, D. & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics & Management 5.

Hastie, T. & Tibshirani, R. (1993). Varying-coefficient models. J. Roy. Statist. Soc. B 55.

Hoover, D. R., Rice, J. A., Wu, C. O. & Yang, L.-P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85.

Lai, T. L., Robbins, H. & Wei, C. Z. (1979). Strong consistency of least squares estimates in multiple regression II. J. Multivariate Anal. 9.

McLeish, D. L. (1974). Dependent central limit theorems and invariance principles. Ann. Probab. 2.

Pearson, K. (1896). Mathematical contributions to the theory of evolution III. Regression, heredity and panmixia. In: Karl Pearson's Early Statistical Papers. Cambridge University Press (1956), London.

Şentürk, D. & Müller, H. G. (2005a). Covariate adjusted regression. Biometrika, in press.

Şentürk, D. & Müller, H. G. (2005b). Inference for covariate adjusted regression via varying coefficient models. Technical Report.

Wu, C. O. & Chiang, C. T. (2000). Kernel smoothing in varying coefficient models with

18 longitudinal dependent variable. Statistica Sinica 10, Zhang, W., Lee, S.-Y. & Song, X. (2002). Local polynomial fitting in semivarying coefficient model. J. Multivariate Anal. 82, Damla Şentür, Department of Statistics, Pennsylvania State University, University Par, PA 16802, USA, Appendix Appendix A: Technical conditions We introduce some technical conditions: (C1) The covariate U is bounded below and above, < a U b < for real numbers a < b. The density f(u) of U satisfies inf a u b f(u) > c 1 > 0, sup a u b f(u) < c 2 < for real c 1, c 2, and is uniformly Lipschitz continuous, i.e., there exists a real number M such that sup a u b f(u + c) f(u) M c for any real number c. (C2) The variables (e Y X, U, X) and (e XY, U, Y ) are mutually independent, where e Y X, e XY are as in (6). (C3) For both variables X and Y, sup 1 i n X ni B 1, sup 1 i n Y ni B 2 for some bounds B 1, B 2 R; and E(X) 0, E(Y ) 0. (C4) Contamination functions ψ( ) and φ( ) are twice continuously differentiable, satisfying Eψ(U) = 1, Eφ(U) = 1, φ( ) > 0, ψ( ) > 0. (C5) Variances of X and Y, σx 2 and σ2 Y are strictly positive, i.e. σ2 X > ϱ x > 0, σy 2 > ϱ y > 0. (C6) The functions h 1 (u) = xg 1 (x, u)dx, h 2 (u) = xg 2 (x, u)dx, h 3 (u) = yg 3 (y, u)dy and h 4 (u) = yg 4 (y, u)dy are uniformly Lipschitz continuous, where g 1 (, ), g 2 (, ), g 3 (, ) and g 4 (, ) are the oint density functions of (X, U), (Xe Y X, U), (Y, U) and (Y e XY, U), respectively. Under these assumptions, the regressions of Ỹ on X and of X on Ỹ both satisfy the conditions given in Şentür & Müller (2005b). Bounded covariates are standard in asymptotic theory for least squares regression, as are conditions (C2) and (C5) (see Lai 17

19 et al., 1979). The identifiability conditions stated in (C4) are equivalent to E(Ỹ X) = E(Y X), E( X Y ) = E(X Y ). This means that the confounding of Y and X by U does not change the mean regression functions. Conditions (C1) and (C6) are needed for the proof of Lemma 4 of Şentür & Müller (2005b). Appendix B: Explicit forms for the asymptotic variance of r and its estimate The asymptotic variance of r is defined as follows σr 2 = σ 11 η 1 σ 12η 1 4{E(X)} 2 γ 1 2{E(X)} + σ 22γ 1 η 1 2 4{E(X)} + σ 13 σ 14 η 1 σ 23 γ 1 + σ 24 η 1 γ 1 2 2E(X)E(Y ) + σ 33 γ 1 σ 34γ 1 4{E(Y )} 2 η 1 2{E(Y )} + σ 44η 1 γ 1 2 4{E(Y )}, 2 where σ 11 = γ1[e{ψ 2 2 (U)}E(X 2 ) {E(X)} 2 ] + σy 2 {E(X)} 2 E{ψ 2 (U)} X, var(x) σ 12 = γ 1 [E{ψ(U)ψ(U)}E(X 2 ) {E(X)} 2 ], σ 22 = var( X), σ 13 = γ 1 η 1 cov( X, Ỹ ) + E(X)E(Y ) var(x)var(y ) E{ψ(U)φ(U)}cov(X, Y e Y Xe XY ) E(Y ) + γ 1 var(y ) E{ψ(U)φ(U)}cov(Y, Xe E(X) XY ) + η 1 var(x) E{ψ(U)φ(U)}cov(X, Y e Y X) E(X){E(Y )}2 var(x)var(y ) E{ψ(U)φ(U)}cov(X, e Y Xe XY ), σ 14 = γ 1 [E{ψ 2 (U)}E(XY ) E(X)E(Y )] + E(X) var(x) E{ψ2 (U)}cov(X, Y e Y X ), σ 23 = η 1 [E{φ 2 (U)}E(XY ) E(X)E(Y )] + E(Y ) var(y ) E{φ2 (U)}cov(Y, Xe XY ), σ 24 = cov( X, Ỹ ), σ 33 = η1[e{φ 2 2 (U)}E(Y 2 ) {E(Y )} 2 ] + σxy 2 {E(Y )} 2 E{φ 2 (U)}, var(y ) σ 34 = η 1 [E{φ(U)ψ(U)}E(Y 2 ) {E(Y )} 2 ], σ 44 = var(ỹ ). The consistent estimate of σ 2 r is ˆσ 2 nr = ˆσ n11ˆη n1 4 X2 nˆγ n1 ˆσ n12ˆη n1 2 X2 n + ˆσ n22ˆγ n1ˆη n1 4 X2 n + ˆσ n13 ˆσ n14ˆη n1 ˆσ n23ˆγ n1 + ˆσ n24ˆη n1ˆγ n1 2 Xn Ỹ n + ˆσ n33ˆγ n1 4 Ỹ ˆσ n34ˆγ n1 n 2 ˆη n1 2 Ỹ n 2 + ˆσ n44ˆη n1ˆγ n1 4 Ỹ, n 2 18

20 where ˆσ n11 = 1 n ˆσ n12 = n 1 ˆβ 2 n1 ˆβ n1 ( X n 2 ˆγ n1 2 X n + n 1 X 2 n ˆγ n1 X2 n, ˆẽ 2 Y X,n )( n 1 ) L X2 n n s 2, X n ˆσ n22 = s 2 X, ˆσ n13 = ˆγ n1ˆη n1 cov( ˆ X, Ỹ ) + n 1 L n X n Ỹ ns 2 s 2 cov( ˆ X X n Ỹ n, Ỹ n nẽ Y X,nẽ XY,n) + n 1 + n 1 L n ˆβn1 Ỹ ns 2 Ỹ n L n ˆα n1 X n s 2 cov(ỹ ˆ X n n, X nẽ XY,n) cov( ˆ X n, Ỹ nẽ Y X,n) n 1 ˆσ n14 = n 1 ˆσ n23 = n 1 ˆσ n24 = cov( ˆ X, Ỹ ), ˆσ n33 = n 1 ˆσ n34 = n 1 ˆσ n44 = s 2 Ỹ, L n X n Ỹ ns 2 2 s 2 X n ˆβ n1 ˆα n1 ˆα 2 n1 ˆα n1 Ỹ n cov( ˆ X n, ẽ Y X,nẽ XY,n), X nỹ n ˆγ n1 Xn Ỹ n + n 1 X nỹ n ˆη n1 Xn Ỹ n + n 1 ( Ỹ n 2 ˆη n1 2 Ỹ n + n 1 Ỹ 2 n ˆη n1 Ỹ 2 n, L n X n s 2 X n L n Ỹ ns 2 ˆẽ 2 XY,n Ỹ n cov( ˆ X n, Ỹ cov(ỹ ˆ )( n 1 ˆẽ Y X,n = Ỹ n ˆβ n0 ˆβ n1 X n, ˆẽ XY,n = X n ˆα n0 ˆα n1 Ỹ n 1 X nỹ n Xn Ỹ n, cov( ˆ X n, Ỹ nẽ Y X,n ) = L 1 n X n L 1 n Ỹ nˆẽ Y X,n and cov(ỹ ˆ n, X nẽ XY,n ), cov( ˆ X n, Ỹ cov( ˆ X n, ẽ Y X,nẽ XY,n ) are defined similarly. n, X nỹ nˆẽ Y X,n nẽ Y X,nẽ XY,n ), nẽ Y X,n), n, X nẽ XY,n), ) L n Ỹ ns 2 2, Ỹ n cov( ˆ X, Ỹ ) = Appendix C: Auxiliary results on martingale differences Lemma 1. Under the technical conditions (C1)-(C6), on events A n (16) and C n (17), the martingale differences Z nt satisfy the conditions n (a.) E{ZntI( Z 2 nt > ɛ)} 0 for all ɛ > 0, t=1 (b.) 2 n = n t=1 Z 2 nt p σ 2 Z for σ 2 Z > 0. 19

Proof. Let Z_nt = w_nt v_nt, where w_nt = 1/√n and

v_nt = aγ_1{ψ(U_nt)X_nt − E(X)} + a X̄_{n(t)} ψ(U_nt) e_{YX,nt}(X_nt − X̄_{n(t)})/s²_{X,n(t)} + b{φ(U_nt)X_nt − E(X)} + cη_1{φ(U_nt)Y_nt − E(Y)} + c Ȳ_{n(t)} φ(U_nt) e_{XY,nt}(Y_nt − Ȳ_{n(t)})/s²_{Y,n(t)} + d{ψ(U_nt)Y_nt − E(Y)},

with E(v_nt) = 0. Since v_nt is bounded uniformly in t on the event E_n, it holds for ε > 0 that

Σ_{t=1}^n E{Z²_nt I(|Z_nt| > ε)} = Σ_{t=1}^n ∫ x² I(|x| > ε) dF_{w_nt v_nt}(x) ≤ max_t ∫ x² I(|x| > ε/|w_nt|) dF_{v_nt}(x) Σ_t w²_nt = max_t ∫ x² I(|x| > √n ε) dF_{v_nt}(x) → 0,

and Lemma 1 (a.) follows. For Lemma 1 (b.), squaring v_nt and collecting terms decomposes Δ²_n into seventeen terms, Δ²_n = T_1 + ... + T_17: the terms T_1, ..., T_4 are the squared main components, with coefficients a²γ_1², b², c²η_1² and d²; the terms T_5, ..., T_10 are the cross products of the main components, with coefficients 2abγ_1, 2acγ_1η_1, 2adγ_1, 2bcη_1, 2bd and 2cdη_1; and the terms T_11, ..., T_17 collect all products involving the residual components with factors e_{YX,nt}(X_nt − X̄_{n(t)})/s²_{X,n(t)} and e_{XY,nt}(Y_nt − Ȳ_{n(t)})/s²_{Y,n(t)}. It follows from the law of large numbers that

T_1 + ... + T_10 →_p a²γ_1²[E{ψ²(U)}E(X²) − {E(X)}²] + b² var(X̃) + d² var(Ỹ) + c²η_1²[E{φ²(U)}E(Y²) − {E(Y)}²] + 2abγ_1[E{φ(U)ψ(U)}E(X²) − {E(X)}²] + (2acγ_1η_1 + 2bd)[E{φ(U)ψ(U)}E(XY) − E(X)E(Y)] + 2bcη_1[E{φ²(U)}E(XY) − E(X)E(Y)] + 2adγ_1[E{ψ²(U)}E(XY) − E(X)E(Y)] + 2cdη_1[E{φ(U)ψ(U)}E(Y²) − {E(Y)}²].

On the event A, E(T_11 | U, X, L_n) = 0 and

var(T_11 | U, X, L_n) = 4n⁻² σ²_{YX} Σ_t ψ²(U_nt) X̄²_{n(t)} (X_nt − X̄_{n(t)})²/s⁴_{X,n(t)} [a²γ_1{ψ(U_nt)X_nt − E(X)} + ab{φ(U_nt)X_nt − E(X)} − E(Y)(acη_1 + ad)]²,

which is O(n⁻¹). Thus E(T_11) = 0 and var(T_11) = O(n⁻¹), implying that T_11 = O_p(n^{−1/2}) on A. Similarly, it can be shown that T_12 = O_p(n^{−1/2}) on C. Using Lemma 4 (a, b) of Şentürk & Müller (2005b) and the law of large numbers, it follows that

T_13 →_p [E(X)/var(X)] cov(X, Y e_{YX})[2acη_1 E{ψ(U)φ(U)} + 2ad E{ψ²(U)}],

T_14 →_p [E(Y)/var(Y)] cov(Y, X e_{XY})[2acγ_1 E{ψ(U)φ(U)} + 2bc E{φ²(U)}],

T_15 →_p a² σ²_{YX} {E(X)}² E{ψ²(U)} / var(X),

T_16 →_p 2ac [E(X)E(Y)/{var(X)var(Y)}] E{ψ(U)φ(U)} cov(X, Y e_{YX} e_{XY}) − 2ac [E(X){E(Y)}²/{var(X)var(Y)}] E{ψ(U)φ(U)} cov(X, e_{YX} e_{XY}),

T_17 →_p c² σ²_{XY} {E(Y)}² E{φ²(U)} / var(Y).

Thus Δ²_n →_p σ²_Z = (a, b, c, d) Σ (a, b, c, d)^T, where the 4 × 4 matrix Σ is as defined in Theorem 1, and Lemma 1 (b.) follows.

Appendix D: Analysis of ξ_1 defined in (3)

Assuming conditions (C1)–(C6) (see Appendix A), we estimate ρ_(Y,X) by a consistent estimate of ρ_(ẽ_ỸU, ẽ_X̃U), where ẽ_ỸU and ẽ_X̃U are the errors from the nonparametric regression models Ỹ = E(Ỹ|U) + ẽ_ỸU and X̃ = E(X̃|U) + ẽ_X̃U, respectively. Thus ẽ_ỸU = Ỹ − E(Ỹ|U) = ψ(U){Y − E(Y)} and ẽ_X̃U = X̃ − E(X̃|U) = φ(U){X − E(X)}. Therefore, using the population equation for correlation,

ρ_(ẽ_ỸU, ẽ_X̃U) = [E(ẽ_ỸU ẽ_X̃U) − E(ẽ_ỸU)E(ẽ_X̃U)] / [√var(ẽ_X̃U) √var(ẽ_ỸU)] = κ ρ_(Y,X) = ξ_1,

where κ, as defined in Section 1, is equal to E{ψ(U)φ(U)}/[√E{ψ²(U)} √E{φ²(U)}]. Next, we show that κ can assume any real value in (0, 1] under suitable conditions. Let {θ_1, θ_2, θ_3, ...} be an orthogonal basis of the inner-product space C[a, b], the space of continuous functions on [a, b], using the inner product ⟨g_1, g_2⟩ = ∫_a^b g_1(u) g_2(u) f(u) du, where f(·) represents the density function of U and we choose θ_1 ≡ 1. Then ψ and φ can be expanded as ψ = Σ_i μ_i θ_i and φ = Σ_i ϑ_i θ_i, for sets of real numbers μ_i, ϑ_i. The identifiability conditions imply that μ_1 = ϑ_1 = 1. Assume without loss of generality that, for a given set of ϑ_i, i ≥ 2, μ_i = λϑ_i for an arbitrary λ ≥ 0, and that Σ_{i≥2} ϑ²_i = τ for τ ≥ 0, i.e. ⟨φ, φ⟩ = τ + 1. Hence κ = (1 + λτ)/(√(τ + 1) √(λ²τ + 1)). For the case λ = 0, the value of κ gets arbitrarily close to 0 as τ increases, and κ = 1 for τ = λ = 1. Thus κ may assume any real value in the interval (0, 1].
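The attenuation derived in Appendix D can be checked numerically. The following is a minimal Monte Carlo sketch, not from the paper: the distortion functions ψ(u) = 0.5 + u and φ(u) = 1.5 − u with U ~ Uniform(0, 1) are illustrative choices satisfying E{ψ(U)} = E{φ(U)} = 1, and the bivariate distribution of (Y, X) is likewise hypothetical. The correlation of the residuals ψ(U){Y − E(Y)} and φ(U){X − E(X)} comes out close to κ ρ_(Y,X), not ρ_(Y,X).

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 200_000, 0.6

# hidden variables (Y, X) with correlation rho, independent of the confounder U
e = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
Y, X = 2.0 + e[:, 0], 1.5 + e[:, 1]
U = rng.uniform(size=n)

psi, phi = 0.5 + U, 1.5 - U          # E psi(U) = E phi(U) = 1

# residuals of Ytilde, Xtilde regressed on U (known in closed form here);
# sample means stand in for E(Y), E(X)
eY = psi * (Y - Y.mean())
eX = phi * (X - X.mean())
r_resid = np.corrcoef(eY, eX)[0, 1]

# attenuation factor kappa = E{psi phi} / sqrt(E{psi^2} E{phi^2}),
# computed from the exact uniform moments of these particular psi, phi
kappa = (0.75 + 0.5 - 1/3) / np.sqrt((0.25 + 0.5 + 1/3) * (2.25 - 1.5 + 1/3))
```

For these choices κ = (11/12)/(13/12) = 11/13 ≈ 0.846, so the true correlation 0.6 is attenuated to roughly 0.51 in the residual-based (nonparametric partial) correlation.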
Appendix E: Analysis of ξ_2

The partial correlation of Ỹ and X̃ adjusted for U is equivalent to ρ_(e_ỸU, e_X̃U), where e_ỸU and e_X̃U are the errors from the regression models Ỹ = a_0 + a_1 U + e_ỸU and X̃ = b_0 + b_1 U + e_X̃U, respectively. Assuming ψ(U) = c_0 + c_1 U and φ(U) = d_0 + d_1 U, for some real numbers c_0, c_1, d_0, d_1, we can evaluate e_ỸU and e_X̃U, and thus ρ_(e_ỸU, e_X̃U). Using the population normal equations for regression, we find a_1 = {E(ỸU) − E(Ỹ)E(U)}/var(U) = c_1 E(Y), a_0 = E(Ỹ) − a_1 E(U) = c_0 E(Y), b_1 = {E(X̃U) − E(X̃)E(U)}/var(U) = d_1 E(X), and b_0 = E(X̃) − b_1 E(U) = d_0 E(X). Therefore, e_ỸU = ψ(U){Y − E(Y)}, e_X̃U = φ(U){X − E(X)}, and

ρ_(e_ỸU, e_X̃U) = [E(e_ỸU e_X̃U) − E(e_ỸU)E(e_X̃U)] / [√var(e_X̃U) √var(e_ỸU)] = κ ρ_(Y,X) = ξ_2.

Appendix F: Analysis of ξ_3 in (4)

Applying the population equation for correlation and simplifying terms, we find

ρ_(Ỹ,X̃) = {E(ỸX̃) − E(Ỹ)E(X̃)} / {√var(X̃) √var(Ỹ)} = ξ_3.

Expanding ψ and φ in the same way as in Appendix D,

ξ_3 = [(1 + λτ)E(XY) − E(X)E(Y)] / [√(E(X²)(τ + 1) − {E(X)}²) √(E(Y²)(λ²τ + 1) − {E(Y)}²)].

This quantity can assume any real value in [−1, 1], since in the special case of τ = 0, ξ_3 = ρ_(Y,X).
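Appendix F's conclusion, that the naive correlation ξ_3 = ρ_(Ỹ,X̃) can differ markedly from ρ_(Y,X), is easy to see numerically. The sketch below uses illustrative distortion functions and distributions (not from the paper) and computes ξ_3 directly from the moment identities E(ỸX̃) = E{ψ(U)φ(U)}E(XY) and var(X̃) = E{φ²(U)}E(X²) − {E(X)}², which hold because U is independent of (Y, X); it then compares this value with the empirical correlation of the distorted observations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, muY, muX = 200_000, 0.6, 2.0, 1.5

e = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
Y, X = muY + e[:, 0], muX + e[:, 1]
U = rng.uniform(size=n)
Yt, Xt = (0.5 + U) * Y, (1.5 - U) * X    # distorted observations

# exact moments of psi(U) = 0.5 + U and phi(U) = 1.5 - U, U ~ Uniform(0, 1)
Epp = 0.75 + 0.5 - 1/3                   # E{psi(U) phi(U)}
Ep2 = 0.25 + 0.5 + 1/3                   # E{psi^2(U)}
Eq2 = 2.25 - 1.5 + 1/3                   # E{phi^2(U)}

# population value of xi3 = corr(Ytilde, Xtilde), via independence of U
num = Epp * (muX * muY + rho) - muX * muY
den = np.sqrt((Ep2 * (muY**2 + 1.0) - muY**2) * (Eq2 * (muX**2 + 1.0) - muX**2))
xi3 = num / den

r_naive = np.corrcoef(Yt, Xt)[0, 1]      # matches xi3, far from rho = 0.6
```

Here ξ_3 ≈ 0.22 although ρ_(Y,X) = 0.6, showing that multiplicative confounding can produce a badly biased unadjusted correlation.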

Table 1: Estimates and approximate 95% confidence intervals for ρ_(HP,CR) adjusted for %LS in the Boston house price data set with n = 506. The first three estimates correspond to non-adjustment (ρ_(Ỹ,X̃)), nonparametric partial correlation (ρ_(ẽ_ỸU, ẽ_X̃U)) and partial correlation (ρ_(e_ỸU, e_X̃U)). The approximate confidence intervals for these three methods were obtained using Fisher's z-transformation. The fourth estimate was obtained from the Cadcor method adjusting for %LS, with the asymptotic intervals (19).

Methods                Lower B.    Estimate    Upper B.
ρ̂_(Ỹ,X̃)
ρ̂_(e_ỸU, e_X̃U)
ρ̂_(ẽ_ỸU, ẽ_X̃U)
Cadcor

Figure 1: Data (X_i, Y_i) (squares), i = 1, ..., 400, generated from the underlying bivariate distribution specified in Section 5.2, along with the distorted data (X̃_i, Ỹ_i) (crosses). Least squares regression lines fitted to the distorted data (dashed, with estimated correlation ρ̂_(Ỹ,X̃)) and to the original data (solid, with estimated correlation ρ̂_(Y,X)) are also shown.

Figure 2: Scatter-plots of the estimated regression coefficients (β̂_nr1, ..., β̂_nrm) (12) of the linear regressions of house price (HP) on crime rate (CR) for each bin (B_n1, ..., B_nm), for intercepts (r = 0, top left panel) and slopes (r = 1, top right panel), plotted against %LS. Similarly, estimated linear regression coefficients (α̂_nr1, ..., α̂_nrm) (13) of the linear regressions of CR on HP for each bin (B_n1, ..., B_nm), for intercepts (r = 0, bottom left panel) and slopes (r = 1, bottom right panel). Here Ỹ = HP, X̃ = CR and the confounding variable is U = %LS (lower educational status). Local linear kernel smooths have been fitted through the scatter-plots using cross-validation bandwidth choices of h = 6, 10, 10, 10, respectively. The sample size is 506, and the number of bins formed is 24.

Figure 3: Scatter-plots of the variables HP (house price) vs CR (crime rate) (top left panel), CR vs %LS (lower educational status) (top right panel) and HP vs %LS (bottom left panel) for the Boston house price data, along with the scatter-plot of the raw correlation estimates (r̂_n1, ..., r̂_nm) (20) for each bin (B_n1, ..., B_nm) (bottom right panel). Local linear kernel smooths have been fitted through the scatter-plots using cross-validation bandwidth choices of h = 25, 15, 15, 10, respectively.
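The per-bin raw correlations (r̂_n1, ..., r̂_nm) in the bottom right panel of Figure 3 are informative because correlation is invariant under multiplication by constants: within a narrow bin of U, ψ(U) and φ(U) are approximately constant, so the within-bin correlation of (X̃, Ỹ) approximately equals ρ_(Y,X). The simulation sketch below illustrates only this binning idea; it is a caricature rather than the Cadcor estimator itself (which estimates per-bin regression coefficients and smooths them), and all distributional choices here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, rho = 50_000, 50, 0.6          # m equal-count bins of the confounder U

# hidden variables (Y, X) with correlation rho, independent of U
e = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
Y, X = 2.0 + e[:, 0], 1.5 + e[:, 1]
U = rng.uniform(size=n)

# distorted observations, with E{psi(U)} = E{phi(U)} = 1
Yt, Xt = (0.5 + U) * Y, (1.5 - U) * X

r_naive = np.corrcoef(Yt, Xt)[0, 1]  # pooled correlation, ignores U

# within each U-bin psi and phi are nearly constant, so the raw
# per-bin correlations target rho(Y, X)
order = np.argsort(U)
r_bins = [np.corrcoef(Yt[idx], Xt[idx])[0, 1]
          for idx in np.array_split(order, m)]
r_binned = float(np.mean(r_bins))
```

In this simulation the pooled correlation of the distorted data is strongly biased, while the average of the per-bin correlations essentially recovers ρ_(Y,X) = 0.6.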


More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

11. Regression and Least Squares

11. Regression and Least Squares 11. Regression and Least Squares Prof. Tesler Math 186 Winter 2016 Prof. Tesler Ch. 11: Linear Regression Math 186 / Winter 2016 1 / 23 Regression Given n points ( 1, 1 ), ( 2, 2 ),..., we want to determine

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

Introduction to Regression

Introduction to Regression Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

On a Nonparametric Notion of Residual and its Applications

On a Nonparametric Notion of Residual and its Applications On a Nonparametric Notion of Residual and its Applications Bodhisattva Sen and Gábor Székely arxiv:1409.3886v1 [stat.me] 12 Sep 2014 Columbia University and National Science Foundation September 16, 2014

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Estimation of Treatment Effects under Essential Heterogeneity

Estimation of Treatment Effects under Essential Heterogeneity Estimation of Treatment Effects under Essential Heterogeneity James Heckman University of Chicago and American Bar Foundation Sergio Urzua University of Chicago Edward Vytlacil Columbia University March

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

A Study on the Correlation of Bivariate And Trivariate Normal Models

A Study on the Correlation of Bivariate And Trivariate Normal Models Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 11-1-2013 A Study on the Correlation of Bivariate And Trivariate Normal Models Maria

More information

Quantile Regression for Extraordinarily Large Data

Quantile Regression for Extraordinarily Large Data Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

The regression model with one stochastic regressor (part II)

The regression model with one stochastic regressor (part II) The regression model with one stochastic regressor (part II) 3150/4150 Lecture 7 Ragnar Nymoen 6 Feb 2012 We will finish Lecture topic 4: The regression model with stochastic regressor We will first look

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Problem Solving. Correlation and Covariance. Yi Lu. Problem Solving. Yi Lu ECE 313 2/51

Problem Solving. Correlation and Covariance. Yi Lu. Problem Solving. Yi Lu ECE 313 2/51 Yi Lu Correlation and Covariance Yi Lu ECE 313 2/51 Definition Let X and Y be random variables with finite second moments. the correlation: E[XY ] Yi Lu ECE 313 3/51 Definition Let X and Y be random variables

More information

Lecture 11: Regression Methods I (Linear Regression)

Lecture 11: Regression Methods I (Linear Regression) Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Quasi-regression for heritability

Quasi-regression for heritability Quasi-regression for heritability Art B. Owen Stanford University March 01 Abstract We show in an idealized model that the narrow sense (linear heritability from d autosomal SNPs can be estimated without

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information