SEMIPARAMETRIC ESTIMATION OF CONDITIONAL HETEROSCEDASTICITY VIA SINGLE-INDEX MODELING
Statistica Sinica 23 (2013)

Liping Zhu, Yuexiao Dong and Runze Li

Shanghai University of Finance and Economics, Temple University and Pennsylvania State University

Abstract: We consider a single-index structure to study heteroscedasticity in regression with high-dimensional predictors. A general class of estimating equations is introduced. The resulting estimators remain consistent even when the structure of the variance function is misspecified. The proposed estimators estimate the conditional variance function asymptotically as well as if the conditional mean function were given a priori. Numerical studies confirm our theoretical observations and demonstrate that our proposed estimators have less bias and smaller standard deviation than the existing estimators.

Key words and phrases: Conditional variance, heteroscedasticity, single-index model, volatility.

1. Introduction

Many scientific studies rely on understanding the local variability of the data, frequently characterized through the conditional variance in statistical modeling. The conditional variance plays an important role in a variety of statistical applications, such as measuring the volatility of risk in finance (Anderson and Lund (1997); Xia, Tong, and Li (2002)), monitoring the reliability of nonlinear prediction (Müller and Stadtmüller (1987); Yao and Tong (1994)), identifying homoscedastic transformations in regression (Box and Cox (1964); Carroll and Ruppert (1988)), and so on.

Estimation of the conditional variance function is an important problem in statistics. Let Y ∈ R be the response variable and x = (X₁, ..., Xₚ)ᵀ ∈ Rᵖ be the associated predictor vector. In this paper we study the conditional variance function of Y given x, denoted var(Y | x). We write E(Y | x) for the conditional mean function and ε = Y − E(Y | x) for the random error. Then we have var(Y | x) = E(ε² | x).
Since ε is not observable, it is natural to replace the error with the residual ε̂ = Y − Ê(Y | x), where Ê(Y | x) is an arbitrary consistent estimate of the conditional mean function. It is of interest to quantify the effect of this replacement on estimating the conditional variance function. This problem first received much attention when the predictor is univariate, p = 1. See, for
example, Hall and Carroll (1989), Wang et al. (2008), and references therein. Cai, Levine, and Wang (2009) generalized the results of Wang et al. (2008) to the case of multivariate x. In their generalization, however, nonparametric kernel regression was applied directly to estimate the multivariate regression function, which may not be effective due to the curse of dimensionality. In regressions with univariate predictor and response, Ruppert et al. (1997) and Fan and Yao (1998) applied local linear regression to the squared residuals and demonstrated that such a nonparametric estimate performs asymptotically as well as if the conditional mean were given a priori. Song and Yang (2009) derived asymptotically exact and conservative confidence bands for heteroscedastic variance functions. Yin et al. (2010) extended the results of Fan and Yao (1998) to the case of multivariate response.

To model the conditional variance function when p is fairly large, we assume throughout that there exists a smooth function σ²(·) and a p × 1 vector β₀ such that

var(Y | x) = σ²(β₀ᵀx). (1.1)

With an unspecified link function σ²(·) and a univariate index β₀ᵀx, (1.1) is both flexible and interpretable under the single-index structure. It provides a compromise between parametric models, which are easily interpretable yet often too restrictive, and fully nonparametric models, which are flexible but suffer from the curse of dimensionality. For ease of presentation we assume that, for some unknown function l(·) and a p × 1 vector α₀, the conditional mean function also admits the single-index structure

E(Y | x) = l(α₀ᵀx). (1.2)

The assumption of model (1.2) is not essential, and more general forms of the mean function may be assumed. In this paper we propose a general class of estimating equations to estimate β₀. By correctly specifying E(x | β₀ᵀx), the estimator of β₀ is consistent even when the structure of the variance function σ²(·) is misspecified.
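The residual-based idea of Ruppert et al. (1997) and Fan and Yao (1998) — fit the mean, then smooth the squared residuals — can be illustrated in the univariate case. The following is a minimal sketch; the mean function, variance function, Gaussian kernel, and bandwidths are illustrative choices, not taken from the paper:

```python
import numpy as np

def nw(t0, t, z, h):
    # Nadaraya-Watson smoother of z on t, evaluated at t0 (Gaussian kernel)
    w = np.exp(-0.5 * ((t0[:, None] - t[None, :]) / h) ** 2)
    return (w @ z) / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, n)
sd_true = 0.5 + 0.4 * X**2                  # illustrative standard-deviation function
Y = np.sin(2 * X) + sd_true * rng.standard_normal(n)

resid = Y - nw(X, X, Y, h=0.1)              # residuals: Y - Ê(Y | X)
grid = np.array([0.0, 0.8])
var_hat = nw(grid, X, resid**2, h=0.15)     # smooth squared residuals -> var(Y | X)
```

At X = 0 the true conditional variance is 0.5² = 0.25; the residual-based smoother recovers it without knowing the mean function.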
On the other hand, if the variance function σ²(·) is correctly specified and estimated consistently, our proposed estimator of β₀ is consistent without correctly specifying E(x | β₀ᵀx). The estimate of β₀ from the estimating equations possesses an adaptive property: the proposed procedure estimates the conditional variance function as efficiently, asymptotically, as if the conditional mean were given a priori. This is achieved by replacing the error ε = Y − E(Y | x) in the estimating equation with its corresponding residual ε̂ = Y − Ê(Y | x).
This paper is organized in the following way. In Section 2 we present the methodology and study its properties at both the population and the sample levels. In Section 3 we examine the finite-sample performance of the proposed procedure through simulations and an application to a data set. The paper concludes with a brief discussion in Section 4. All technicalities are given in the Appendix.

2. A New Procedure and Its Theoretical Properties

2.1. A general class of estimating equations

Consider the estimating equation

E[{ε² − σ²(β₀ᵀx)} σ²′(β₀ᵀx) x] = 0, (2.1)

where ε = Y − E(Y | x), and σ²′(·) stands for the first-order derivative of σ²(·). Here (2.1) corresponds to the classical nonlinear least squares estimation of Ichimura (1993) and Härdle, Hall, and Ichimura (1993). A limitation of (2.1) is that it requires correct specification of σ²(·) to obtain a consistent estimate of β₀, and this may be troublesome in practice. To address this issue we propose a new class of estimating equations,

E[{ε² − σ̃²(β₀ᵀx)}{x − Ẽ(x | β₀ᵀx)}] = 0, (2.2)

where σ̃²(β₀ᵀx) and Ẽ(x | β₀ᵀx) may be different from σ²(β₀ᵀx) and E(x | β₀ᵀx). When σ̃²(β₀ᵀx) = σ²(β₀ᵀx) and Ẽ(x | β₀ᵀx) = E(x | β₀ᵀx), (2.2) takes the form

E[{ε² − σ²(β₀ᵀx)}{x − E(x | β₀ᵀx)}] = 0. (2.3)

It is not difficult to verify that (2.3) produces a consistent estimate of β₀. In other words, β₀ is a solution to E[{ε² − σ²(βᵀx)}{x − E(x | βᵀx)}] = 0. An important virtue of (2.2) is that, as long as one of the functions σ̃²(β₀ᵀx) and Ẽ(x | β₀ᵀx) is correctly specified, (2.2) yields a consistent estimate of β₀. In particular, without knowing the exact form of σ²(·), suppose we specify it as σ̃²(·), which may be different from σ²(·), and impose the working estimating equation

E[{ε² − σ̃²(β₀ᵀx)}{x − E(x | β₀ᵀx)}] = 0. (2.4)

Invoking a conditional expectation, (2.4) is equivalent to E(E[{ε² − σ̃²(β₀ᵀx)}{x − E(x | β₀ᵀx)} | x]) = 0.
To verify that this yields a consistent estimate of the true β₀ at the population level, recall that E(ε² | x) = σ²(β₀ᵀx) under (1.1). Then the left-hand side of (2.4) can be further reduced as

E[{E(ε² | x) − σ̃²(β₀ᵀx)}{x − E(x | β₀ᵀx)}] = E[{σ²(β₀ᵀx) − σ̃²(β₀ᵀx)}{x − E(x | β₀ᵀx)}] = E[{σ²(β₀ᵀx) − σ̃²(β₀ᵀx)} E{x − E(x | β₀ᵀx) | β₀ᵀx}],

where the last term is obviously 0, indicating that (2.4) is able to produce a consistent estimator of β₀. This derivation provides the motivation for (2.2), as it continues to yield a consistent estimate of β₀ even if σ²(·) is misspecified. As a special case, let σ̃²(β₀ᵀx) = 0 in (2.4) and get

E[ε²{x − E(x | β₀ᵀx)}] = 0, (2.5)

which is similar to the estimating equation proposed by Li and Dong (2009) to recover the central solution space. In their context they utilize Y instead of ε² to estimate the mean function. We generalize the idea of the central solution space method to estimate the variance function. When x satisfies the linearity condition (Li (1991)), that E(x | β₀ᵀx) is a linear function of β₀ᵀx, β₀ must be proportional to var(x)⁻¹cov(x, ε²). This property dramatically simplifies solving (2.5) at the sample level. Yet, unlike the response variable Y, the error term ε is not observable. This motivates us to examine the effect that estimating the mean function, to obtain the residuals, has on estimating the variance function. We study this issue in the next section.

Aside from this, we have seen that (2.5) is a specific member of the general class of estimating equations (2.2). On the other hand, correct specification of E(x | β₀ᵀx) may be challenging in some situations. Even if all components of E(x | β₀ᵀx) can be correctly specified, calculating E(x | β₀ᵀx) is rather intensive when x is high dimensional. To simplify the calculation, suppose we estimate E(x | β₀ᵀx) by Ẽ(x | β₀ᵀx). Then (2.2) becomes

E[{ε² − σ²(β₀ᵀx)}{x − Ẽ(x | β₀ᵀx)}] = 0, (2.6)

noting that E(ε² | x) = σ²(β₀ᵀx) and conditioning on x. Thus (2.2)
continues to yield a consistent estimate of β₀ even if E(x | β₀ᵀx) is misspecified. For example, we can set Ẽ(x | β₀ᵀx) = 0, and then (2.6) becomes

E[{ε² − σ²(β₀ᵀx)} x] = 0. (2.7)

We remark here that (2.5) and (2.7) are equivalent at the population level by noting that E{ε² E(x | β₀ᵀx)} = E{E(ε² | β₀ᵀx) x} = E{σ²(β₀ᵀx) x} under (1.1).
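The linearity-condition shortcut for (2.5) can be checked numerically: for Gaussian predictors, the moment estimate var(x)⁻¹cov(x, ε²) should align with the direction β₀. A minimal sketch, in which β₀ and the variance function echo the simulation section but the setup is otherwise illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100000, 4
beta0 = np.array([0.45, 0.0, 0.0, 0.9])   # illustrative single-index direction
x = rng.standard_normal((n, p))           # Gaussian x satisfies the linearity condition
t = x @ beta0
eps = np.sqrt(0.5) * np.abs(t + 2.0) * rng.standard_normal(n)  # E(eps^2|x)=0.5(t+2)^2

# Moment estimate: var(x)^{-1} cov(x, eps^2) should be proportional to beta0.
e2 = eps**2
c = (x - x.mean(0)).T @ (e2 - e2.mean()) / n       # sample cov(x, eps^2)
b = np.linalg.solve(np.cov(x, rowvar=False), c)
cosine = abs(b @ beta0) / (np.linalg.norm(b) * np.linalg.norm(beta0))
```

With this design the cosine between the moment estimate and β₀ is close to one, illustrating why the linearity condition makes solving (2.5) at the sample level so simple.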
We have shown that the estimating equation (2.2) has a desirable robustness property: as long as either σ̃²(·) or Ẽ(x | β₀ᵀx) is correctly specified, the estimator based on the sample version of (2.2) is consistent. Both parametric and nonparametric methods could be used to model σ̃²(·) or Ẽ(x | β₀ᵀx). For example, Ẽ(x | β₀ᵀx) is typically assumed to be a linear function of β₀ᵀx in the dimension-reduction literature (Li (1991)). We propose to estimate σ²(·) and Ẽ(x | β₀ᵀx) via kernel regression. The theoretical properties of the sample estimates based on kernel regression are investigated in the next section.

2.2. Asymptotic properties

Given independent and identically distributed (xᵢ, Yᵢ), i = 1, ..., n, we discuss sample versions of the proposed estimating equations and their asymptotic properties. Ideally, one may estimate β₀ by solving for β̂ that satisfies

n⁻¹/² Σᵢ {εᵢ² − σ̂²(β̂ᵀxᵢ)}{xᵢ − Ê(xᵢ | β̂ᵀxᵢ)} = 0, (2.8)

which is the sample counterpart of (2.3). In (2.8), the quantities σ̂²(β̂ᵀxᵢ) and Ê(xᵢ | β̂ᵀxᵢ) are consistent estimators of σ²(β̂ᵀxᵢ) and E(xᵢ | β̂ᵀxᵢ), respectively, and can be obtained through classical kernel regression. In practice, we have to replace the unobservable error εᵢ with the residual ε̂ᵢ. To guarantee the consistency of β̂, we need consistent estimation of εᵢ, indicating that we have to estimate E(Y | x) consistently. Under (1.2), E(Y | x) = l(α₀ᵀx), we estimate α₀ by solving for α̂ that satisfies

n⁻¹/² Σᵢ {Yᵢ − l̂(α̂ᵀxᵢ)}{xᵢ − Ê(xᵢ | α̂ᵀxᵢ)} = 0, (2.9)

which is parallel to (2.8), with

Ê(xᵢ | α̂ᵀxᵢ) = Σⱼ K_h(α̂ᵀxⱼ − α̂ᵀxᵢ)xⱼ / Σⱼ K_h(α̂ᵀxⱼ − α̂ᵀxᵢ),  l̂(α̂ᵀxᵢ) = Σⱼ K_h₁(α̂ᵀxⱼ − α̂ᵀxᵢ)Yⱼ / Σⱼ K_h₁(α̂ᵀxⱼ − α̂ᵀxᵢ).

Here K_h(·) = K(·/h)/h is the kernel function, and h and h₁ are the bandwidths. Next we calculate the residuals ε̂ᵢ = Yᵢ − l̂(α̂ᵀxᵢ), and get our final estimate of β₀ by solving

n⁻¹/² Σᵢ {ε̂ᵢ² − σ̂²(β̂ᵀxᵢ)}{xᵢ − Ê(xᵢ | β̂ᵀxᵢ)} = 0. (2.10)
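The estimators above are plain Nadaraya-Watson kernel smoothers along the index. A minimal sketch of the conditional-mean smoother Ê(x | βᵀx) with an Epanechnikov kernel (the kernel used later in Section 3); the sample size, bandwidth, and index direction are illustrative:

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def nw_conditional_mean(t0, t, Z, h):
    """Nadaraya-Watson estimate of E(Z | t = t0); with Z the predictor
    matrix x this gives the Ê(x | β'x) appearing in the estimating equation."""
    W = epanechnikov((t0[:, None] - t[None, :]) / h)
    Wsum = W.sum(axis=1, keepdims=True)
    Wsum[Wsum == 0.0] = 1.0            # guard against empty kernel windows
    return (W @ Z) / Wsum

rng = np.random.default_rng(4)
n, p = 8000, 3
beta = np.array([0.6, 0.0, 0.8])       # unit-length index direction
x = rng.standard_normal((n, p))
t = x @ beta
Ex_given_t = nw_conditional_mean(np.array([1.0]), t, x, h=0.25)
```

For standard Gaussian x the truth is E(x | βᵀx = t) = βt, so at t = 1 the smoother should return approximately β itself.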
In (2.10) we estimate E(xᵢ | β̂ᵀxᵢ) and σ²(β̂ᵀxᵢ) as

Ê(xᵢ | β̂ᵀxᵢ) = Σⱼ K_h(β̂ᵀxⱼ − β̂ᵀxᵢ)xⱼ / Σⱼ K_h(β̂ᵀxⱼ − β̂ᵀxᵢ),  σ̂²(β̂ᵀxᵢ) = Σⱼ K_h₂(β̂ᵀxⱼ − β̂ᵀxᵢ)ε̂ⱼ² / Σⱼ K_h₂(β̂ᵀxⱼ − β̂ᵀxᵢ).

To summarize, we implement the following algorithm to estimate β₀.

1. Solve (2.9) to obtain α̂ through the following Newton-Raphson algorithm.
(a) Start with an initial value α⁽⁰⁾. In our implementation, we choose α⁽⁰⁾ = {Σᵢ(xᵢ − x̄)(xᵢ − x̄)ᵀ}⁻¹ Σᵢ(xᵢ − x̄)(Yᵢ − Ȳ), where x̄ = n⁻¹Σᵢxᵢ and Ȳ = n⁻¹ΣᵢYᵢ.
(b) Write J(α) = n⁻¹/² Σᵢ {Yᵢ − l̂(αᵀxᵢ)}{xᵢ − Ê(xᵢ | αᵀxᵢ)} and J′(α) = −n⁻¹/² Σᵢ l̂′(αᵀxᵢ){xᵢ − Ê(xᵢ | αᵀxᵢ)}xᵢᵀ. The derivative l̂′(αᵀxᵢ) = ∂l̂(αᵀxᵢ)/∂(αᵀxᵢ) is taken directly from the corresponding kernel estimator. Update α⁽ᵏ⁾ with

α⁽ᵏ⁺¹⁾ = α⁽ᵏ⁾ − {J′(α⁽ᵏ⁾)}⁻¹ J(α⁽ᵏ⁾). (2.11)

In case the matrix J′(α⁽ᵏ⁾) is singular or nearly so, we adopt a ridge-regression approach, using (2.11) with J′(α⁽ᵏ⁾) replaced by J′ᵣ(α⁽ᵏ⁾) = J′(α⁽ᵏ⁾) + λₙI_{p×p} for some positive ridge parameter λₙ. Here I_{p×p} denotes the p × p identity matrix.
(c) Iterate (2.11) until α⁽ᵏ⁺¹⁾ fails to change. Denote the resulting estimate by α̂.
2. Obtain l̂(α̂ᵀxᵢ) by kernel regression of Yᵢ onto α̂ᵀxᵢ, for i = 1, ..., n.
3. Solve (2.10) to obtain β̂, where ε̂ᵢ = Yᵢ − l̂(α̂ᵀxᵢ).
(a) Start with an initial value β⁽⁰⁾. In our implementation, we choose β⁽⁰⁾ = {Σᵢ(xᵢ − x̄)(xᵢ − x̄)ᵀ}⁻¹ Σᵢ(xᵢ − x̄)ε̂ᵢ².
(b) Write I(β) = n⁻¹/² Σᵢ {ε̂ᵢ² − σ̂²(βᵀxᵢ)}{xᵢ − Ê(xᵢ | βᵀxᵢ)} and I′(β) = −n⁻¹/² Σᵢ σ̂²′(βᵀxᵢ){xᵢ − Ê(xᵢ | βᵀxᵢ)}xᵢᵀ, where the derivative σ̂²′(βᵀxᵢ) = ∂σ̂²(βᵀxᵢ)/∂(βᵀxᵢ) is taken from the kernel estimator σ̂²(βᵀxᵢ). Update β⁽ᵏ⁾ with

β⁽ᵏ⁺¹⁾ = β⁽ᵏ⁾ − {I′(β⁽ᵏ⁾) + λₙI_{p×p}}⁻¹ I(β⁽ᵏ⁾). (2.12)

(c) Iterate (2.12) until β⁽ᵏ⁺¹⁾ fails to change. Denote the resulting estimate by β̂.
4. Obtain σ̂²(β̂ᵀxᵢ) by kernel regression of ε̂ᵢ² onto β̂ᵀxᵢ, for i = 1, ..., n.
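Steps 1(b)-1(c) are a standard Newton-Raphson iteration with a ridge fallback for a near-singular Jacobian. A generic sketch of that update rule, applied here to a toy quadratic system rather than to the kernel-based equations themselves:

```python
import numpy as np

def newton_with_ridge(J, Jprime, theta0, lam=1e-4, tol=1e-10, max_iter=100):
    """Newton-Raphson for J(theta) = 0.  If the Jacobian is singular,
    fall back to J'(theta) + lam * I, mirroring the ridge trick in (2.11)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = J(theta)
        H = Jprime(theta)
        try:
            step = np.linalg.solve(H, g)
        except np.linalg.LinAlgError:
            step = np.linalg.solve(H + lam * np.eye(len(theta)), g)
        theta = theta - step
        if np.max(np.abs(step)) < tol:     # iterate until theta fails to change
            break
    return theta

# Toy estimating equation: gradient of ||A theta - b||^2, whose root is A^{-1} b.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
root = newton_with_ridge(lambda th: A.T @ (A @ th - b),
                         lambda th: A.T @ A,
                         np.zeros(2))
```

For a quadratic objective the iteration converges in a single step; in the paper's setting J and J′ would instead be assembled from the kernel estimates at each iteration.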
One question arises naturally: how can one quantify the difference between the estimators obtained from (2.8) and (2.10)? This is answered by the following theorem.

Theorem 1. Suppose conditions (C1)-(C5) in the Appendix are satisfied. Let

Q = E{A(x) σ²′(β₀ᵀx) xᵀ} and V = E{var(ε² | x) A(x)Aᵀ(x)},

where A(x) = x − E(x | β₀ᵀx). Then n¹/² V⁻¹/² Q (β̂ − β₀) converges in distribution to the standard multivariate normal distribution as n → ∞.

Theorem 1 provides the asymptotic normality of β̂ obtained from Step 3 of our proposed algorithm, which uses the residuals ε̂ᵢ and relies on (2.10). As detailed in the proof in the Appendix, we obtain exactly the same asymptotic distribution for β̂ based on (2.8) with known errors εᵢ. This means our procedure performs as well as the oracle procedure in terms of estimating β₀: without knowing the conditional mean, we can estimate the conditional variance function asymptotically as well as if the conditional mean were given a priori.

With a consistent estimator β̂, we estimate σ²(·) via kernel regression. The next theorem states the consistency of σ̂²(·) obtained from Step 4 of our proposed algorithm.

Theorem 2. Suppose conditions (C1)-(C5) in the Appendix are satisfied. Let

bias = h₂² μ₂ {σ²″(β₀ᵀx)/2 + σ²′(β₀ᵀx) f′(β₀ᵀx)/f(β₀ᵀx)},

where μ₂ = ∫₋₁¹ u²K(u)du, and σ²′(·), f′(·) and σ²″(·) denote the first- and second-order derivatives. Then (nh₂)¹/² {σ̂²(β̂ᵀx) − σ²(β₀ᵀx) − bias} converges in distribution to a normal distribution with mean zero and variance var(ε² | β₀ᵀx)/f(β₀ᵀx).

Because β̂ has a faster convergence rate than the nonparametric regression, this result is not surprising. Theorem 2 implies that we can estimate σ²(·) based on β̂ᵀx as efficiently as if β₀ᵀx were known a priori. This extends the results of Fan and Yao (1998), as we obtain the adaptive property when x is high-dimensional and the link function is an unknown smooth function.

Remark 1.
In this section, we describe how to estimate β₀ from the estimating equation (2.10) and establish the adaptive property of β̂ and σ̂²(β̂ᵀx) in an asymptotic sense. In practical implementation, one may choose the bandwidths
for the proposed procedure by using cross-validation. The estimating equation (2.10) corresponds to (2.3) at the population level. Similarly, one can derive an estimate of β₀ using the sample version of (2.5) or (2.7). However, it is necessary to undersmooth Ê(x | β₀ᵀx) in (2.5), or σ̂²(β₀ᵀx) in (2.7), in order for the resulting estimate to achieve the adaptive property. We skip the details.

3. Numerical Studies

3.1. Simulations

In this section, we report on simulation studies to compare the performance of the estimation procedures discussed in Section 2.1. Specifically, we consider

Σᵢ {Yᵢ − l̂(α̂ᵀxᵢ)} l̂′(α̂ᵀxᵢ) xᵢ = 0,  Σᵢ {ε̂ᵢ² − σ̂²(β̂ᵀxᵢ)} σ̂²′(β̂ᵀxᵢ) xᵢ = 0. (3.1)

Solving (3.1) yields the classical nonlinear least squares estimators proposed by Härdle, Hall, and Ichimura (1993), and it serves as a benchmark for our comparisons. The estimating equations for the sample level of (2.7) are

Σᵢ {Yᵢ − l̂(α̂ᵀxᵢ)} xᵢ = 0,  Σᵢ {ε̂ᵢ² − σ̂²(β̂ᵀxᵢ)} xᵢ = 0. (3.2)

We recall that the quantity Ẽ(x | β₀ᵀx) is misspecified to be 0 in (2.7). We have seen in Section 2.1 that the estimator of β₀ obtained from (3.2) remains consistent if ε̂ and σ̂²(·) are consistent. The estimator based on (3.2) is included to demonstrate this robustness property. The sample version of the estimating equation (2.3) is

Σᵢ {Yᵢ − l̂(α̂ᵀxᵢ)}{xᵢ − Ê(xᵢ | α̂ᵀxᵢ)} = 0,  Σᵢ {ε̂ᵢ² − σ̂²(β̂ᵀxᵢ)}{xᵢ − Ê(xᵢ | β̂ᵀxᵢ)} = 0. (3.3)

In equations (3.1), (3.2), and (3.3), ε̂ᵢ = Yᵢ − l̂(α̂ᵀxᵢ). As in Section 2.2, we estimate l(αᵀx), σ²(βᵀx), E(x | αᵀx), and E(x | βᵀx) through the corresponding kernel estimates. Both σ²(βᵀx) and E(x | βᵀx) are estimated consistently in (3.3). The Epanechnikov kernel is used in our numerical study, and the bandwidths are selected via cross-validation. We remark here that our estimating procedure is not very sensitive to the choice of bandwidth, which confirms the theoretical finding in Theorem 1 that the asymptotic normality of β̂ holds for a wide range of bandwidths.
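Cross-validated bandwidth selection for a Nadaraya-Watson smoother can be sketched with a leave-one-out score; the data, kernel grid, and candidate bandwidths below are illustrative, not the paper's:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def loocv_bandwidth(t, z, bandwidths):
    """Pick the bandwidth minimizing the leave-one-out prediction error
    of the Nadaraya-Watson smoother of z on t."""
    scores = []
    for h in bandwidths:
        W = epanechnikov((t[:, None] - t[None, :]) / h)
        np.fill_diagonal(W, 0.0)           # leave the own observation out
        denom = W.sum(axis=1)
        ok = denom > 0                     # skip points with empty windows
        fit = np.zeros_like(z)
        fit[ok] = (W @ z)[ok] / denom[ok]
        scores.append(np.mean((z[ok] - fit[ok]) ** 2))
    return bandwidths[int(np.argmin(scores))]

rng = np.random.default_rng(5)
t = rng.uniform(-2, 2, 500)
z = np.sin(2 * t) + 0.3 * rng.standard_normal(500)
h_best = loocv_bandwidth(t, z, np.array([0.01, 0.25, 3.0]))
```

The tiny bandwidth overfits the noise and the huge one smooths the curvature away, so the middle candidate wins the leave-one-out score.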
In our simulation, we used two schemes to generate x.
Table 1. Simulation results for Case 1. The oracle estimates use the true error term ε. All numbers reported are multiplied by 100. [Rows: bias and std for methods (3.1), (3.2), and (3.3), for the mean estimates α̂₁, α̂₂, α̂₃, α̂₄ and the variance estimates β̂₁, β̂₂, β̂₈ together with their oracle counterparts, under mean functions (a) l(α₀ᵀx) = 5(α₀ᵀx) and (b) l(α₀ᵀx) = exp(α₀ᵀx); the numerical entries were not recovered in this scan.]

Case 1: The predictors x were drawn from a normal population with mean zero and variance-covariance matrix (σᵢⱼ)₈ₓ₈, where σᵢⱼ = 0.5^|i−j|.

Case 2: We generated X₁ as uniform U(0, 12^(1/2)), X₂ from a binomial distribution with success probability 0.5, and X₃ from a Poisson distribution with parameter 2. We kept (X₄, ..., X₈) generated as in Case 1.

Conditional on x, Y was generated as normal with mean function (a) l(α₀ᵀx) = 5(α₀ᵀx) or (b) l(α₀ᵀx) = exp(α₀ᵀx), where α₀ = (0.8, 0.4, 0.4, 0.2, 0, 0, 0, 0)ᵀ in both (a) and (b), and with variance function σ²(β₀ᵀx) = 0.5(β₀ᵀx + 2)², where β₀ = (0.45, 0, ..., 0, 0.9)ᵀ.

Performance in estimating α₀ and β₀. To assess the estimation accuracy for α₀ and β₀, simulations were repeated 1,000 times with sample size n = 600. The bias ("bias") and the standard deviation ("std") of the estimates of typical elements of α₀ and β₀ are reported in Tables 1 and 2 for Cases 1 and 2, respectively. The estimating equations (3.1), (3.2), and (3.3) have similar performance for estimating α₀ in terms of both bias and standard deviation. The estimating equations (3.3) perform best for estimating β₀, and (3.2) performs only slightly worse, confirming the robustness property of (2.2). The estimating equations (3.1) have the largest biases and standard deviations; this may be
Table 2. Simulation results for Case 2. The oracle estimates use the true error term ε. All numbers reported are multiplied by 100. [Same layout as Table 1; the numerical entries were not recovered in this scan.]

Table 3. Simulation results of the average squared errors (ASE). All numbers reported are multiplied by 100. [Rows: methods (3.1), (3.2), and (3.3) under models (a) and (b); columns: ASE(α̂) and ASE(β̂) for Cases 1 and 2; the numerical entries were not recovered in this scan.]

explained by the fact that estimation of the derivatives l′(·) and σ²′(·) is involved. From Tables 1 and 2, all three procedures are close to their corresponding oracle estimates, which adopt the true errors instead of the residuals. This serves to confirm the adaptive property of the proposed estimators.

Performance in estimating l(·) and σ²(·). We use the average squared error criteria

ASE(α̂) = n⁻¹ Σᵢ {l̂(α̂ᵀxᵢ) − l(α₀ᵀxᵢ)}²,  ASE(β̂) = n⁻¹ Σᵢ {σ̂²(β̂ᵀxᵢ) − σ²(β₀ᵀxᵢ)}².
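The Case 1 data-generating process can be sketched as follows, using mean function (b); the variance-function constant follows the text above and should be checked against the original article:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 600, 8
alpha0 = np.array([0.8, 0.4, 0.4, 0.2, 0, 0, 0, 0])
beta0  = np.array([0.45, 0, 0, 0, 0, 0, 0, 0.9])

# Case 1 predictors: N(0, Sigma) with Sigma_ij = 0.5^{|i-j|}
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
x = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

mean = np.exp(x @ alpha0)                    # mean function (b)
sd = np.sqrt(0.5 * (x @ beta0 + 2.0) ** 2)   # sd from sigma^2(t) = 0.5 (t + 2)^2
Y = mean + sd * rng.standard_normal(n)       # Y | x ~ N(mean, sd^2)
```

Each of the estimating-equation methods (3.1)-(3.3) would then be applied to the pairs (xᵢ, Yᵢ) generated this way.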
The results are reported in Table 3. The estimating equations (3.2) and (3.3) offer similar results, and both are superior to the results based on (3.1).

3.2. An application

We consider the horse mussels data collected in the Marlborough Sounds at the northeast of New Zealand's South Island (Camden (1989)). The response variable Y is the shell mass, with three quantitative predictors as related characteristics of mussel shells: the shell length X₁, the shell height X₂, and the shell width X₃, all in mm. The sample size of this data set is 201. All predictors in this analysis are standardized marginally to have zero mean and unit variance. We only report the results obtained from (3.3), as similar results are obtained from (3.1) and (3.2). Assuming (1.2), we estimate α₀ by the first equation of (3.3) and find α̂ = (0.3615, 0.443, 0.830)ᵀ. Based on (α̂ᵀxᵢ, Yᵢ), i = 1, ..., n, we estimate l(α₀ᵀx) by kernel regression. The estimated regression function and its pointwise prediction interval are plotted in Figure 1(A). The prediction interval at the 95% level is calculated by tentatively assuming that the variance function is constant. We can clearly see that the empirical coverage probability is very poor, particularly when α̂ᵀx is large: only 63 among the 201 points lie within this prediction region. Homoscedasticity is not a reasonable assumption for this data set. Next we solve the second of the estimating equations (3.3) and find β̂ = (0.379, 0.374, …)ᵀ. The variance function is then estimated by kernel regression based on the points (β̂ᵀxᵢ, ε̂ᵢ²), i = 1, ..., n. The estimated variance function is plotted in Figure 1(B); it does not appear to be constant. Taking the heteroscedasticity into account, we report the 95% prediction interval of l(α̂ᵀx) in Figure 1(C). We can see that around 94% of the sample (189 points) is covered by this prediction interval.
It is of practical interest to examine the adaptive property of our proposal by using the bootstrap. Based on the original sample, we obtained the estimates of the index parameters, α̂ and β̂, and of the link function l̂(·). We then bootstrapped the original data 1,000 times. Three quantities can be obtained from each bootstrap sample: α̂* denotes the estimator of α₀ based on the bootstrap sample (xᵢ*, Yᵢ*); β̂* denotes the estimator of β₀ based on {xᵢ*, Yᵢ* − l̂*(xᵢ*ᵀα̂*)}; and β̂ₒ* denotes the estimator of β₀ based on {xᵢ*, Yᵢ* − l̂(xᵢ*ᵀα̂)}. We remark here that β̂ₒ* differs from β̂* in that the former uses the true error term, as it adopts α̂ and l̂(·) estimated from the original data, while the latter uses the residuals calculated from the bootstrap sample. The third plot in Figure 1(D) gives the boxplot of the absolute values of the correlation coefficients corr(xᵀβ̂*, xᵀβ̂ₒ*). It implies that β̂* behaves very similarly to β̂ₒ* because the correlation coefficients
Figure 1. Analysis of the horse mussel data. The dashed line in (A) is the kernel estimate l̂(α̂ᵀx); the dot-dash lines are the 95% pointwise confidence intervals obtained under the homoscedasticity assumption. The dashed line in (B) is the kernel estimate σ̂²(β̂ᵀx). The dashed line in (C) is the kernel estimate l̂(α̂ᵀx); the dot-dash lines are the 95% pointwise confidence intervals obtained under the heteroscedasticity assumption. (D) depicts the boxplots of corr(xᵀα̂*, xᵀα̂), corr(xᵀβ̂*, xᵀβ̂), and corr(xᵀβ̂*, xᵀβ̂ₒ*), as calculated from the bootstrap samples.

are all very large. The first and second plots in Figure 1(D) show the respective boxplots of the absolute values of the correlation coefficients corr(xᵀα̂*, xᵀα̂) and corr(xᵀβ̂*, xᵀβ̂). It can be seen that α̂* performs much more stably than β̂* and β̂ₒ*, indicating that estimating the conditional variance function is more difficult than estimating the conditional mean function.

4. Discussion

Estimation of conditional heteroscedasticity remains an important and open problem when the dimension of the predictors is very large (Antoniadis, Grégoire, and McKeague (2004)). Based on a single-index structure,
this paper offers a general class of estimating equations to estimate the conditional heteroscedasticity. The proposed framework allows for flexibility when the structure of the conditional variance function is unknown. The resulting estimator enjoys an adaptive property in that it performs as well as if the true mean function were given. For ease of illustration, we assume that both the conditional mean and the conditional variance functions are of single-index structure. Extension of the proposed methodology to multi-index conditional mean and conditional variance models warrants future investigation.

As pointed out by an anonymous referee, the model considered for ε is of the form ε = σ(β₀ᵀx)ϵ. Thus, one can estimate β₀ by considering |ε|^α = σ^α(β₀ᵀx)E|ϵ|^α + ξ for any α > 0, where ξ = σ^α(β₀ᵀx)(|ϵ|^α − E|ϵ|^α). The model used in this paper corresponds to α = 2. Mercurio and Spokoiny (2004) discussed how to determine α, and suggested α = 1/2 in general. It may be of interest to consider α other than α = 2; this is beyond the scope of this paper.

Acknowledgement

Zhu's research was supported by National Natural Science Foundation of China (NNSFC) grants, the National Institute on Drug Abuse (NIDA) grant R1-DA0460, the Innovation Program of Shanghai Municipal Education Commission 13ZZ055, and the Pujiang Project of the Science and Technology Commission of Shanghai Municipality 1PJ. Dong's research was supported by National Science Foundation Grant DMS. Runze Li is the corresponding author, and his research was supported in part by NNSFC grants, NIDA grant P50-DA10075, and NCI grant R01 CA. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA, NCI, or the National Institutes of Health.
Appendix: Technical Conditions and Proofs

Appendix A: Some technical conditions

(C1) The density functions of α₀ᵀx and β₀ᵀx, denoted by f_α₀(α₀ᵀx) and f_β₀(β₀ᵀx), are continuous, bounded away from zero and from above for all x ∈ Rᵖ, and have locally Lipschitz second derivatives.

(C2) The functions l(α₀ᵀx), σ²(β₀ᵀx), l(α₀ᵀx)f_α₀(α₀ᵀx), σ²(β₀ᵀx)f_β₀(β₀ᵀx), E(x | α₀ᵀx)f_α₀(α₀ᵀx), and E(x | β₀ᵀx)f_β₀(β₀ᵀx) are continuous and bounded from above for all x ∈ Rᵖ, and their third derivatives are locally Lipschitz.

(C3) The symmetric kernel function K(·) is twice continuously differentiable with support [−1, 1], and is Lipschitz continuous.
(C4) The bandwidths h, h₁, and h₂ satisfy nh⁴ → ∞ and nh⁸ → 0, and nhᵢ⁴ → ∞ and nhᵢ⁸ → 0 for i = 1 and 2.

(C5) E(Y⁴) < ∞ and E(Xᵢ⁴) < ∞ for i = 1, ..., p. In addition, the conditional variance of Y given x is bounded away from 0 and from above.

Appendix B: Proofs of Theorems 1 and 2

Proof of Theorem 1. With regard to l̂(α̂ᵀx), σ̂²(β̂ᵀx), and Ê(x | β̂ᵀx), we first quantify the extent to which these functions approximate their respective true values. We take l̂(α̂ᵀx) as an example. Note that

l̂(α̂ᵀx) − l(α₀ᵀx) = {l̂(α₀ᵀx) − l(α₀ᵀx)} + {l̂(α̂ᵀx) − l̂(α₀ᵀx)}.

The first term on the right-hand side can be dealt with using standard nonparametric regression theory. Therefore, it remains to quantify the difference l̂(α̂ᵀx) − l̂(α₀ᵀx). Let 𝒞 = {α : ‖α − α₀‖ ≤ Cn⁻¹/²}. Recall that Li, Zhu, and Zhu (2011, Lemma 1) proved that, if nh₁⁴ → ∞,

sup_{x∈Rᵖ} sup_{α∈𝒞} |l̂(α̂ᵀx) − l̂(α₀ᵀx) − E{l̂(α̂ᵀx) − l̂(α₀ᵀx)}| = O_p{(log n/(nh₁))¹/²}.

By invoking the symmetry of the kernel function and the condition that the third derivative of l(·)f(·) is locally Lipschitz, and using arguments similar to those proving Lemma 3.3 of Zhu and Fang (1996), we can show that

E{l̂(α₀ᵀx)} − l(α₀ᵀx) − μ₂h₁²{l″(α₀ᵀx)/2 + l′(α₀ᵀx)f′(α₀ᵀx)/f(α₀ᵀx)} = O(h₁⁴),

where μ₂ = ∫₋₁¹ u²K(u)du. A similar result also holds when α₀ is replaced by α̂. By the mean value theorem, it follows that

sup_{α∈𝒞} |l̂(α̂ᵀx) − l̂(α₀ᵀx) − E{l̂(α̂ᵀx) − l̂(α₀ᵀx)}| = O(h₁²n⁻¹/²) = o(h₁⁴), since nh₁⁴ → ∞.

The above arguments imply that, for any fixed x ∈ Rᵖ and nh₁⁸ → 0,

l̂(α₀ᵀx) − l̂(α̂ᵀx) = l′(α₀ᵀx)xᵀ(α₀ − α̂) + o_p(n⁻¹/²). (B.1)

Following similar arguments, we can obtain

σ̂²(β₀ᵀx) − σ̂²(β̂ᵀx) = σ²′(β₀ᵀx)xᵀ(β₀ − β̂) + o_p(n⁻¹/²), and
Ê(x | β₀ᵀx) − Ê(x | β̂ᵀx) = E′(x | β₀ᵀx)xᵀ(β₀ − β̂) + o_p(n⁻¹/²). (B.2)

(B.1) and (B.2) are used repeatedly in subsequent derivations. We discuss the consistency of β̂ first. Let

I(β) = E[{ε² − σ²(βᵀx)}{x − E(x | βᵀx)}],  Î(β) = n⁻¹ Σᵢ {ε̂ᵢ² − σ̂²(βᵀxᵢ)}{xᵢ − Ê(xᵢ | βᵀxᵢ)}.

Let U(β₀) be any open set containing β₀. To prove the consistency of β̂, we assume that inf_{β∈Θ\U(β₀)} ‖I(β)‖ ≥ 2δ for some positive constant δ. It suffices to show that

Pr{inf_{β∈Θ\U(β₀)} ‖Î(β)‖ ≤ δ} → 0. (B.3)

The condition inf_{β∈Θ\U(β₀)} ‖I(β)‖ ≥ 2δ implies that

Pr{inf_{β∈Θ\U(β₀)} ‖Î(β)‖ ≤ δ} ≤ Pr{sup_{β∈Θ\U(β₀)} ‖Î(β) − I(β)‖ ≥ δ}.

Therefore, it suffices to show that, for any fixed δ, Pr{sup_{β∈Θ} ‖Î(β) − I(β)‖ ≥ δ} → 0.

Recalling the definitions of Î(β) and I(β), write Î(β) − I(β) = Σ_{i=1}⁶ I₁,ᵢ, where

I₁,₁ = n⁻¹ Σᵢ {εᵢ² − σ²(βᵀxᵢ)}{xᵢ − E(xᵢ | βᵀxᵢ)} − I(β),
I₁,₂ = n⁻¹ Σᵢ (ε̂ᵢ² − εᵢ²){xᵢ − E(xᵢ | βᵀxᵢ)},
I₁,₃ = n⁻¹ Σᵢ {σ²(βᵀxᵢ) − σ̂²(βᵀxᵢ)}{xᵢ − E(xᵢ | βᵀxᵢ)},
I₁,₄ = n⁻¹ Σᵢ {εᵢ² − σ²(βᵀxᵢ)}{E(xᵢ | βᵀxᵢ) − Ê(xᵢ | βᵀxᵢ)},
I₁,₅ = n⁻¹ Σᵢ (ε̂ᵢ² − εᵢ²){E(xᵢ | βᵀxᵢ) − Ê(xᵢ | βᵀxᵢ)},
I₁,₆ = n⁻¹ Σᵢ {σ²(βᵀxᵢ) − σ̂²(βᵀxᵢ)}{E(xᵢ | βᵀxᵢ) − Ê(xᵢ | βᵀxᵢ)}.
We note that I₁,₁ is an average of independent and identically distributed random variables with mean zero. Classical empirical process theory shows that sup_{β∈Θ} ‖I₁,₁‖ = o_p(log n/n¹/²) almost surely (Pollard (1984, Thm. 2.37)). Write

I₁,₂ = n⁻¹ Σᵢ 2{l(α₀ᵀxᵢ) − l̂(α̂ᵀxᵢ)}εᵢ{xᵢ − E(xᵢ | βᵀxᵢ)} + n⁻¹ Σᵢ {l(α₀ᵀxᵢ) − l̂(α̂ᵀxᵢ)}²{xᵢ − E(xᵢ | βᵀxᵢ)} = I₁,₂,₁ + I₁,₂,₂.

We first prove that sup_{β∈Θ} ‖I₁,₂,₁‖ = o_p(1). Note that

‖I₁,₂,₁‖ ≤ 2 sup_{x∈Rᵖ} |l̂(α₀ᵀx) − l(α₀ᵀx)| n⁻¹ Σᵢ ‖εᵢ{xᵢ − E(xᵢ | βᵀxᵢ)}‖ + 2n⁻¹ Σᵢ |l̂(α₀ᵀxᵢ) − l̂(α̂ᵀxᵢ)| ‖εᵢ{xᵢ − E(xᵢ | βᵀxᵢ)}‖
≤ 2 sup_{x∈Rᵖ} |l̂(α₀ᵀx) − l(α₀ᵀx)| n⁻¹ Σᵢ ‖εᵢ{xᵢ − E(xᵢ | βᵀxᵢ)}‖ + 2n⁻¹ Σᵢ |εᵢ| |l̂′(α₀ᵀxᵢ)| ‖xᵢ − E(xᵢ | βᵀxᵢ)‖ ‖xᵢ‖ ‖α₀ − α̂‖.

Therefore, sup_{β∈Θ} ‖I₁,₂,₁‖ = o_p(1) follows immediately from the consistency of α̂ and the uniform consistency of l̂(α₀ᵀx). Similarly, sup_{β∈Θ} ‖I₁,₂,₂‖ = o_p(1) can be proved. These two results imply that sup_{β∈Θ} ‖I₁,₂‖ = o_p(1). Using U-statistic theory (Serfling (1980)), we can obtain sup_{β∈Θ} ‖I₁,ᵢ‖ = o_p(1), i = 3, 4. Following arguments similar to those for I₁,₂, we have sup_{β∈Θ} ‖I₁,₅‖ = o_p(1). That sup_{β∈Θ} ‖I₁,₆‖ = o_p(1) follows directly from the Cauchy-Schwarz inequality and standard results on the uniform convergence of Ê(x | βᵀx) and σ̂²(βᵀx) in nonparametric regression. By combining these results, it follows that Pr{sup_{β∈Θ} ‖Î(β) − I(β)‖ ≥ δ} → 0 for any fixed δ, which completes the proof of (B.3).

Next we examine the root-n consistency of β̂. We expand the estimating equation as

0 = n⁻¹/² Σᵢ {xᵢ − Ê(xᵢ | β̂ᵀxᵢ)}{ε̂ᵢ² − σ̂²(β̂ᵀxᵢ)} =: Σ_{i=1}¹² I₂,ᵢ.

The terms I₂,ᵢ are defined explicitly in the sequel.
By the central limit theorem,

I₂,₁ := n⁻¹/² Σᵢ {xᵢ − E(xᵢ | β₀ᵀxᵢ)}{εᵢ² − σ²(β₀ᵀxᵢ)} = O_p(1).

Invoking the root-n consistency of α̂, we can show that

I₂,₂ := n⁻¹/² Σᵢ {xᵢ − E(xᵢ | β₀ᵀxᵢ)}(ε̂ᵢ² − εᵢ²) = o_p(1).

Invoking (B.2), it follows that

I₂,₃ = [E{σ²′(β₀ᵀx)(x − E(x | β₀ᵀx))xᵀ} + o_p(1)] n¹/²(β₀ − β̂).

Because E{xᵢ − E(xᵢ | β₀ᵀxᵢ) | β₀ᵀxᵢ} = 0, U-statistic theory implies

I₂,₄ := n⁻¹/² Σᵢ {xᵢ − E(xᵢ | β₀ᵀxᵢ)}{σ²(β₀ᵀxᵢ) − σ̂²(β₀ᵀxᵢ)} = o_p(1),
I₂,₅ := n⁻¹/² Σᵢ {E(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β₀ᵀxᵢ)}{εᵢ² − σ²(β₀ᵀxᵢ)} = o_p(1).

By the uniform convergence of Ê(xᵢ | β₀ᵀxᵢ), and (B.1), it is easy to see that

I₂,₆ := n⁻¹/² Σᵢ {E(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β₀ᵀxᵢ)}(ε̂ᵢ² − εᵢ²) = o_p(1),
I₂,₇ := n⁻¹/² Σᵢ {E(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β₀ᵀxᵢ)}{σ̂²(β₀ᵀxᵢ) − σ̂²(β̂ᵀxᵢ)} = o_p(1).

Given nh⁴h₂⁴ → 0 and nhh₂ → ∞, the Cauchy-Schwarz inequality implies

I₂,₈ := n⁻¹/² Σᵢ {E(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β₀ᵀxᵢ)}{σ²(β₀ᵀxᵢ) − σ̂²(β₀ᵀxᵢ)} = o_p(1).

By using the fact that E[{εᵢ² − σ²(β₀ᵀxᵢ)} | xᵢ] = 0, we can show that

I₂,₉ := n⁻¹/² Σᵢ {Ê(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β̂ᵀxᵢ)}{εᵢ² − σ²(β₀ᵀxᵢ)} = o_p(1),
I₂,₁₀ := n⁻¹/² Σᵢ {Ê(xᵢ | β₀ᵀxᵢ) − Ê(xᵢ | β̂ᵀxᵢ)}(ε̂ᵢ² − εᵢ²) = o_p(1) n¹/²(β₀ − β̂).
Invoking (B.2) again, we have
$$I_{2,11} := n^{-1/2}\sum_{i=1}^n \{\hat E(x_i\mid\beta_0^\top x_i) - \hat E(x_i\mid\hat\beta^\top x_i)\}\{\sigma^2(\beta_0^\top x_i) - \sigma^2(\hat\beta^\top x_i)\} = o_p(1)\, n^{1/2}\|\beta_0-\hat\beta\|,$$
$$I_{2,12} := n^{-1/2}\sum_{i=1}^n \{\hat E(x_i\mid\beta_0^\top x_i) - \hat E(x_i\mid\hat\beta^\top x_i)\}\{\sigma^2(\beta_0^\top x_i) - \hat\sigma^2(\beta_0^\top x_i)\} = o_p(1)\, n^{1/2}\|\beta_0-\hat\beta\|.$$
By combining these results, we obtain that $\|\beta_0 - \hat\beta\| = O_p(n^{-1/2})$. The asymptotic normality follows immediately from the root-$n$ consistency described above and the Central Limit Theorem. That is,
$$E\big[\{x - E(x\mid\beta_0^\top x)\}\,\dot\sigma^2(\beta_0^\top x)\,x^\top\big]\, n^{1/2}(\hat\beta-\beta_0) = n^{-1/2}\sum_{i=1}^n \{x_i - E(x_i\mid\beta_0^\top x_i)\}\{\varepsilon_i^2 - \sigma^2(\beta_0^\top x_i)\} + o_p(1),$$
which completes the proof of Theorem 1.

Proof of Theorem 2. It can be proved by following (B.2), the root-$n$ consistency of $\hat\alpha$ and $\hat\beta$, and standard arguments in nonparametric regression. See, for example, Ruppert and Wand (1994). The details are omitted here.

Appendix C: Some preliminary results

We discuss the consistency of $\hat\alpha$, its convergence rate, and its asymptotic normality. First we prove the consistency of $\hat\alpha$. Take
$$J(\alpha) = E\big[\{Y - l(\alpha^\top x)\}\{x - E(x\mid\alpha^\top x)\}\big], \qquad \hat J(\alpha) = n^{-1}\sum_{i=1}^n \{Y_i - \hat l(\alpha^\top x_i)\}\{x_i - \hat E(x_i\mid\alpha^\top x_i)\}.$$
Let $U(\alpha_0)$ be any open set that includes $\alpha_0$. To prove the consistency of $\hat\alpha$, we assume that $\inf_{\alpha\in\Theta\setminus U(\alpha_0)}\|J(\alpha)\|\ge\delta$ for some positive constant $\delta$. It suffices to show that
$$\Pr\Big\{\inf_{\alpha\in\Theta\setminus U(\alpha_0)}\|\hat J(\alpha)\|\le\delta/2\Big\} \to 0. \qquad (C.1)$$
The condition $\inf_{\alpha\in\Theta\setminus U(\alpha_0)}\|J(\alpha)\|\ge\delta$ implies that
$$\Pr\Big\{\inf_{\alpha\in\Theta\setminus U(\alpha_0)}\|\hat J(\alpha)\|\le\delta/2\Big\} \le \Pr\Big\{\sup_{\alpha\in\Theta\setminus U(\alpha_0)}\|\hat J(\alpha)-J(\alpha)\|\ge\delta/2\Big\}.$$
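Computationally, solving the estimating equation $\hat J(\alpha) = 0$ amounts to a Newton-type iteration, which is also the structure exploited by the Taylor expansion (C.2) below. The following generic solver with a finite-difference Jacobian is an illustrative sketch of ours, not the paper's algorithm; `J_hat` stands for any user-supplied estimating function.

```python
import numpy as np

def newton_solve(J_hat, alpha0, tol=1e-10, max_iter=50):
    """Find a root of the vector-valued estimating function J_hat by Newton's
    method, approximating the Jacobian J_hat' by forward differences."""
    alpha = np.array(alpha0, dtype=float)
    for _ in range(max_iter):
        f = J_hat(alpha)
        eps = 1e-7
        # column j of the Jacobian: {J_hat(alpha + eps e_j) - J_hat(alpha)} / eps
        jac = np.column_stack([(J_hat(alpha + eps * e) - f) / eps
                               for e in np.eye(alpha.size)])
        step = np.linalg.solve(jac, f)   # Newton step: solve J_hat'(alpha) step = J_hat(alpha)
        alpha -= step
        if np.linalg.norm(step) < tol:
            break
    return alpha
```

For a linear estimating function the iteration converges in a single step, mirroring the one-step expansion of $\hat J$ around $\alpha_0$.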
Therefore, it suffices to show that
$$\Pr\Big\{\sup_{\alpha\in\Theta}\|\hat J(\alpha)-J(\alpha)\|\ge\delta\Big\} \to 0$$
for any fixed $\delta > 0$. This follows directly from the uniform consistency of $\hat J(\alpha)$ if $h_1 \to 0$ and $nh_1/\log n \to \infty$. See, for example, Ichimura (1993), Zhu and Fang (1996, Lemmas 3.1 and 3.3), and Pollard (1984, Thm. 2.37). (C.1) is proved.

Next we prove the root-$n$ consistency of $\hat\alpha$. With probability close to 1, by a Taylor expansion, for $\tilde\alpha$ between $\alpha_0$ and $\hat\alpha$,
$$0 = \hat J(\hat\alpha) = \hat J(\alpha_0) + \hat J'(\tilde\alpha)(\hat\alpha-\alpha_0) = \hat J(\alpha_0) + \{\hat J'(\tilde\alpha)-\hat J'(\alpha_0)\}(\hat\alpha-\alpha_0) + \hat J'(\alpha_0)(\hat\alpha-\alpha_0). \qquad (C.2)$$
Recall the definition of $\hat J(\hat\alpha)$ and write $\hat J(\alpha_0) \stackrel{\rm def}{=} \sum_{i=1}^4 \hat J_i$, where
$$\hat J_1 = n^{-1}\sum_{i=1}^n \{Y_i - l(\alpha_0^\top x_i)\}\{x_i - E(x_i\mid\alpha_0^\top x_i)\},$$
$$\hat J_2 = n^{-1}\sum_{i=1}^n \{Y_i - l(\alpha_0^\top x_i)\}\{E(x_i\mid\alpha_0^\top x_i) - \hat E(x_i\mid\alpha_0^\top x_i)\},$$
$$\hat J_3 = n^{-1}\sum_{i=1}^n \{l(\alpha_0^\top x_i) - \hat l(\alpha_0^\top x_i)\}\{x_i - E(x_i\mid\alpha_0^\top x_i)\},$$
$$\hat J_4 = n^{-1}\sum_{i=1}^n \{l(\alpha_0^\top x_i) - \hat l(\alpha_0^\top x_i)\}\{E(x_i\mid\alpha_0^\top x_i) - \hat E(x_i\mid\alpha_0^\top x_i)\}.$$
Because $E[\{Y - l(\alpha_0^\top x)\}\{x - E(x\mid\alpha_0^\top x)\}] = 0$, $\hat J_1$ is of the order $O_p(n^{-1/2})$ by the Central Limit Theorem. Both $\hat J_2$ and $\hat J_3$ are of the order $o_p(n^{-1/2})$ according to standard U-statistic theory (Serfling (1980)); by using the Cauchy-Schwarz inequality, $\|\hat J_4\|$ is bounded by
$$\big[\hat E\{l(\alpha_0^\top x) - \hat l(\alpha_0^\top x)\}^2\big]^{1/2}\,\big[E\|E(x\mid\alpha_0^\top x) - \hat E(x\mid\alpha_0^\top x)\|^2\big]^{1/2},$$
which is of order $O_p\{h^2\log n/(nh_1)\}$. If $nh^2 \to \infty$ and $nh_1/\log n \to \infty$, then the last term is also $o_p(n^{-1/2})$. Combining these results, we obtain that $\hat J(\alpha_0) = O_p(n^{-1/2})$.

By using the Weak Law of Large Numbers, we can see that $\hat J'(\alpha_0)$ converges in probability to $J'(\alpha_0) = E[\,l'(\alpha_0^\top x)\{x - E(x\mid\alpha_0^\top x)\}x^\top]$ for $nh_1 \to \infty$ and
$nh_2 \to \infty$. Similarly, $\hat J'(\tilde\alpha) - \hat J'(\alpha_0) = o_p(1)$, in that $\tilde\alpha$ lies between the consistent estimate $\hat\alpha$ and $\alpha_0$. Therefore, $\hat\alpha$ is a root-$n$ consistent estimate of $\alpha_0$.

The asymptotic normality of $\hat\alpha$ follows from its root-$n$ consistency. Specifically, (C.2) implies that
$$n^{1/2}(\hat\alpha-\alpha_0) = \{J'(\alpha_0)\}^{-1}\, n^{-1/2}\sum_{i=1}^n \varepsilon_i\{x_i - E(x_i\mid\alpha_0^\top x_i)\} + o_p(1),$$
where $J'(\alpha_0) = E[\,l'(\alpha_0^\top x)\{x - E(x\mid\alpha_0^\top x)\}x^\top]$ and $\varepsilon_i = Y_i - l(\alpha_0^\top x_i)$.

References

Anderson, T. G. and Lund, J. (1997). Estimating continuous time stochastic volatility models of the short term interest rate. J. Econometrics 77.
Antoniadis, A., Grégoire, G. and McKeague, I. (2004). Bayesian estimation in single-index models. Statist. Sinica 14.
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations (with discussion). J. Roy. Statist. Soc. Ser. B 26.
Cai, T., Levine, M. and Wang, L. (2009). Variance function estimation in multivariate nonparametric regression with fixed design. J. Multivariate Anal. 100.
Camden, M. (1989). The Data Bundle. New Zealand Statistical Association, Wellington, New Zealand.
Carroll, R. J. and Ruppert, D. (1988). Transformations and Weighting in Regression. Chapman & Hall, London.
Fan, J. and Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85.
Hall, P. and Carroll, R. J. (1989). Variance function estimation in regression: the effect of estimating the mean. J. Roy. Statist. Soc. Ser. B 51.
Härdle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models. Ann. Statist. 21.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics 58.
Li, B. and Dong, Y. (2009). Dimension reduction for non-elliptically distributed predictors. Ann. Statist. 37.
Li, K.-C. (1991). Sliced inverse regression for dimension reduction. J. Amer. Statist. Assoc. 86.
Li, L. X., Zhu, L. P. and Zhu, L. X. (2011).
Inference on primary parameter of interest with aid of dimension reduction estimation. J. Roy. Statist. Soc. Ser. B 73.
Mercurio, D. and Spokoiny, V. (2004). Statistical inference for time-inhomogeneous volatility models. Ann. Statist. 32.
Müller, H.-G. and Stadtmüller, U. (1987). Estimation of heteroscedasticity in regression analysis. Ann. Statist. 15.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22.
Ruppert, D., Wand, M., Holst, U. and Hössjer, O. (1997). Local polynomial variance function estimation. Technometrics 39.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, New York.
Song, Q. and Yang, L. (2009). Spline confidence bands for variance function. J. Nonparametr. Statist. 21.
Wang, L., Brown, L. D., Cai, T. and Levine, M. (2008). Effect of mean on variance function estimation in nonparametric regression. Ann. Statist. 36.
Xia, Y., Tong, H. and Li, W. K. (2002). Single-index volatility models and estimation. Statist. Sinica 12.
Yao, Q. and Tong, H. (1994). Quantifying the influence of initial values on nonlinear prediction. J. Roy. Statist. Soc. Ser. B 56.
Yin, J., Geng, Z., Li, R. and Wang, H. (2010). Nonparametric covariance model. Statist. Sinica 20.
Zhu, L. X. and Fang, K. T. (1996). Asymptotics for kernel estimate of sliced inverse regression. Ann. Statist. 24.

School of Statistics and Management and the Key Laboratory of Mathematical Economics, Ministry of Education, Shanghai University of Finance and Economics, Shanghai 200433, P. R. China.
E-mail: zhu.liping@mail.shufe.edu.cn
Department of Statistics, Temple University, Philadelphia, PA 19122, U.S.A.
E-mail: ydong@temple.edu
Department of Statistics and The Methodology Center, Pennsylvania State University, University Park, PA, U.S.A.
E-mail: rli@stat.psu.edu

(Received March 2012; accepted August 2012)
More informationIdentification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case
Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)
More informationBahadur representations for bootstrap quantiles 1
Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by
More informationAn Unbiased C p Criterion for Multivariate Ridge Regression
An Unbiased C p Criterion for Multivariate Ridge Regression (Last Modified: March 7, 2008) Hirokazu Yanagihara 1 and Kenichi Satoh 2 1 Department of Mathematics, Graduate School of Science, Hiroshima University
More informationNONPARAMETRIC MIXTURE OF REGRESSION MODELS
NONPARAMETRIC MIXTURE OF REGRESSION MODELS Huang, M., & Li, R. The Pennsylvania State University Technical Report Series #09-93 College of Health and Human Development The Pennsylvania State University
More informationSpecification testing for regressions: an approach bridging between local smoothing and global smoothing methods
arxiv:70.05263v2 [stat.me] 7 Oct 207 Specification testing for regressions: an approach bridging between local smoothing and global smoothing methods Lingzhu Li and Lixing Zhu,2 Department of Mathematics,
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More information