Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 1 / 32

Outline 1 Introduction Motivation Possible Models 2 Single Index Quantile Regression The Proposed Estimator Main Results 3 Numerical Studies Example 1 Example 2 4 Conclusions 5 Future Work E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 2 / 32

Introduction Motivation Least Squares Regression: Models the relationship between a d-dimensional vector of covariates X and the conditional mean of the response Y given X = x. Least Squares Estimator: 1 provides only a single summary measure for the conditional distribution of the response given the covariates. 2 sensitive to outliers and can provide a very poor estimator in many non-gaussian and especially long-tailed distributions. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 4 / 32

Introduction Motivation Quantile Regression (QR): Models the relationship between a d-dimensional vector of covariates X and the τ th conditional quantile of the response Y given X = x, Q τ (Y x). Quantile Regression: 1 provides a more complete picture for the conditional distribution of the response given the covariates. 2 useful for modeling data with heterogeneous conditional distributions, especially data where extremes are important. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 5 / 32

Introduction Possible Models Linear QR: Q τ (Y x) = β x, for a d-dimensional vector of unknown parameters β. Disadvantage: linear assumption not always valid. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 7 / 32

Introduction Possible Models Linear QR: Q τ (Y x) = β x, for a d-dimensional vector of unknown parameters β. Disadvantage: linear assumption not always valid. Nonparametric QR: Q τ (Y x) = g(x), for an unspecified function g : R d R. Disadvantage: meets the data sparseness problem for high dimensional data. Single-Index QR: Q τ (Y x) = g(β x β), depends on x only through the single linear combination β x. Advantage: reduces the dimension and maintain some nonparametric flexibility. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 7 / 32

The Proposed Estimator The Model Let {Y i, X i } n i=1 be independent and identically distributed (iid) observations that satisfy Y i = Q τ (Y X i ) + ɛ i, where Q τ (Y x) = Q τ (Y β 1x) and the error term satisfies Q τ (ɛ i x) = 0. For identifiability, we impose certain conditions on β 1 : the most common one: β 1 = 1 with its first coordinate positive. the one adopted here: β 1 = (1, β ). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 9 / 32

The Proposed Estimator The Model The true parametric vector β satisfies β = arg min b E ( ρ τ (Y Q τ (Y b 1X)) ), where b 1 = (1, b ) and ρ τ (u) = (τ I (u < 0))u. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 10 / 32

The Proposed Estimator Iterative algorithms As in the single index mean regression (SIMR) problem, the unknown Q τ (Y b 1X) must be replaced with an estimator. Unlike the SIMR problem, however, there is no closed form expression for the estimator of Q τ (Y b 1 X i), and this has led to iterative algorithms for estimating β (Wu, Yu, and Yu, 2010; Kong and Xia, 2012). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 11 / 32

The Proposed Estimator The New Approach For any given b R d 1, define the function g(u b) : R R as where b 1 = (1, b ). g(u b) = E(Q τ (Y X) b 1X = u), E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 12 / 32

The Proposed Estimator The New Approach For any given b R d 1, define the function g(u b) : R R as where b 1 = (1, b ). Noting that β also satisfies where b 1 = (1, b ). g(u b) = E(Q τ (Y X) b 1X = u), g(β 1X β) = Q τ (Y X), β = arg min b E ( ρ τ (Y g(b 1X b)) ), (1) E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 12 / 32

The Proposed Estimator The New Approach The sample level version of (1) consists of minimizing S n (τ, b) = n ρ τ (Y i g(b 1X i b)). i=1 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 13 / 32

The Proposed Estimator The New Approach The sample level version of (1) consists of minimizing S n (τ, b) = n ρ τ (Y i g(b 1X i b)). i=1 Again, g( b) is unknown but it can be estimated, in a non-iterative fashion, by first obtaining estimators Q τ (Y X i ), for i = 1,..., n, and forming the Nadaraya-Watson-type estimator ĝ NW ˆQ (t b) = n i=1 ( ) t b Q τ (Y X i )K 1 X i 2 h 2 ( ) n k=1 K t b, (2) 1 X k 2 h 2 where K 2 ( ) is a univariate kernel function and h 2 is a bandwidth. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 13 / 32

The Proposed Estimator The New Approach For the estimator Q τ (Y X i ), we use the local linear conditional quantile estimator (Guerre and Sabbah, 2012). Specifically, for a multivariate kernel function K 1 (x) = K 1 (x 1,..., x d ) and a univariate bandwidth h 1, let L n ((α 0, α 1 ); τ, h 1, x) = 1 nh d 1 n i=1 ( ) ρ τ (Y i α 0 α Xi x 1(X i x))k 1 h 1 and define Q τ (Y x) as α 0 (τ; h 1, x), where α 0 (τ; h 1, x) is defined through ( α 0 (τ; h 1, x), α 1 (τ; h 1, x)) = arg min (α 0,α 1 ) L n((α 0, α 1 ); τ, h 1, x). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 14 / 32

The Proposed Estimator The proposed estimator is obtained by β = arg min b Θ n ( ) ρ τ Y i ĝ NW (b 1X i b), (3) i=1 where Θ R d 1 is a compact set, β is in the interior of Θ. ˆQ E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 15 / 32

Main Results Proposition 1 Let ĝ NW (t b) be as defined in (2).Under some regularity conditions, we ˆQ have sup ĝ NW Q (t b) g(t b) ( = O p a n + a n + h2) 2, b Θ,t T b where a n = (log n/n) s/(2s+d) and a n = (log n/(nh 2 )) 1/2. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 17 / 32

Main Results Proposition 1 Let ĝ NW (t b) be as defined in (2).Under some regularity conditions, we ˆQ have sup ĝ NW Q (t b) g(t b) ( = O p a n + a n + h2) 2, b Θ,t T b where a n = (log n/n) s/(2s+d) and a n = (log n/(nh 2 )) 1/2. Proposition 2 Let β be as defined in (3). Under regularity conditions, β is n-consistent estimator of β. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 17 / 32

Main Results Theorem Let β be as defined in (3). Under regularity conditions, n( β β) d N ( 0, τ(1 τ)v 1 ΣV 1), where V = E ( (g (β 1X β)) 2 (X 1 E(X 1 β 1X))(X 1 E(X 1 β 1X)) f ɛ X (0 X) ), and Σ = E ( (g (β 1X β)) 2 (X 1 E(X 1 β 1X))(X 1 E(X 1 β 1X)) ). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 18 / 32

Numerical Studies Notation NWQR - the proposed estimator. WYY - the estimator resulting from the iterative algorithm proposed by Wu et al. (2010). WYY-1 - the estimator resulting from dividing the estimator of Wu et al. (2010) by its first component. WYY-2 - the estimator resulting from using the proposed constraint in conjunction with the iterative algorithm of Wu et al. (2010). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 20 / 32

Numerical Studies Example 1 Consider the model Y = exp(β 1X) + ɛ, (4) where X = (X 1, X 2 ), X i N(0, 1) are iid, β 1 = (1, 2) and the residual ɛ follows a standard normal distribution. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 21 / 32

Numerical Studies Example 1 Consider the model Y = exp(β 1X) + ɛ, (4) where X = (X 1, X 2 ), X i N(0, 1) are iid, β 1 = (1, 2) and the residual ɛ follows a standard normal distribution. We fit the single index quantile regression model for five different quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 21 / 32

Numerical Studies Example 1 Consider the model Y = exp(β 1X) + ɛ, (4) where X = (X 1, X 2 ), X i N(0, 1) are iid, β 1 = (1, 2) and the residual ɛ follows a standard normal distribution. We fit the single index quantile regression model for five different quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9. We use sample size of n = 400 and perform N = 100 replications. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 21 / 32

Numerical Studies Example 1 For comparison of the above estimators, consider the mean square error, R( β), and the mean check based absolute residuals, R τ ( β 1 ), which are defined as R( β) = 1 N N ( β i 2) 2, R τ ( β 1 ) = 1 n i=1 n ρ τ (Y i i=1 Q LL τ (Y β 1X i )), (5) where Q LL τ (Y β 1x) is the local linear conditional quantile estimator on the univariate variable β 1X i. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 22 / 32

Table 1: mean values and standard errors (in parenthesis), R( β) and R τ ( β 1 ) for Model (4). 95% coverage probability for NWQR, WYY-2 and the second estimated coefficient of Wu et al. (2010). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 23 / 32 Numerical Studies Example 1 τ 0.1 0.25 0.5 0.75 0.9 NWQR 2.0085 2.0132 2.0149 2.0290 2.0768 (0.0603) (0.0477) (0.0481) (0.0683) (0.3905) β 1.9521 1.9890 3.4450 1.9363 1.9592 WYY-1 (0.2521) (0.2252) (3.8756) (0.3246) (0.5906) WYY-2 1.9699 1.9899 2.0158 2.0525 2.0287 (0.1593) (0.1307) (0.3937) (0.6735) (0.7240) NWQR 0.0037 0.0024 0.0025 0.0055 0.1569 R( β) WYY-1 0.0652 0.0503 192.6956 0.1084 0.3469 WYY-2 0.0260 0.0170 0.1537 0.4518 0.5197 NWQR 0.1682 0.3096 0.3909 0.3113 0.1761 R τ ( β 1 ) WYY-1 0.1783 0.3273 0.4154 0.3479 0.2265 WYY-2 0.1720 0.3160 0.4031 0.3310 0.2050 coverage NWQR 0.99 0.97 0.97 0.97 0.95 prob. WYY-2 0.93 0.90 0.92 0.80 0.70 WYY 0.55 0.76 0.86 0.66 0.28

Numerical Studies Boston Housing Data 506 observations. 14 variables, response: medv, the median value of owner-occupied homes in $1000 s, predictors: measurements on the 506 census tracts in suburban Boston from the 1970 census. available in the MASS library in R. consider the four covariates: RM, TAX, PTRATIO, LSTAT (Opsomer and Ruppert 1998; Yu and Lu 2004; Wu et al. 2010). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 25 / 32

Numerical Studies Boston Housing Data center the dependent variable. take logarithmic transformations on TAX and LSTAT. standardize RM, log(tax), PTRATIO and log(lstat) and denote them by X 1, X 2, X 3 and X 4. consider the single index quantile regression models: Q τ (medv X) = g(x 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 β), (6) for τ = 0.1, 0.25, 0.5, 0.75, 0.9. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 26 / 32

Numerical Studies Boston Housing Data τ RM log(tax) PTRATIO log(lstat) 0.1 1-1.1274-0.6098-1.1366 (0.2431) (0.3521) (0.4038) 0.25 1-0.7052-0.5828-1.2833 (0.2325) (0.4518) (0.4861) 0.5 1-0.4984-0.4497-1.0197 (0.3286) (0.6176) (0.6765) 0.75 1-0.3323-0.3961-0.8692 (0.3585) (0.6414) (0.7184) 0.9 1-0.6809-1.6961-5.2774 (0.0452) (0.0995) (0.0936) Table 2: Parametric vector estimates and standard errors (in parenthesis) for Boston housing data for Models (6) with five different quantile levels. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 27 / 32

Numerical Studies Boston Housing Data τ WYY NWQR 0.1 1.102 1.093 0.25 2.105 2.085 0.5 2.845 2.786 0.75 2.577 2.482 0.9 1.749 1.709 Table 3: Comparison of average check absolute residuals R τ ( β 1 ) defined in (5). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 28 / 32

Conclusions conditional quantiles provide a more complete picture of the conditional distribution and share nice properties such as robustness. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 29 / 32

Conclusions conditional quantiles provide a more complete picture of the conditional distribution and share nice properties such as robustness. consider a single index model to reduce the dimensionality and maintain some nonparametric flexibility. iterative algorithms for estimating the parametric vector; convergence issues. proposed a method that directly estimates the parametric component non-iteratively and have derived its asymptotic distribution under heteroscedastic errors. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 29 / 32

Conclusions conditional quantiles provide a more complete picture of the conditional distribution and share nice properties such as robustness. consider a single index model to reduce the dimensionality and maintain some nonparametric flexibility. iterative algorithms for estimating the parametric vector; convergence issues. proposed a method that directly estimates the parametric component non-iteratively and have derived its asymptotic distribution under heteroscedastic errors. make use of a parametrization that corresponds to the asymptotic theory derived. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 29 / 32

Future Work for high dimensional data, it is possible to improve the local linear conditional quantile estimators Q τ (Y X i ). E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 30 / 32

Future Work for high dimensional data, it is possible to improve the local linear conditional quantile estimators Q τ (Y X i ). variable selection for estimating the local linear conditional quantile estimators. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 30 / 32

Future Work for high dimensional data, it is possible to improve the local linear conditional quantile estimators Q τ (Y X i ). variable selection for estimating the local linear conditional quantile estimators. penalty term for estimating the parametric component β. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 30 / 32

References I E. Guerre and C. Sabbah. Uniform bias study and Bahadure representation for local polynomial estimators of the conditional quantile function. Econometric Theory, 28(01):87 129, 2012. E. Kong and Y. Xia. A single-index quantile regression model and its estimation. Econometric Theory, 28:730 768, 2012. J. D. Opsomer and D. Ruppert. A fully automated bandwidth selection for additive regression model. Journal of the American Statistical Association, 93:605 618, 1998. T. Z. Wu, K. Yu and Y. Yu. Single index quantile regression. Journal of Multivariate Analysis, 101(7):1607 1621, 2010. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 31 / 32

References II K. Yu and Z. Lu. Local linear additive quantile regression. Scandinavian Journal of Statistics, 31:333 346, 2004. E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015 32 / 32