Single Index Quantile Regression for Heteroscedastic Data
1 Single Index Quantile Regression for Heteroscedastic Data
E. Christou and M. G. Akritas
Department of Statistics, The Pennsylvania State University
SMAC, November 6, 2015
2 Outline
1. Introduction: Motivation; Possible Models
2. Single Index Quantile Regression: Main Results
3. Numerical Studies
4. Variable Selection: Motivation; Main Results
5. Numerical Studies: Example 1; Example 2; Real Data Example
6. Conclusions
7. Future Work
4 Introduction: Motivation
Least squares regression models the relationship between a d-dimensional vector of covariates X and the conditional mean of the response Y given X = x.
The least squares estimator:
1. provides only a single summary measure of the conditional distribution of the response given the covariates;
2. is sensitive to outliers and can be a very poor estimator for many non-Gaussian, and especially long-tailed, distributions.
5 Introduction: Motivation
Quantile regression (QR) models the relationship between a d-dimensional vector of covariates X and the τth conditional quantile of the response Y given X = x, $Q_\tau(Y \mid x)$.
Quantile regression:
1. provides a more complete picture of the conditional distribution of the response given the covariates;
2. is useful for modeling data with heterogeneous conditional distributions, especially data where extremes are important.
9 Introduction: Possible Models
Linear QR: $Q_\tau(Y \mid x) = \beta^\top x$, for a d-dimensional vector of unknown parameters β. Disadvantage: the linearity assumption is not always valid.
Nonparametric QR: $Q_\tau(Y \mid x) = g(x)$, for an unspecified function $g: \mathbb{R}^d \to \mathbb{R}$. Disadvantage: suffers from the data sparseness problem for high-dimensional data.
Single-index QR: $Q_\tau(Y \mid x) = g(\beta^\top x \mid \beta)$ depends on x only through the single linear combination $\beta^\top x$. Advantage: reduces the dimension while maintaining some nonparametric flexibility.
12 The Model
Let $\{Y_i, X_i\}_{i=1}^{n}$ be independent and identically distributed (iid) observations satisfying
$Y_i = Q_\tau(Y \mid X_i) + \epsilon_i,$
where $Q_\tau(Y \mid x) = Q_\tau(Y \mid \beta_1^\top x)$ and the error term satisfies $Q_\tau(\epsilon_i \mid x) = 0$.
For identifiability, we impose conditions on $\beta_1$:
- the most common one: $\|\beta_1\| = 1$ with its first coordinate positive;
- the one adopted here: $\beta_1 = (1, \beta^\top)^\top$, for $\beta \in \mathbb{R}^{d-1}$.
13 The Model
The true parameter vector β satisfies
$\beta = \arg\min_b E\big(\rho_\tau(Y - Q_\tau(Y \mid b_1^\top X))\big),$
where $b_1 = (1, b^\top)^\top$ and $\rho_\tau(u) = (\tau - I(u < 0))u$.
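All of the estimators in this talk minimize sums of this check loss; as a concrete illustration, a minimal NumPy sketch of $\rho_\tau$ (the example values are only illustrative):

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = (tau - I(u < 0)) * u."""
    u = np.asarray(u, dtype=float)
    return (tau - (u < 0)) * u

u = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
# tau = 0.5: the loss is half the absolute residual -> 1.0, 0.25, 0.0, 0.5, 1.5
print(check_loss(u, 0.5))
# tau = 0.9: negative residuals are weighted by 0.1, positive ones by 0.9
print(check_loss(u, 0.9))
```

For τ = 0.5 minimizing this loss recovers the conditional median; an asymmetric τ tilts the fit toward the corresponding quantile.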
14 Iterative algorithms
As in the single index mean regression (SIMR) problem, the unknown $Q_\tau(Y \mid b_1^\top X)$ must be replaced with an estimator. Unlike the SIMR problem, however, there is no closed-form expression for the estimator of $Q_\tau(Y \mid b_1^\top X_i)$, and this has led to iterative algorithms for estimating β (Wu, Yu, and Yu, 2010; Kong and Xia, 2012).
16 The New Approach
For any given $b \in \mathbb{R}^{d-1}$, define the function $g(\cdot \mid b): \mathbb{R} \to \mathbb{R}$ as
$g(u \mid b) = E(Q_\tau(Y \mid X) \mid b_1^\top X = u),$
where $b_1 = (1, b^\top)^\top$. Noting that $g(\beta_1^\top X \mid \beta) = Q_\tau(Y \mid X)$, β also satisfies
$\beta = \arg\min_b E\big(\rho_\tau(Y - g(b_1^\top X \mid b))\big).$ (1)
18 The New Approach
The sample-level version of (1) consists of minimizing
$S_n(\tau, b) = \sum_{i=1}^{n} \rho_\tau(Y_i - g(b_1^\top X_i \mid b)).$
Again, $g(\cdot \mid b)$ is unknown, but it can be estimated, in a non-iterative fashion, by first obtaining estimators $\hat{Q}_\tau(Y \mid X_i)$, for $i = 1, \ldots, n$, and forming the Nadaraya-Watson-type estimator
$\hat{g}^{NW}_{\hat{Q}}(t \mid b) = \frac{\sum_{i=1}^{n} \hat{Q}_\tau(Y \mid X_i)\, K\left(\frac{t - b_1^\top X_i}{h}\right)}{\sum_{k=1}^{n} K\left(\frac{t - b_1^\top X_k}{h}\right)},$ (2)
where $K(\cdot)$ is a univariate kernel function and h is a bandwidth.
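A minimal sketch of the smoother (2), assuming a Gaussian kernel (the slides leave K unspecified) and pilot estimates Q_hat already in hand:

```python
import numpy as np

def nw_smoother(t, b1, X, Q_hat, h):
    """Nadaraya-Watson estimate of g(t | b): a kernel-weighted average of
    the pilot quantile estimates Q_hat[i], weighted by how close the
    index b1'X_i is to t. Gaussian kernel assumed for illustration."""
    u = (t - X @ b1) / h
    w = np.exp(-0.5 * u ** 2)
    return np.sum(w * Q_hat) / np.sum(w)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
b1 = np.array([1.0, 2.0])            # index values: 0.0, 3.0, 3.0
Q_hat = np.array([0.0, 10.0, 20.0])
g0 = nw_smoother(0.0, b1, X, Q_hat, 0.5)  # dominated by the first point
```

Being a weighted average, the smoother reproduces constant pilots exactly and shrinks toward the pilot values whose index is nearest to t.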
19 The New Approach
For the estimator $\hat{Q}_\tau(Y \mid X_i)$, we use the local linear conditional quantile estimator (Guerre and Sabbah, 2012). Specifically, for a multivariate kernel function $K(x) = K(x_1, \ldots, x_d)$ and a univariate bandwidth h, let
$L_n((\alpha_0, \alpha_1); \tau, x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} \rho_\tau\big(Y_i - \alpha_0 - \alpha_1^\top(X_i - x)\big) K\left(\frac{X_i - x}{h}\right)$
and define $\hat{Q}_\tau(Y \mid x)$ as $\hat{\alpha}_0(\tau; x)$, where
$(\hat{\alpha}_0(\tau; x), \hat{\alpha}_1(\tau; x)) = \arg\min_{(\alpha_0, \alpha_1)} L_n((\alpha_0, \alpha_1); \tau, x).$
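A small numerical sketch of this local linear fit, using a product Gaussian kernel and SciPy's Nelder-Mead to minimize the weighted check loss (both are illustrative choices; the slides do not prescribe a kernel or an optimizer):

```python
import numpy as np
from scipy.optimize import minimize

def local_linear_quantile(x0, X, Y, tau, h):
    """Local linear conditional quantile estimate of Q_tau(Y | x0):
    minimize sum_i rho_tau(Y_i - a0 - a1'(X_i - x0)) * K((X_i - x0)/h)
    over (a0, a1) and return the intercept a0."""
    D = X - x0
    w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))  # product Gaussian kernel
    def obj(theta):
        a0, a1 = theta[0], theta[1:]
        u = Y - a0 - D @ a1
        return np.sum((tau - (u < 0)) * u * w)       # weighted check loss
    d = X.shape[1]
    res = minimize(obj, np.zeros(d + 1), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 10000})
    return res.x[0]
```

On noiseless data lying on a plane, the estimate at x0 recovers the plane's value at x0, since the exact fit drives the (nonnegative) objective to zero.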
20 The New Approach
The proposed estimator is obtained by
$\hat{\beta} = \arg\min_{b \in \Theta} \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1^\top X_i \mid b)\big),$ (3)
where $\Theta \subset \mathbb{R}^{d-1}$ is a compact set and β is in the interior of Θ.
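To keep a runnable end-to-end sketch short, the toy example below uses the true conditional median as the pilot estimate (in practice the local linear pilot above would be used), a Gaussian kernel, and a grid search over the single free coefficient; the data are simulated from the Example 1 design used later in the talk, $Y = \exp(\beta_1^\top X) + \epsilon$ with $\beta_1 = (1, 2)^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, h = 400, 0.5, 0.3
X = rng.normal(size=(n, 2))
beta1 = np.array([1.0, 2.0])
Y = np.exp(X @ beta1) + rng.normal(size=n)
# Pilot-step shortcut: the true conditional median is exp(beta1'x),
# since the standard normal error has median 0.
Q_hat = np.exp(X @ beta1)

def S_n(b):
    """Objective (3) for b1 = (1, b): smooth the pilots along the
    candidate index, then sum the check losses of the residuals."""
    idx = X @ np.array([1.0, b])
    u = (idx[:, None] - idx[None, :]) / h
    W = np.exp(-0.5 * u ** 2)              # Gaussian kernel weights
    g_hat = (W @ Q_hat) / W.sum(axis=1)    # NW smoother at each b1'X_i
    r = Y - g_hat
    return np.sum((tau - (r < 0)) * r)

grid = np.linspace(1.0, 3.0, 41)
b_hat = grid[np.argmin([S_n(b) for b in grid])]  # close to the true value 2
```

The minimization is non-iterative in the sense of the talk: each candidate b is scored in one pass, with no alternating updates between g and β.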
23 Main Results
Proposition 1. Let $\hat{g}^{NW}_{\hat{Q}}(t \mid b)$ be as defined in (2). Under some regularity conditions, we have
$\sup_{b \in \Theta,\, t \in T_b} \big| \hat{g}^{NW}_{\hat{Q}}(t \mid b) - g(t \mid b) \big| = O_p\big(a_n + \tilde{a}_n + h^2\big),$
where $T_b = \{t : t = b_1^\top x,\ x \in \mathcal{X}_0\}$, $a_n = (\log n / n)^{s/(2s+d)}$ and $\tilde{a}_n = (\log n / (n h^2))^{1/2}$.
Proposition 2. Let $\hat{\beta}$ be as defined in (3). Under some regularity conditions, $\hat{\beta}$ is a $\sqrt{n}$-consistent estimator of β.
24 Main Results
Theorem. Let $\hat{\beta}$ be as defined in (3). Under the assumptions of Proposition 2,
$\sqrt{n}(\hat{\beta} - \beta) \stackrel{d}{\to} N\big(0,\ \tau(1-\tau) V^{-1} \Sigma V^{-1}\big),$
where
$V = E\big( (g'(\beta_1^\top X \mid \beta))^2 (X_{-1} - E(X_{-1} \mid \beta_1^\top X))(X_{-1} - E(X_{-1} \mid \beta_1^\top X))^\top f_{\epsilon \mid X}(0 \mid X) \big)$
and
$\Sigma = E\big( (g'(\beta_1^\top X \mid \beta))^2 (X_{-1} - E(X_{-1} \mid \beta_1^\top X))(X_{-1} - E(X_{-1} \mid \beta_1^\top X))^\top \big),$
for $g'(t \mid b) = (\partial / \partial t) g(t \mid b)$ and $X_{-1}$ the (d-1)-dimensional vector obtained by removing the first coordinate of X.
25 Numerical Studies: Notation
- NWQR: the proposed estimator.
- WYY: the estimator resulting from the iterative algorithm proposed by Wu et al. (2010).
- WYY-1: the estimator resulting from dividing the estimator of Wu et al. (2010) by its first component.
- WYY-2: the estimator resulting from using the proposed constraint in conjunction with the iterative algorithm of Wu et al. (2010).
28 Numerical Studies: Example 1
Consider the model
$Y = \exp(\beta_1^\top X) + \epsilon,$ (4)
where $X = (X_1, X_2)^\top$, the $X_i \sim N(0, 1)$ are iid, $\beta_1 = (1, 2)^\top$, and the residual ε follows a standard normal distribution.
We fit the single index quantile regression model for five quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9. We use a sample size of n = 400 and perform N = 100 replications.
29 Numerical Studies: Example 1
To compare the above estimators, consider the mean squared error, $R(\hat{\beta})$, and the mean check-based absolute residuals, $R_\tau(\hat{Q}^{LL})$, defined as
$R(\hat{\beta}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{\beta}_i - 2)^2, \qquad R_\tau(\hat{Q}^{LL}) = \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{Q}^{LL}_\tau(Y \mid \hat{\beta}_1^\top X_i)\big),$
where $\hat{Q}^{LL}_\tau(Y \mid \hat{\beta}_1^\top x)$ is the local linear conditional quantile estimator based on the univariate variable $\hat{\beta}_1^\top X_i$.
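Both criteria are direct to compute once the fits are available; a short sketch (the helper names and example values are my own):

```python
import numpy as np

def r_beta(beta_hats, beta_true):
    """R(beta_hat): mean squared error of the estimated coefficient
    over the N simulation replications."""
    return np.mean((np.asarray(beta_hats) - beta_true) ** 2)

def r_tau_check(Y, Q_fit, tau):
    """R_tau(Q_hat): mean check-based absolute residual of the fitted
    conditional quantiles Q_fit."""
    u = np.asarray(Y) - np.asarray(Q_fit)
    return np.mean((tau - (u < 0)) * u)
```

For example, three replicate estimates (1.9, 2.1, 2.0) of the true coefficient 2 give R = 0.02/3.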
30 Numerical Studies: Example 1
Table 1: mean values and standard errors (in parentheses), $R(\hat{\beta})$, and $R_\tau(\hat{Q}^{LL})$ for Model (4), together with the 95% coverage probability for NWQR, WYY-2, and the second estimated coefficient of Wu et al. (2010).
35 Variable Selection: Motivation
- Ubiquity of high-dimensional data.
- Including unnecessary variables vs. variable selection.
- Variable selection is also important in quantile regression.
- Large body of work: for example, Wang, Li, and Jiang (2007) and Wu and Liu (2009) for linear QR; Koenker (2011) for nonparametric/semiparametric QR; Alkenani and Yu (2013) for the SIQR model.
38 The Model
Let $\{Y_i, X_i\}_{i=1}^{n}$ be iid observations satisfying $Y_i = Q_\tau(Y \mid X_i) + \epsilon_i$, where $Q_\tau(Y \mid x) = Q_\tau(Y \mid \beta_1^\top x)$ and the error term satisfies $Q_\tau(\epsilon_i \mid x) = 0$.
Sparsity assumption:
- $X_i = (X_{i1}^\top, X_{i2}^\top)^\top$, with $X_{i1} \in \mathbb{R}^{d^*}$ and $X_{i2} \in \mathbb{R}^{d - d^*}$, where $d^* \le d$ is an integer specifying the $d^*$ relevant variables.
- $\beta_1 = (1, \beta^\top)^\top = (1, \beta_{11}^\top, \beta_{12}^\top)^\top = (1, \beta_{11}^\top, 0^\top)^\top$, where $\beta_{11} \in \mathbb{R}^{d^* - 1}$ is the nonzero parameter vector and $\beta_{12} \in \mathbb{R}^{d - d^*}$ is the zero vector corresponding to the irrelevant variables.
40 Iterations
The true parameter vector β satisfies
$\beta = \arg\min_b E\big(\rho_\tau(Y - Q_\tau(Y \mid b_1^\top X))\big),$
where $b_1 = (1, b^\top)^\top$ and $\rho_\tau(u) = (\tau - I(u < 0))u$.
Problem: the same problem arises! Existing algorithms are iterative, computationally expensive, and have convergence issues.
42 New Approach
The proposed estimator is obtained by
$\hat{\beta} = \arg\min_{b \in \Theta} \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1^\top X_i \mid b)\big),$
where $\hat{g}^{NW}_{\hat{Q}}$ is the Nadaraya-Watson-type estimator, $\Theta \subset \mathbb{R}^{d-1}$ is a compact set, and β is in the interior of Θ. Recall,
$\hat{g}^{NW}_{\hat{Q}}(t \mid b) = \frac{\sum_{i=1}^{n} \hat{Q}_\tau(Y \mid X_i)\, K\left(\frac{t - b_1^\top X_i}{h}\right)}{\sum_{k=1}^{n} K\left(\frac{t - b_1^\top X_k}{h}\right)},$
where $\hat{Q}_\tau(Y \mid X_i)$ is the local linear conditional quantile estimator.
44 New Approach
Goals:
- For high-dimensional data, it is possible to improve the estimation of the local linear conditional quantile estimators $\hat{Q}_\tau(Y \mid X_i)$: consider the use of a penalty term for variable selection when estimating $Q_\tau(Y \mid x)$.
- Consider a penalty term when producing the estimator of the parametric component β of the SIQR model.
46 New Approach
Steps:
- Step 1 can be thought of as an initial variable selection method used to achieve better estimation of the conditional quantiles $Q_\tau(Y \mid X_i)$.
- Step 2 uses these improved estimators to perform simultaneous variable selection and estimation of the parametric component of the SIQR model.
47 New Approach - Step 1
Specifically, for Step 1 we minimize with respect to b the objective function
$L_n(\tau, b) = \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1^\top X_i \mid b)\big) + n \sum_{j=2}^{d} p_{\lambda_1}(|b_j|),$ (5)
where $b_1 = (1, b^\top)^\top = (1, b_2, \ldots, b_d)^\top$ and $\lambda_1$ is the tuning parameter for the SCAD penalty $p_{\lambda_1}(\cdot)$.
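The slides do not display $p_\lambda(\cdot)$ itself; for reference, a sketch of the standard SCAD penalty of Fan and Li (2001), with the conventional a = 3.7:

```python
import numpy as np

def scad(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): linear (L1-like) near zero,
    quadratic in the middle, constant beyond a*lambda so that large
    coefficients are not penalized at the margin."""
    t = np.abs(np.asarray(theta, dtype=float))
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                    lam ** 2 * (a + 1) / 2))
```

The flat tail is what distinguishes SCAD from the $L_1$ penalty: large coefficients receive no marginal shrinkage, which underlies the oracle property discussed later in the talk.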
48 New Approach - Step 1
Denote by $\hat{\beta}^{VS}$ the minimizer of the objective function (5). This estimator can be used for variable selection: let $\hat{A}_n = \{j \in \{2, \ldots, d\} : \hat{\beta}^{VS}_j \neq 0\}$, of cardinality $\hat{d}^* - 1$.
Construct improved estimators, $\hat{Q}^{VS}_\tau(Y \mid x)$, of the conditional quantiles. In particular, for $x^*$ the $\hat{d}^*$-dimensional vector of selected covariates, define $\hat{Q}^{VS}_\tau(Y \mid x^*)$ as the local linear conditional quantile estimator based on the relevant variables only.
$\hat{Q}^{VS}_\tau(Y \mid x)$ indeed has a faster rate of convergence than $\hat{Q}_\tau(Y \mid x)$.
49 New Approach - Step 2
For Step 2, we set
$\hat{g}^{NW}_{VS}(t \mid b) = \frac{\sum_{i=1}^{n} \hat{Q}^{VS}_\tau(Y \mid X_i)\, K\left(\frac{t - b_1^\top X_i}{h}\right)}{\sum_{k=1}^{n} K\left(\frac{t - b_1^\top X_k}{h}\right)}$
and define the minimization problem
$\hat{S}_n(\tau, b) = \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{g}^{NW}_{VS}(b_1^\top X_i \mid b)\big) + n \sum_{j=2}^{d} p_{\lambda_2}(|b_j|),$
where $b_1 = (1, b^\top)^\top = (1, b_2, \ldots, b_d)^\top$ and $\lambda_2 > 0$ is the tuning parameter for the SCAD penalty $p_{\lambda_2}(\cdot)$. The proposed estimator is obtained by
$\hat{\beta}^{SCAD} = \arg\min_{b \in \Theta} \hat{S}_n(\tau, b),$ (6)
where $\Theta \subset \mathbb{R}^{d-1}$ is a compact set assumed to contain the true value of β.
51 Comments
Some comments:
- The SCAD penalty is motivated by the fact that directly applying an $L_1$ penalty tends to include inactive variables and to introduce bias. Also, the SCAD penalty possesses the oracle property, to be defined shortly.
- An alternative algorithm, in which Step 1 is replaced by local variable selection for the computation of $\hat{Q}^{VS}_\tau(Y \mid x)$, was also used in some simulations and performed equally well in the settings tried. While such an algorithm may be more appropriate for very high dimensional data, it is time consuming to run.
53 Main Results
Theorem (Model Selection Consistency). Let $\hat{\beta}^{VS} = (\hat{\beta}^{VS}_2, \ldots, \hat{\beta}^{VS}_d)^\top$ be the minimizer of the objective function $L_n(\tau, b)$ defined in (5). Define $\hat{A}_n = \{j \in \{2, \ldots, d\} : \hat{\beta}^{VS}_j \neq 0\}$ and $A = \{j \in \{2, \ldots, d\} : \beta_j \neq 0\}$. Under some regularity conditions and the conditions $\lambda_1 \to 0$ and $\sqrt{n}\,\lambda_1 \to \infty$ as $n \to \infty$, we have
$\lim_{n \to \infty} P(\hat{A}_n = A) = 1.$
55 Main Results
Proposition 1. Let $\hat{Q}^{VS}_\tau(Y \mid x)$ denote the local linear conditional quantile estimator based on the relevant variables. Then, under some regularity assumptions,
$\sup_{x \in \mathcal{X}_0} \big| \hat{Q}^{VS}_\tau(Y \mid x) - Q_\tau(Y \mid x) \big| = O_p\left( \left( \frac{\log n}{n} \right)^{s/(2s + d^*)} \right) = O_p(a_n^*)$
if the bandwidth h is asymptotically proportional to $(\log n / n)^{1/(2s + d^*)}$.
Proposition 2. Let $\hat{\beta}^{SCAD}$ be as defined in (6). Then, under the assumptions of Proposition 1 and the assumption that $\lambda_2 \to 0$ as $n \to \infty$, $\hat{\beta}^{SCAD}$ is a $\sqrt{n}$-consistent estimator of β.
56 Main Results
Theorem (Oracle Property). Let $\hat{\beta}^{SCAD}$ be as defined in (6) and write $\hat{\beta}^{SCAD} = (\hat{\beta}^{SCAD\top}_{11}, \hat{\beta}^{SCAD\top}_{12})^\top$, where $\hat{\beta}^{SCAD}_{11}$ is of cardinality $d^* - 1$. If the assumptions of Proposition 1 hold and moreover $\lambda_2 \to 0$ and $\sqrt{n}\,\lambda_2 \to \infty$ as $n \to \infty$, then with probability tending to one,
1. Sparsity: $\hat{\beta}^{SCAD}_{12} = 0$.
2. Asymptotic normality: $\sqrt{n}(\hat{\beta}^{SCAD}_{11} - \beta_{11}) \stackrel{d}{\to} N\big(0,\ \tau(1-\tau) V_{11}^{-1} \Sigma_{11} V_{11}^{-1}\big),$ where
$V_{11} = E\big( (g'(\beta_1^\top X_1 \mid \beta))^2 (X_{1,-1} - E(X_{1,-1} \mid \beta_1^\top X))(X_{1,-1} - E(X_{1,-1} \mid \beta_1^\top X))^\top f_{\epsilon \mid X}(0 \mid X) \big)$
and
$\Sigma_{11} = E\big( (g'(\beta_1^\top X_1 \mid \beta))^2 (X_{1,-1} - E(X_{1,-1} \mid \beta_1^\top X))(X_{1,-1} - E(X_{1,-1} \mid \beta_1^\top X))^\top \big),$
for $g'(t \mid b) = (\partial / \partial t) g(t \mid b)$ and $X_{1,-1}$ the $(d^* - 1)$-dimensional vector of relevant covariates after removing the first coordinate.
57 Numerical Studies
- NWQR denotes the unpenalized estimator.
- SCAD-NWQR denotes the proposed estimator.
- LASSO-AY denotes the LASSO estimator of Alkenani and Yu (2013).
- ALASSO-AY denotes the adaptive-LASSO estimator of Alkenani and Yu (2013).
58 Numerical Studies
How to evaluate performance? For the estimated parametric component: its size (number of nonzero coefficients), its numbers of correct and incorrect zeros, its absolute estimation error, and the mean squared error (MSE) of $\hat{\beta}^{SCAD\top}_1 X$, defined as
$MSE(\hat{\beta}^{SCAD\top}_1 X) = \frac{1}{n} \sum_{i=1}^{n} \big( \hat{\beta}^{SCAD\top}_1 X_i - \beta_1^\top X_i \big)^2.$
For the final conditional quantile estimator $\hat{Q}^{LL}_\tau(Y \mid \hat{\beta}^{SCAD\top}_1 X_i)$, we consider the mean check-based absolute residuals
$R_\tau(\hat{Q}^{LL}) = \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\big(Y_i - \hat{Q}^{LL}_\tau(Y \mid \hat{\beta}^{SCAD\top}_1 X_i)\big).$
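The size and zero-count summaries reported in the tables can be computed with a small helper (a hypothetical implementation, counting exact zeros up to a threshold):

```python
import numpy as np

def selection_summary(beta_hat, beta_true, tol=1e-8):
    """Size (number of nonzero estimated coefficients) and the numbers of
    correct and incorrect zeros, relative to the true coefficient vector."""
    est_zero = np.abs(np.asarray(beta_hat)) < tol
    true_zero = np.abs(np.asarray(beta_true)) < tol
    return {"size": int(np.sum(~est_zero)),
            "correct_zeros": int(np.sum(est_zero & true_zero)),
            "incorrect_zeros": int(np.sum(est_zero & ~true_zero))}
```

For β = (2, 0, 0, 0) and an estimate (1.9, 0, 0, 0.1), this reports size 2, two correct zeros, and no incorrect zeros.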
61 Numerical Studies: Example 1 (Asymmetric Homoscedastic Errors)
Consider the model
$Y = 5\cos(\beta_1^\top X) + \exp(-(\beta_1^\top X)^2) + \epsilon,$ (7)
where $X = (X_1, \ldots, X_5)^\top$, the $X_i \sim U(0, 1)$ are iid, $\beta_1 = (1, 2, 0, 0, 0)^\top$, the residual ε follows an exponential distribution with mean 2, and the $X_i$'s and ε are mutually independent.
We fit the SIQR model for five quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9. We use a sample size of n = 400 and perform N = 100 replications.
62 Numerical Studies: Example 1
Table 2: mean values and standard deviations (in parentheses) for the size and the numbers of correct and incorrect zeros of the estimated parametric component $\hat{\beta}^{SCAD}$, together with mean values of $R(\hat{\beta}^{SCAD})$ and $R_\tau(\hat{Q}^{LL}_\tau)$, for Model (7).
63 Numerical Studies: Example 1
Comments:
- The proposed methodology selects X₂ as the significant covariate 100% of the time for all quantile levels (0 average number of incorrect zeros) and gives an average number of correct zeros close to the true value of 3.
- The average $R(\hat{\beta}^{SCAD})$ of the estimated parametric component seems to increase with the quantile level; recall that the asymptotic variance formula for sample quantiles involves the inverse density of the error term evaluated at the specific quantile of interest.
- Overall performance is good.
64 Numerical Studies: Example 1
Table 3: mean values and standard deviations (in parentheses) of $MSE(\hat{\beta}_1^\top X)$ for the SCAD-NWQR, LASSO-AY, and ALASSO-AY estimated parametric components for Model (7).
65 Numerical Studies: Example 1
Comments: the proposed methodology outperforms the LASSO and adaptive-LASSO estimators of Alkenani and Yu (2013) for all quantile levels except the lowest one.
69 Numerical Studies: Example 2 (Heteroscedastic Errors)
Consider the model
$Y = \sin(2\pi \beta_1^\top X) + \frac{(1 + \beta_2^\top X)^2}{4}\, \epsilon,$ (8)
where $X = (X_1, \ldots, X_5)^\top$, the $X_i \sim U(0, 1)$ are iid, $\beta_1 = (1, 2, 0, 0, 0)^\top$, $\beta_2 = (1, 1, 1, 0, 0)^\top$, and the residual ε follows an exponential distribution with mean 1.
We fit the SIQR model for five quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9. We use a sample size of n = 400 and perform N = 100 replications.
We compare the penalized estimated parametric component (SCAD-NWQR) with the unpenalized one (NWQR).
70 Numerical Studies: Example 2
Table 4: mean values and standard deviations (in parentheses) for the size and the numbers of correct and incorrect zeros of the estimated parametric component $\hat{\beta}^{SCAD}$ for Model (8).
71 Numerical Studies: Example 2
Table 5: mean values and standard deviations (in parentheses) of $MSE(\hat{\beta}_1^\top X)$, average $R(\hat{\beta})$, and average $R_\tau(\hat{Q}^{LL}_\tau)$ for the NWQR and SCAD-NWQR estimated parametric components for Model (8).
73 Numerical Studies: Boston Housing Data
- 506 observations; 14 variables. Response: medv, the median value of owner-occupied homes in $1000s. Predictors: measurements on the 506 census tracts in suburban Boston from the 1970 census. Available in the MASS library in R.
- Collinearity in the data set: Breiman and Friedman (1985) applied the ACE method and selected the four covariates RM, TAX, PTRATIO, and LSTAT.
- Variable selection in QR: Wu and Liu (2009) considered SCAD and adaptive-LASSO penalties under the linear QR model; Alkenani and Yu (2013) considered the LASSO and adaptive-LASSO penalties under the SIQR model.
74 Numerical Studies: Boston Housing Data
We exclude CHAS and RAD, standardize the response and the remaining 11 predictor variables, denote the predictor variables by $X_1, X_2, \ldots, X_{11}$, and consider the SIQR model
$Q_\tau(\text{medv} \mid X) = g(X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_{11} X_{11} \mid \beta),$ (9)
for τ = 0.1, 0.25, 0.5, 0.75, 0.9, which assumes that RM is a significant predictor.
75 Numerical Studies: Boston Housing Data
Table 6: parameter estimates (for RM, CRIM, ZN, INDUS, NOX, AGE, DIS, TAX, PTRATIO, BLACK, LSTAT) from fitting the SIQR model (9) at each quantile level.
76 Numerical Studies: Boston Housing Data
Tables 7 and 8: number of zero coefficients and mean check-based absolute residuals, $R_\tau(\hat{Q}^{LL}_\tau)$, for SCAD-NWQR, LASSO-AY, and ALASSO-AY from fitting the SIQR model (9) at each quantile level.
77 Numerical Studies: Boston Housing Data
Comments:
- The methods select different significant predictor variables.
- The proposed methodology gives sparser models than the LASSO and adaptive-LASSO methodologies of Alkenani and Yu (2013).
- The mean check-based absolute residuals resulting from the proposed SCAD-NWQR are slightly larger than those of the other methods for all except the 90th quantile. The difference is probably due to the different penalty functions used by the methods, but could also be explained by the different levels of sparsity attained.
78 Numerical Studies: Boston Housing Data
Figure: estimated single index quantile regression (medv against the estimated index) for the Boston housing data, one panel per quantile level.
83 Conclusions
- Conditional quantiles provide a more complete picture of the conditional distribution and enjoy nice properties such as robustness.
- We consider a single index model to reduce the dimensionality while maintaining some nonparametric flexibility.
- Existing iterative algorithms for estimating the parametric vector have convergence issues.
- We proposed a method that estimates the parametric component directly, non-iteratively, and derived its asymptotic distribution under heteroscedastic errors.
- For high-dimensional data, we introduce penalty terms both for variable selection when estimating $Q_\tau(Y \mid x)$ and for producing the estimator of the parametric component of the SIQR model.
84 Future Work
Extend the proposed methodology to censored data. The proposed estimator is obtained by
$$\hat{\beta} = \arg\min_{b \in \Theta} \hat{S}_n^{CD}(\tau, b),$$
where
$$\hat{S}_n^{CD}(\tau, b) = \sum_{i=1}^{n} \frac{\Delta_i}{\hat{G}_n(Z_i \mid X_i)}\, \rho_\tau\!\left(Z_i - \hat{g}_{CD}^{NW}(b^\top X_i; b)\right),$$
with $\Delta_i$ the censoring indicator and $\hat{G}_n$ an estimator of the conditional survival function of the censoring variable. Christou and Akritas (2015); working paper.
E. Christou, M. G. Akritas (PSU) SIQR SMAC, November 6, / 63
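The weighted objective above can be sketched numerically. This is a hedged illustration under assumed notation — Z = min(T, C), Δ = 1{T ≤ C}, and the censoring survival function G plugged in as known rather than estimated — not the authors' implementation:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def ipcw_objective(fit, Z, delta, G_hat, tau):
    """Inverse-probability-of-censoring-weighted check loss:
    sum_i [delta_i / G_hat_i] * rho_tau(Z_i - fit)."""
    return np.sum((delta / G_hat) * check_loss(Z - fit, tau))

rng = np.random.default_rng(1)
n = 500
T = rng.exponential(1.0, n)      # latent survival times, Exp(1)
C = rng.exponential(2.0, n)      # censoring times, Exp(scale=2)
Z = np.minimum(T, C)             # observed times
delta = (T <= C).astype(float)   # censoring indicators
G_hat = np.exp(-Z / 2.0)         # known censoring survival fn S_C(t) = exp(-t/2)

# Evaluate the weighted objective at two candidate constant fits; the one
# at the true tau-quantile of T should give the smaller loss.
tau = 0.5
q_true = -np.log(1 - tau)        # median of Exp(1) is log 2
loss_good = ipcw_objective(q_true, Z, delta, G_hat, tau)
loss_bad = ipcw_objective(q_true + 1.5, Z, delta, G_hat, tau)
print(loss_good < loss_bad)
```

The weights Δ_i/Ĝ_n undo the bias from censored observations, so the weighted loss behaves, on average, like the uncensored check loss on T.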
85 References I
A. Alkenani and K. Yu. Penalized single-index quantile regression. International Journal of Statistics and Probability, 2: 12-30, 2013.
L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80: 580-598, 1985.
E. Christou and M. G. Akritas. Single index quantile regression for censored data. Working paper, 2015.
E. Guerre and C. Sabbah. Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory, 28(1): 87-129, 2012.
E. Christou, M. G. Akritas (PSU) SIQR SMAC, November 6, / 63
86 References II
R. Koenker. Additive models for quantile regression: model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25: 239-262, 2011.
E. Kong and Y. Xia. A single-index quantile regression model and its estimation. Econometric Theory, 28: 730-768, 2012.
J. D. Opsomer and D. Ruppert. A fully automated bandwidth selection method for fitting additive models. Journal of the American Statistical Association, 93: 605-619, 1998.
H. Wang, G. Li and G. Jiang. Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business and Economic Statistics, 25: 347-355, 2007.
E. Christou, M. G. Akritas (PSU) SIQR SMAC, November 6, / 63
87 References III
Y. Wu and Y. Liu. Variable selection in quantile regression. Statistica Sinica, 19: 801-817, 2009.
T. Z. Wu, K. Yu and Y. Yu. Single index quantile regression. Journal of Multivariate Analysis, 101(7): 1607-1621, 2010.
K. Yu and Z. Lu. Local linear additive quantile regression. Scandinavian Journal of Statistics, 31: 333-346, 2004.
E. Christou, M. G. Akritas (PSU) SIQR SMAC, November 6, / 63
88 Thank you! Many thanks to my advisor, Prof. Michael Akritas. E. Christou, M. G. Akritas (PSU) SIQR SMAC, November 6, / 63
More information