Quantile Regression for Extraordinarily Large Data


Shih-Kang Chao, Department of Statistics, Purdue University. November 2016. Joint work with Stanislav Volgushev and Guang Cheng.

With the advance of technology, it is increasingly common that a data set is so large that it cannot be stored on a single machine:
- Social media (views, likes, comments, images, ...)
- Meteorological and environmental surveillance
- Transactions in e-commerce
- Others...
Figure: A server room in Council Bluffs, Iowa. Photo: Google/Connie Zhou.

Divide and Conquer (D&C)
To take advantage of the opportunities in massive data, we need to deal with storage (disk) and computational (memory, processor) bottlenecks.
Divide and conquer paradigm: randomly divide the $N$ samples into $S$ groups, each of size $n = N/S$.
[Diagram: Problem($N$) split into $S$ subproblems of size $n$.]

Aggregate the solutions of the subproblems to obtain a solution for the original problem.
Can be implemented on computational platforms such as Hadoop (White, 2012).
Data that are stored in distributed locations can be analyzed in a similar manner.

Does D&C fit statistical analysis? Sometimes it does, but sometimes it doesn't... In the following, we give two simple examples.

Example 1: sample mean
Denote by $\bar X_s$ the sample mean of the $s$-th group (here $S = 4$). Averaging the subsample means recovers the full-sample mean exactly:
$$\frac{1}{4}\sum_{s=1}^{4} \bar X_s = \frac{1}{4n}\sum_{s=1}^{4}\sum_{i=1}^{n} X_{is} = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar X_N.$$
It fits!
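A minimal numerical check of this identity (my own Python sketch; the sample size, seed, and Exp(1) data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, S = 2**15, 8                       # N samples split into S equal groups
X = rng.exponential(1.0, size=N)

groups = X.reshape(S, -1)             # S groups of size n = N / S
dc_mean = groups.mean(axis=1).mean()  # average of the S subsample means
print(np.isclose(dc_mean, X.mean()))  # True: equals the full-sample mean
```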

Example 2: sample median
Let $\hat X_s(0.5)$ be the middle value of the ordered $n$ samples in group $s$, and let $\hat X(0.5)$ be the overall median. Does averaging work here?
$$\frac{1}{4}\sum_{s=1}^{4} \hat X_s(0.5) \stackrel{?}{=} \hat X(0.5)$$

Example 2: sample median
Simulation 1: $X_i \sim N(0,1)$; Simulation 2: $X_i \sim \mathrm{Exp}(1)$. $N = 2^{15}$.
[Figure: true median vs. the simulated average of subsample medians $S^{-1}\sum_{s=1}^{S}\hat X_s(0.5)$, plotted against $\log_N(S)$; left panel: $\tau = 0.5$, $N(0,1)$; right panel: $\tau = 0.5$, $\mathrm{Exp}(1)$.]

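The following sketch reproduces the flavor of this experiment (my own illustration; replication count and seed are arbitrary): for Exp(1) the average of subsample medians drifts away from the true median $\log 2$ as $S$ grows, while for the symmetric N(0,1) it does not.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2**15
settings = [("N(0,1)", rng.standard_normal, 0.0),
            ("Exp(1)", rng.exponential, np.log(2.0))]
for name, sampler, true_med in settings:
    for S in (2**3, 2**7, 2**11):
        # average the S subsample medians, then average over replications
        avg = np.mean([np.median(sampler(size=(S, N // S)), axis=1).mean()
                       for _ in range(200)])
        print(f"{name}, S = {S:5d}: bias = {avg - true_med:+.4f}")
```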

Major issues
- When does the D&C algorithm work? Especially for skewed and heavy-tailed distributions.
- Statistical inference: asymptotic distribution; inference for the whole distribution?
- Take advantage of the massive sample size to discover subtle patterns hidden in the whole distribution.

Outline
1. Quantile regression
2. Two-step algorithm: D&C and quantile projection
3. Oracle rules: linear model and nonparametric model
4. Simulation

Quantile
Response $Y$, predictors $X$. For $\tau \in (0,1)$, the conditional quantile curve $Q(\cdot\,;\tau)$ of $Y \in \mathbb{R}$ conditional on $X$ is defined through
$$P\big(Y \le Q(x;\tau) \mid X = x\big) = \tau \quad \text{for all } x.$$
[Figure: $Q(x;\tau)$ at $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$ under different models: $Y = 0.5X + N(0,1)$; $Y = X + (0.5+X)\,N(0,1)$; $Y \mid X = x \sim \mathrm{Beta}(1.5 - x,\ 0.5 + x)$.]
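A quick numerical illustration of this definition, a sketch under the first model in the figure ($Y = 0.5X + N(0,1)$, so that $Q(x;\tau) = 0.5x + \Phi^{-1}(\tau)$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x, tau = 1.3, 0.75
Y = 0.5 * x + rng.standard_normal(10**6)  # draws from Y | X = x
Q = 0.5 * x + norm.ppf(tau)               # conditional tau-quantile
print(np.mean(Y <= Q))                    # ~ 0.75 = tau
```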

Quantile regression vs. mean regression
Mean regression: $Y_i = m(X_i) + \varepsilon_i$, $E[\varepsilon \mid X = x] = 0$.
- $m$: regression function, object of interest; $\varepsilon_i$: errors.
Quantile regression: $P(Y \le Q(x;\tau) \mid X = x) = \tau$.
- No strict distinction between signal and noise.
- Object of interest: properties of the conditional distribution of $Y \mid X = x$. Contains much richer information than just the conditional mean.
[Figure: bone mass density (BMD) vs. age, with a fitted mean curve (left) and fitted quantile curves (right).]

Quantile curves vs. conditional distribution
$Q(x_0;\tau) = F^{-1}_{Y\mid X}(\tau \mid x_0)$, where $F_{Y\mid X}(y \mid x)$ is the conditional distribution function of $Y$ given $X$.
[Figure: left, estimated quantile curves $\hat Q(\mathrm{Age}\ x_0;\tau)$ at $x_0 = 13, 18, 23$; right, the corresponding conditional CDFs $\hat F(y \mid \mathrm{Age}\ x_0)$.]

Quantile regression as optimization
$\{(X_i, Y_i)\}_{i=1}^{N}$: i.i.d. samples in $\mathbb{R}^{d+1}$.
Koenker and Bassett (1978): if $Q(x;\tau) = \beta(\tau)^\top x$, estimate by
$$\hat\beta_{or}(\tau) := \arg\min_{b} \sum_{i=1}^{N} \rho_\tau\big(Y_i - b^\top X_i\big), \qquad (1.1)$$
where $\rho_\tau(u) := \tau u^{+} + (1-\tau)u^{-}$ is the check function.
The optimization problem (1.1) is convex (but non-smooth) and can be solved by linear programming.
$\hat Q(x_0;\tau) := x_0^\top \hat\beta_{or}(\tau)$ for any $x_0$.
More generally, we can consider a series approximation model.
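As a concrete illustration of (1.1), here is a minimal sketch (not the authors' code) using the standard linear-programming reformulation with variables $(b, u, v)$ and constraint $Y - Xb = u - v$, $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

def fit_quantile(X, y, tau):
    """Minimize sum_i rho_tau(y_i - X_i' b) by linear programming."""
    N, m = X.shape
    # objective: tau * sum(u) + (1 - tau) * sum(v); b itself carries no cost
    c = np.concatenate([np.zeros(m), tau * np.ones(N), (1 - tau) * np.ones(N)])
    A_eq = np.hstack([X, np.eye(N), -np.eye(N)])   # X b + u - v = y
    bounds = [(None, None)] * m + [(0, None)] * (2 * N)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:m]
```

For a model with an intercept, include a constant column in X. A dense LP like this is only practical for moderate N, which is precisely the motivation for the D&C step below.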

A unified framework: $Q(x;\tau) \approx Z(x)^\top \beta(\tau)$, $m := \dim(Z)$ (it is possible that $m \to \infty$). Solve
$$\hat\beta_{or}(\tau) := \arg\min_{b} \sum_{i=1}^{N} \rho_\tau\big\{Y_i - b^\top Z(X_i)\big\}. \qquad (1.2)$$
Examples of $Z(x)$: linear model with fixed/increasing dimension, B-splines, polynomials, trigonometric polynomials.
$\hat Q(x;\tau) := Z(x)^\top \hat\beta_{or}(\tau)$.
Need to control the bias $Q(x;\tau) - Z(x)^\top \beta(\tau)$.

Summary for quantile regression
$\hat\beta_{or}(\tau)$ is computationally infeasible when $N$ is so large that it cannot be handled by a single machine.
To infer the whole conditional distribution at a fixed $x_0$, use
$$F_{Y\mid X}(y \mid x_0) = \int_0^1 \mathbf{1}\{Q(x_0;\tau) < y\}\,d\tau,$$
where $\mathbf{1}(A) = 1$ if $A$ is true. To approximate the integral, we need to compute $\hat Q(x_0;\tau)$ for many $\tau$.

D&C algorithm at fixed $\tau$
On each of the $S$ subproblems, compute
$$\hat\beta_s(\tau) := \arg\min_{b \in \mathbb{R}^m} \sum_{i=1}^{n} \rho_\tau\big\{Y_{is} - b^\top Z(X_{is})\big\}, \qquad (2.1)$$
then aggregate:
$$\bar\beta(\tau) := \frac{1}{S}\sum_{s=1}^{S} \hat\beta_s(\tau). \qquad (2.2)$$
However, this is only for a fixed $\tau$!
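A sketch of (2.1)-(2.2), reusing the illustrative fit_quantile from the earlier block ($N$ is assumed divisible by $S$, matching the random division on the D&C slide):

```python
import numpy as np

def dc_quantile(X, y, tau, S, seed=0):
    """D&C estimate bar{beta}(tau): fit each random group, then average."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y)).reshape(S, -1)  # S groups of size n = N / S
    betas = [fit_quantile(X[i], y[i], tau) for i in idx]
    return np.mean(betas, axis=0)                 # eq. (2.2)
```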

Quantile projection algorithm
To avoid repetitively applying D&C, take $B := (B_1,\dots,B_q)^\top$, a B-spline basis of degree $r_\tau$ defined on equidistant knots $\{t_1,\dots,t_G\} \subset \mathcal{T}$, and set, for all $\tau$,
$$\hat\beta(\tau) := \hat\Xi^\top B(\tau). \qquad (2.3)$$
Computation of $\hat\Xi$:
1. Define a grid of quantile levels $\{\tau_1,\dots,\tau_K\}$ on $[\tau_L, \tau_U]$, $K > q$. For each $\tau_k$, compute $\bar\beta(\tau_k)$.
2. For $j = 1,\dots,m$, compute
$$\hat\alpha_j := \arg\min_{\alpha \in \mathbb{R}^q} \sum_{k=1}^{K} \big(\bar\beta_j(\tau_k) - \alpha^\top B(\tau_k)\big)^2. \qquad (2.4)$$
3. Set the matrix $\hat\Xi := [\hat\alpha_1\ \hat\alpha_2\ \dots\ \hat\alpha_m]$.
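A sketch of this projection step (my own illustration: scipy's make_lsq_spline solves exactly the per-coordinate least-squares problem (2.4), and the returned spline is the map $\tau \mapsto \hat\Xi^\top B(\tau)$ of (2.3)):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def project_betas(taus, betas_bar, tau_L, tau_U, q, degree=3):
    """taus: K grid points; betas_bar: K x m array of bar{beta}(tau_k)."""
    # clamped, equidistant knot vector giving q B-spline basis functions;
    # its total length is q + degree + 1 (= G = q + 4 for cubic splines)
    inner = np.linspace(tau_L, tau_U, q - degree + 1)
    knots = np.r_[[tau_L] * degree, inner, [tau_U] * degree]
    return make_lsq_spline(taus, betas_bar, t=knots, k=degree)

# usage: beta_hat = project_betas(taus, B_bar, 0.05, 0.95, q=7); beta_hat(0.5)
```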

Computation of $\hat F_{Y\mid X}(y \mid x)$
Define $\hat Q(x;\tau) := Z(x)^\top \hat\beta(\tau) = Z(x)^\top \hat\Xi^\top B(\tau)$. Compute
$$\hat F_{Y\mid X}(y \mid x_0) := \tau_L + \int_{\tau_L}^{\tau_U} \mathbf{1}\{\hat Q(x_0;\tau) < y\}\,d\tau, \qquad (2.5)$$
where $0 < \tau_L < \tau_U < 1$.
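A sketch of (2.5), approximating the integral by a Riemann sum over a fine $\tau$ grid (beta_hat as returned by the projection sketch above; z0 $= Z(x_0)$ is assumed given):

```python
import numpy as np

def F_hat(y, z0, beta_hat, tau_L=0.05, tau_U=0.95, n_grid=1000):
    taus = np.linspace(tau_L, tau_U, n_grid)
    Q = beta_hat(taus) @ z0                      # hat{Q}(x0; tau) on the grid
    return tau_L + (tau_U - tau_L) * np.mean(Q < y)
```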

Remarks
- Both $S$ and $K$ can grow with $N$; they determine the computational limits as well as the statistical performance.
- Goal: find $S$ and $K$ such that $\bar\beta(\tau)$ and $\hat\beta(\tau)$ are close to $\hat\beta_{or}(\tau)$ in a suitable statistical sense. This yields a sharp upper bound on $S$ and a lower bound on $K$, which impose computational limits.
- The two-step procedure requires only one pass through the entire data.
- Quantile projection requires only $\{\bar\beta(\tau_1),\dots,\bar\beta(\tau_K)\}$, of size $m \times K$, without any need to access the raw data set.

Oracle rules
For $\mathcal{T} = [\tau_L, \tau_U] \subset (0,1)$, under regularity conditions, Chao, Volgushev and Cheng (2016) show
$$a_N u^\top\big(\hat\beta_{or}(\tau) - \beta(\tau)\big) \rightsquigarrow \mathcal{N} \quad \text{at any fixed } \tau \in \mathcal{T}, \qquad (3.1)$$
$$a_N u^\top\big(\hat\beta_{or}(\cdot) - \beta(\cdot)\big) \rightsquigarrow \mathbb{G}(\cdot) \quad \text{as a process in } \tau \in \mathcal{T}, \qquad (3.2)$$
where $\mathcal{N}$ is a centered normal distribution and $\mathbb{G}$ is a centered Gaussian process.
The oracle rule holds for $\bar\beta(\tau)$ and $\hat\beta(\tau)$ if $\bar\beta(\tau)$ satisfies (3.1) and $\hat\beta(\tau)$ satisfies (3.2). Inference then follows from the oracle rule.

Two leading models
Linear model: $m = \dim(Z(x))$ is fixed, and $Q(x;\tau) = Z(x)^\top\beta(\tau)$.
Univariate spline nonparametric model: $m = \dim(Z(x)) \to \infty$ with $N$, and
$$c_N(\gamma_N) := \sup_{x,\tau}\big|Q(x;\tau) - Z(x)^\top\gamma_N(\tau)\big| \to 0, \qquad \gamma_N(\tau) := \arg\min_{\gamma \in \mathbb{R}^m} E\big[\big(Z^\top\gamma - Q(X;\tau)\big)^2 f\big(Q(X;\tau) \mid X\big)\big]. \qquad (3.3)$$
($\gamma_N$ can be defined in many other ways; we do not go into detail here.)
Conditions imposed throughout the talk: Assumption (A) (see appendix), stated in terms of $J_m(\tau) := E[ZZ^\top f_{Y\mid X}(Q(X;\tau) \mid X)]$.

Linear model with fixed dimension
$\mathcal{P}_1(\xi_m, M, \bar f, \bar f', f_{\min})$: all distributions satisfying (A1)-(A3) with constants $0 < \xi_m, M, \bar f, \bar f' < \infty$ and $f_{\min} > 0$.
Theorem 3.1 (Oracle rule for $\bar\beta(\tau)$)
Suppose that $S = o(N^{1/2}(\log N)^{-2})$. Then for all $\tau \in \mathcal{T}$ and $u \in \mathbb{R}^m$,
$$\frac{\sqrt{N}\,u^\top\big(\bar\beta(\tau) - \beta(\tau)\big)}{\big(u^\top J_m(\tau)^{-1} E[Z(X)Z(X)^\top] J_m(\tau)^{-1} u\big)^{1/2}} \rightsquigarrow \mathcal{N}\big(0,\ \tau(1-\tau)\big).$$
If $S \gtrsim N^{1/2}$, then the weak convergence result above fails for some $P \in \mathcal{P}_1(1, d, \bar f, \bar f', f_{\min})$.
$S = o(N^{1/2}(\log N)^{-2})$ is sharp: we only miss by a factor of $(\log N)^2$.

$\Lambda_c^{\eta_\tau}(\mathcal{T})$: Hölder class with smoothness $\eta_\tau > 0$. $\ell^\infty(\mathcal{T})$: set of all uniformly bounded real functions on $\mathcal{T}$.
Theorem 3.2 (Oracle rule for $\hat\beta(\cdot)$)
Suppose that $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$ for a fixed $x_0 \in \mathcal{X}$. If $S = o(N^{1/2}(\log N)^{-2})$ and $K \ge G \gtrsim N^{1/(2\eta_\tau)}$ with $r_\tau \ge \eta_\tau$, then the projection estimator $\hat\beta(\tau)$ defined in (2.3) satisfies
$$\sqrt{N}\big(Z(x_0)^\top\hat\beta(\cdot) - Q(x_0;\cdot)\big) \rightsquigarrow \mathbb{G}(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}), \qquad (3.4)$$
where $\mathbb{G}$ is a centered Gaussian process (covariance function in the appendix).
If $G \ll N^{1/(2\eta_\tau)}$, the weak convergence in (3.4) fails for some $P \in \mathcal{P}_1(1, d, \bar f, \bar f', f_{\min})$ with $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$.
$K \gtrsim N^{1/(2\eta_\tau)}$ is sharp!

Splines nonparametric model
We require an additional condition:
(L) For each $x \in \mathcal{X}$, the vector $Z(x)$ has zeroes in all but at most $r$ consecutive entries, where $r$ is fixed. Furthermore, $\sup_{x \in \mathcal{X}} \tilde{E}(Z(x), \gamma_N) = O(1)$.
This guarantees that the matrix $J_m(\tau)$ is a block (banded) matrix.
$$\mathcal{P}_L(Z, M, \bar f, \bar f', f_{\min}, R) := \big\{\text{all sequences of distributions of } (X,Y) \text{ on } \mathbb{R}^{d+1} \text{ satisfying (A1)-(A3) with } M, \bar f, \bar f' < \infty,\ f_{\min} > 0, \text{ and (L) for some } r < R;\ m^2(\log N)^6 = o(n),\ c_N(\gamma_N)^2 = o(n^{-1/2})\big\}. \qquad (3.5)$$
(See (3.3) for the definition of $c_N$.)

Splines nonparametric model
Theorem 3.3 (Oracle rule for $\bar\beta(\tau)$)
Let $\{(X_i, Y_i)\}_{i=1}^{N}$ be distributed according to $P_N \in \mathcal{P}_L(Z, M, \bar f, \bar f', f_{\min}, R)$, with $m \asymp N^\zeta$ for some $\zeta > 0$. If $S = o\big((N m^{-1}(\log N)^{-4})^{1/2}\big)$, then for all $\tau \in \mathcal{T}$, $x_0 \in \mathcal{X}$,
$$\frac{\sqrt{N}\,Z(x_0)^\top\big(\bar\beta(\tau) - \gamma_N(\tau)\big)}{\big(Z(x_0)^\top J_m(\tau)^{-1} E[ZZ^\top] J_m(\tau)^{-1} Z(x_0)\big)^{1/2}} \rightsquigarrow \mathcal{N}\big(0,\ \tau(1-\tau)\big).$$
If $S \gtrsim (N/m)^{1/2}$, the weak convergence above fails for some $P_N \in \mathcal{P}_L(Z, M, \bar f, \bar f', f_{\min}, R)$.
Sharpness of $S = o\big((N/m)^{1/2}(\log N)^{-2}\big)$: the nonparametric rate is slower; again we only miss by a factor of $(\log N)^2$.

Theorem 3.4 (Oracle rule for $\hat\beta(\tau)$)
Let the assumptions of Theorem 3.3 hold, and assume that $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$ with $r_\tau \ge \eta_\tau$. Suppose that $c_N(\gamma_N) = o\big(N^{-1/2}\|Z(x_0)\|\big)$, that $K \ge G \gtrsim N^{1/(2\eta_\tau)}\|Z(x_0)\|^{-1/\eta_\tau}$, and that the limit $H(\tau_1, \tau_2) := \lim_{N\to\infty} K_N(\tau_1, \tau_2) > 0$ exists for all $\tau_1, \tau_2 \in \mathcal{T}$. Then
$$\frac{\sqrt{N}}{\|Z(x_0)\|}\big(Z(x_0)^\top\hat\beta(\cdot) - Q(x_0;\cdot)\big) \rightsquigarrow \mathbb{G}(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}),$$
where $\mathbb{G}$ is a centered Gaussian process with covariance structure $E[\mathbb{G}(\tau)\mathbb{G}(\tau')] = H(\tau, \tau')$.
If $G \ll N^{1/(2\eta_\tau)}\|Z(x_0)\|^{-1/\eta_\tau}$, the weak convergence fails for some $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$, for all $S$.

Distribution function
$$\hat F^{or}_{Y\mid X}(y \mid x_0) := \tau_L + \int_{\tau_L}^{\tau_U} \mathbf{1}\{Z(x_0)^\top\hat\beta_{or}(\tau) < y\}\,d\tau.$$
The oracle rule holds for both models.
Corollary 3.5
Under the same conditions as Theorem 3.2 (linear model) or Theorem 3.4 (nonparametric spline model), we have for any $x_0 \in \mathcal{X}$,
$$\sqrt{N}\big(\hat F_{Y\mid X}(\cdot \mid x_0) - F_{Y\mid X}(\cdot \mid x_0)\big) \rightsquigarrow f_{Y\mid X}(\cdot \mid x_0)\,\mathbb{G}\big(F_{Y\mid X}(\cdot \mid x_0)\big) \quad \text{in } \ell^\infty\big((Q(x_0;\tau_L), Q(x_0;\tau_U))\big),$$
where $\mathbb{G}$ is the centered Gaussian process defined in the respective theorem. The same holds for $\hat F^{or}_{Y\mid X}(\cdot \mid x_0)$.

Phase transitions
[Figure: regions of $(S, K)$ for which the oracle rule holds in the linear model and the spline nonparametric model, with landmarks $K \asymp N^{1/(2\eta_\tau)}$ and $(N/m)^{1/(2\eta_\tau)}$ on the $K$-axis, and $S \asymp (N/m)^{1/2}(\log N)^{-2}$, $(N/m)^{1/2}$, $N^{1/2}(\log N)^{-2}$, $N^{1/2}$ on the $S$-axis. The "?" region is the discrepancy between the sufficient and necessary conditions.]

Setting: linear model
Model: $Y_i = 0.21 + \beta_{m-1}^\top X_i + \varepsilon_i$, $m = 4, 16, 32$.
$X_i$ follows a multivariate uniform distribution $U([0,1]^{m-1})$ with covariance matrix $\Sigma_X := E[X_i X_i^\top]$, $\Sigma_{jk} = 0.1^2 \cdot 0.7^{|j-k|}$ for $j, k = 1,\dots,m-1$.
$\varepsilon \sim \mathcal{N}$ or $\varepsilon \sim \mathrm{EXP}$ (skewed); $x_0 = (1, (m-1)^{-1/2}\mathbf{1}_{m-1}^\top)^\top$.
From Theorem 3.1, the coverage probability of the 95% confidence interval ($\alpha = 0.05$) is
$$P\Big\{x_0^\top\beta(\tau) \in \big[x_0^\top\bar\beta(\tau) \pm N^{-1/2} f_{\varepsilon,\tau}^{-1}\sqrt{\tau(1-\tau)\,x_0^\top\Sigma_X^{-1}x_0}\ \Phi^{-1}(1-\alpha/2)\big]\Big\}.$$
$S^*$: the value of $S$ at which the coverage starts to drop.
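A rough sketch of this coverage experiment (my simplification, not the authors' code: independent uniform covariates instead of the correlated design, a placeholder slope vector, and $\Sigma_X^{-1}$ replaced by the plug-in $(X^\top X/N)^{-1}$; dc_quantile and fit_quantile are the illustrative helpers from earlier):

```python
import numpy as np
from scipy.stats import norm

def coverage(N, m, S, tau, n_rep=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.r_[0.21, np.ones(m - 1)]            # placeholder slopes
    x0 = np.r_[1.0, np.full(m - 1, (m - 1) ** -0.5)]
    true_q = x0 @ beta + sigma * norm.ppf(tau)    # x0' beta(tau)
    f_eps = norm.pdf(norm.ppf(tau)) / sigma       # error density at its tau-quantile
    hits = 0
    for _ in range(n_rep):
        X = np.column_stack([np.ones(N), rng.random((N, m - 1))])
        y = X @ beta + sigma * rng.standard_normal(N)
        est = x0 @ dc_quantile(X, y, tau, S, seed=int(rng.integers(2**31)))
        Sigma_inv = np.linalg.inv(X.T @ X / N)    # plug-in for Sigma_X^{-1}
        half = (norm.ppf(0.975) / (f_eps * np.sqrt(N))
                * np.sqrt(tau * (1 - tau) * x0 @ Sigma_inv @ x0))
        hits += abs(est - true_q) <= half
    return hits / n_rep
```

Plotting coverage(N, m, S, tau) against $\log_N(S)$ mimics the curves on the next slides.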

$\bar\beta(\tau)$, $\varepsilon \sim N(0, 0.1^2)$
[Figure: coverage probability against $\log_N(S)$ for $\tau = 0.1, 0.5, 0.9$; curves for $N = 2^{14}, 2^{22}$ and $m = 4, 16, 32$.]

$\bar\beta(\tau)$, $\varepsilon \sim \mathrm{Exp}(0.8)$
[Figure: coverage probability against $\log_N(S)$ for $\tau = 0.1, 0.5, 0.9$; curves for $N = 2^{14}, 2^{22}$ and $m = 4, 16, 32$.]

Summary for the simulation of $\bar\beta(\tau)$
- As $m$ increases, $S^*$ decreases.
- As $N$ increases, $S^*$ gets close to $N^{1/2}$.
- For $\varepsilon \sim \mathcal{N}$, coverage is symmetric in $\tau$; for $\varepsilon \sim \mathrm{Exp}$, coverage is asymmetric in $\tau$.
- The $t$ distribution behaves similarly to the normal distribution.

Quantile projection setting
$B$: cubic B-spline basis with $q = \dim(B)$, defined on $G = 4 + q$ equidistant knots on $[\tau_L, \tau_U]$. We require $K > q$ so that $\hat\beta(\tau)$ is computable (see (2.4)). $N = 2^{14}$.
$y_0 = Q(x_0;\tau)$, so that $F_{Y\mid X}(y_0 \mid x_0) = \tau$.
From Theorem 3.2, the coverage probability of the 95% confidence interval is
$$P\Big\{\tau \in \big[\hat F_{Y\mid X}(Q(x_0;\tau) \mid x_0) \pm N^{-1/2}\sqrt{\tau(1-\tau)\,x_0^\top\Sigma_X^{-1}x_0}\ \Phi^{-1}(1-\alpha/2)\big]\Big\}.$$

$\hat F_{Y\mid X}(y \mid x)$, $\varepsilon \sim N(0, 0.1^2)$, $m = 4$ for $\bar\beta(\tau)$
[Figure: coverage probability against $\log_N(S)$ at $y_0 = Q(x_0;\tau)$ for $\tau = 0.1, 0.5, 0.9$; curves for $q = 7, 10, 15$ and $K = 20, 30$.]

$\hat F_{Y\mid X}(y \mid x)$, $\varepsilon \sim N(0, 0.1^2)$, $m = 32$ for $\bar\beta(\tau)$
[Figure: coverage probability against $\log_N(S)$ at $y_0 = Q(x_0;\tau)$ for $\tau = 0.1, 0.5, 0.9$; curves for $q = 7, 10, 15$ and $K = 20, 30$.]

$\hat F_{Y\mid X}(y \mid x)$, $\varepsilon \sim \mathrm{Exp}(0.8)$, $m = 4$ for $\bar\beta(\tau)$
[Figure: coverage probability against $\log_N(S)$ at $y_0 = Q(x_0;\tau)$ for $\tau = 0.1, 0.5, 0.9$; curves for $q = 7, 10, 15$ and $K = 20, 30$.]

$\hat F_{Y\mid X}(y \mid x)$, $\varepsilon \sim \mathrm{Exp}(0.8)$, $m = 32$ for $\bar\beta(\tau)$
[Figure: coverage probability against $\log_N(S)$ at $y_0 = Q(x_0;\tau)$ for $\tau = 0.1, 0.5, 0.9$; curves for $q = 7, 10, 15$ and $K = 20, 30$.]

Summary for the simulation of $\hat F_{Y\mid X}(y \mid x)$
- An increase in $m$ lowers both $S^*$ and $q^*$ (the critical value of $q$ for the oracle rule).
- Either $S > S^*$ or $q < q^*$ (recall $q = G - 4$) leads to the collapse of the oracle rule.
- Increasing $q$ and $K$ improves the coverage probability.
- For $\varepsilon \sim \mathcal{N}$, coverage is no longer symmetric in $\tau$.

Thank you for your attention

References
Chao, S., Volgushev, S., and Cheng, G. (2016). Quantile processes for semi and nonparametric regression models. arXiv preprint arXiv:1604.02130.
White, T. (2012). Hadoop: The Definitive Guide. O'Reilly Media / Yahoo Press.

Assumption (A) (linear models): the data $(X_i, Y_i)$, $i = 1,\dots,N$, form a triangular array and are row-wise i.i.d., with:
(A1) $\|Z_i\| \le \xi_m < \infty$, where $\xi_m = O(N^b)$ is allowed to increase to infinity, and $1/M \le \lambda_{\min}(E[ZZ^\top]) \le \lambda_{\max}(E[ZZ^\top]) \le M$ holds uniformly in $n$ for some fixed constant $M$.
(A2) The conditional distribution $F_{Y\mid X}(y \mid x)$ is twice differentiable w.r.t. $y$; denote the corresponding derivatives by $f_{Y\mid X}(y \mid x)$ and $f'_{Y\mid X}(y \mid x)$. Assume $\bar f := \sup_{y,x} f_{Y\mid X}(y \mid x) < \infty$ and $\bar f' := \sup_{y,x} |f'_{Y\mid X}(y \mid x)| < \infty$, uniformly in $n$.
(A3) $0 < f_{\min} \le \inf_{\tau \in \mathcal{T}} \inf_x f_{Y\mid X}(Q(x;\tau) \mid x)$, uniformly in $n$.

Covariance function of $\mathbb{G}$ (linear model):
$$E\big[\mathbb{G}(\tau)\mathbb{G}(\tau')\big] = Z(x_0)^\top J_m(\tau)^{-1} E\big[Z(X)Z(X)^\top\big] J_m(\tau')^{-1} Z(x_0)\,(\tau \wedge \tau' - \tau\tau'). \qquad (5.1)$$
Covariance function (spline/local polynomial model):
$$K_N(\tau_1, \tau_2) := \|Z(x_0)\|^{-2}\, Z(x_0)^\top J_m^{-1}(\tau_1) E[ZZ^\top] J_m^{-1}(\tau_2) Z(x_0)\,(\tau_1 \wedge \tau_2 - \tau_1\tau_2).$$

Simulation setting (slope vectors):
$\beta_3 = (0.21, 0.89, 0.38)^\top$;
$\beta_{15} = (\beta_3^\top, 0.63, 0.11, 1.01, 1.79, 1.39, 0.52, 1.62, 1.26, 0.72, 0.43, 0.41, 0.02)^\top$;
$\beta_{31} = (\beta_{15}^\top, 0.21, \beta_{15}^\top)^\top$.
$$\beta(\tau) = \big(0.21 + 0.1\,\Phi^{-1}(\tau),\ \beta_{m-1}^\top\big)^\top, \qquad (5.2)$$
where $0.1\,\Phi^{-1}(\tau)$ is the $\tau$-quantile of $N(0, 0.1^2)$, i.e. $\Phi^{-1}_{\sigma=0.1}(\tau)$.
$f_{\varepsilon,\tau} > 0$ is the height of the density of $\varepsilon_i$ evaluated at the $\tau$-quantile of $\varepsilon_i$.