SMOOTHING QUANTILE REGRESSIONS

Emmanuel Guerre, School of Economics and Finance, Queen Mary University of London, e.guerre@qmul.ac.uk
Marcelo Fernandes, School of Economics and Finance, Queen Mary University of London, m.fernandes@qmul.ac.uk
Eduardo Horta, Department of Statistics, UFRGS, eduardo.horta@ufrgs.br

ABSTRACT: The paper proposes a new smoothed version of the quantile regression estimator. We show that smoothing the objective function improves the efficiency of quantile regression estimation relative to smoothing only the check function. Its second-order mean squared error is studied and shown to be smaller than that of the Koenker and Bassett (1978) quantile regression estimator when an optimal bandwidth is used. The paper proposes a data-driven choice of the bandwidth through cross-validation. A simulation experiment reveals that the improvement can be substantial, with a mean squared error reduction as high as 4% compared to the usual quantile regression estimator of Koenker and Bassett (1978).

JEL CLASSIFICATION: C14.

KEYWORDS: Bahadur approximation, data-driven bandwidth, kernel estimation, quantile regression.

ACKNOWLEDGEMENTS:

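1. INTRODUCTION

2. THE OTHER SMOOTHED QUANTILE REGRESSION ESTIMATOR

In what follows, we restrict attention to median estimation. This is just for simplicity, in that it is straightforward to extend the discussion to any other quantile. Define the median estimator as the minimizer of

$$H_n(b) = \frac{1}{n}\sum_{i=1}^n (Y_i - b)\left[\mathcal{K}\left(\frac{Y_i - b}{h}\right) - \frac{1}{2}\right], \qquad (1)$$

where $\mathcal{K}(t) = \int_{-\infty}^t K(u)\,du$. The first and second derivatives of $H_n$ are

$$H_n^{(1)}(b) = -\frac{1}{n}\sum_{i=1}^n\left\{\frac{Y_i - b}{h}K\left(\frac{Y_i - b}{h}\right) + \mathcal{K}\left(\frac{Y_i - b}{h}\right) - \frac{1}{2}\right\},$$

$$H_n^{(2)}(b) = \frac{1}{nh^2}\sum_{i=1}^n (Y_i - b)K'\left(\frac{Y_i - b}{h}\right) + \frac{2}{nh}\sum_{i=1}^n K\left(\frac{Y_i - b}{h}\right).$$

2.1. The second-order term. Observe that $\int t^kK'(t)\,dt = -k\int t^{k-1}K(t)\,dt$ for any integer $k$ as long as both integrals exist. It then holds that

$$E\left[H_n^{(2)}(b)\right] = \frac{1}{h^2}\int (y-b)K'\left(\frac{y-b}{h}\right)f(y)\,dy + \frac{2}{h}\int K\left(\frac{y-b}{h}\right)f(y)\,dy = \int tK'(t)f(b+ht)\,dt + 2\int K(z)f(b+hz)\,dz$$
$$= -f(b) + O(h^s) + 2f(b) + O(h^s) = f(b) + O(h^s).$$

2.2. The first-order term. The variance of the first-order term satisfies

$$n\operatorname{var}H_n^{(1)}(b) = \operatorname{var}\left[\mathcal{K}\left(\frac{Y_i-b}{h}\right) + \frac{Y_i-b}{h}K\left(\frac{Y_i-b}{h}\right)\right].$$

It turns out that this variance equals

$$\operatorname{var}\mathcal{K}\left(\frac{Y_i-b}{h}\right) + \operatorname{var}\left[\frac{Y_i-b}{h}K\left(\frac{Y_i-b}{h}\right)\right] + 2\operatorname{cov}\left[\frac{Y_i-b}{h}K\left(\frac{Y_i-b}{h}\right), \mathcal{K}\left(\frac{Y_i-b}{h}\right)\right],$$

where

$$\operatorname{var}\mathcal{K}\left(\frac{Y_i-b}{h}\right) = F(b) - F(b)^2 - hf(b)\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + o(h),$$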

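whereas

$$\operatorname{var}\left[\frac{Y_i-b}{h}K\left(\frac{Y_i-b}{h}\right)\right] = hf(b)\int u^2K^2(u)\,du + o(h),$$

$$\operatorname{cov}\left[\frac{Y_i-b}{h}K\left(\frac{Y_i-b}{h}\right), \mathcal{K}\left(\frac{Y_i-b}{h}\right)\right] = h\int zK(z)\mathcal{K}(z)f(b+hz)\,dz + o(h) = \frac{hf(b)}{2}\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + o(h),$$

given that $\int zK(z)\mathcal{K}(z)\,dz = \frac{1}{2}\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz$. It then follows that

$$n\operatorname{var}H_n^{(1)}(b) = F(b) - F(b)^2 - hf(b)\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + 2\,\frac{hf(b)}{2}\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + hf(b)\int u^2K^2(u)\,du + o(h)$$
$$= F(b)[1-F(b)] + hf(b)\int u^2K^2(u)\,du + o(h),$$

so that the second-order variance term of this estimator is positive.

3. SMOOTHING QUANTILE REGRESSIONS

Consider a random sample $\{(Y_i, X_i),\ i = 1,\dots,n\}$ of $(Y, X)\in\mathbb{R}\times\mathbb{R}^d_+$ such that

$$Y_i = X_i'\beta(U_i) = X_{i1}\beta_1(U_i) + \cdots + X_{id}\beta_d(U_i),$$

where $U_i$ is a $(0,1)$-uniform random variable independent of $X_i$. Let $F(y|x) = P(Y\le y\mid X = x)$ denote the conditional cumulative distribution function of $Y$ given $X$. Assuming that the mappings $u\mapsto\beta_j(u)$, $j = 1,\dots,d$, are continuous and increasing over the unit interval, the conditional quantile function is then uniquely defined by $Q(\tau|x) = F^{-1}(\tau|x) = x'\beta(\tau)$. Koenker and Bassett (1978) estimate the quantile regression by

$$\hat\beta(\tau) = \arg\min_{b\in\mathbb{R}^d}\frac{1}{n}\sum_{i=1}^n\rho_\tau(Y_i - X_i'b), \qquad (2)$$

with $\rho_\tau(v) = v[\tau - I(v<0)]$. The standard approach in the literature is to smooth the check function $\rho_\tau$ by replacing the indicator function $I(v<0)$ with a kernel-based counterpart (see Amemiya, 1982; Horowitz, 1998). We take a different route by using a smoothed estimator of the cumulative distribution of the deviation $V_i(b) = Y_i - X_i'b$ rather than the empirical distribution in (2). We show that this entails a more efficient smoothing of the objective function than just smoothing the check function.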
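The contrast between the two smoothing routes can be seen observation by observation. The sketch below assumes a Gaussian kernel ($K=\phi$, $\mathcal{K}=\Phi$), an illustrative choice that is not imposed by the paper, and all function names are hypothetical. With this kernel, the convolution-smoothed check loss $\int\rho_\tau(v)\,h^{-1}K((v-u)/h)\,dv$ has the closed form $u[\tau-\Phi(-u/h)]+h\phi(u/h)$, whereas replacing only the indicator $I(v<0)$ by $\mathcal{K}(-v/h)$ gives $u[\tau-\Phi(-u/h)]$, which for $\tau=1/2$ is the summand of (1). The two losses differ by $h\phi(u/h)$; a trapezoid rule checks the closed form.

```python
import math


def Phi(z):
    """Standard normal CDF, playing the role of the integrated kernel."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density, playing the role of the kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def check_loss(v, tau):
    """Koenker-Bassett check function rho_tau(v) = v * (tau - I(v < 0))."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def indicator_smoothed_loss(u, tau, h):
    """Only I(v < 0) is replaced by Phi(-v/h): smoothing of the check function alone."""
    return u * (tau - Phi(-u / h))


def convolution_smoothed_loss(u, tau, h):
    """Closed form of int rho_tau(u + h z) phi(z) dz for the Gaussian kernel."""
    return u * (tau - Phi(-u / h)) + h * phi(u / h)


def convolution_smoothed_loss_numeric(u, tau, h, m=20000, width=12.0):
    """Trapezoid-rule approximation of the same integral over z in [-width, width]."""
    step = 2.0 * width / m
    total = 0.0
    for k in range(m + 1):
        z = -width + k * step
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * check_loss(u + h * z, tau) * phi(z)
    return total * step
```

Since $\rho_\tau$ is convex and the kernel has mean zero, Jensen's inequality implies that the convolution-smoothed loss always lies above $\rho_\tau$. The extra term $h\phi(u/h)$ is what makes the score of the smoothed objective behave like a smoothed distribution function rather than like a density-type quantity.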

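Let $\bar F_n(t;b) = \frac{1}{n}\sum_{i=1}^n I(V_i(b)\le t)$ and $\hat F_h(t;b) = \int_{-\infty}^t\hat f_h(v;b)\,dv$, where $\hat f_h(v;b)$ is the kernel density estimator, with bandwidth $h>0$, given by

$$\hat f_h(v;b) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{v - V_i(b)}{h}\right) = \int\frac{1}{h}K\left(\frac{v-t}{h}\right)d\bar F_n(t;b), \qquad \int K(t)\,dt = 1.$$

Integration by parts gives

$$\frac{1}{n}\sum_{i=1}^n\rho_\tau(V_i(b)) = \int\rho_\tau(v)\,d\bar F_n(v;b) = (1-\tau)\int_{-\infty}^0\bar F_n(v;b)\,dv + \tau\int_0^{\infty}\left[1-\bar F_n(v;b)\right]dv.$$

We propose smoothing the objective function by replacing $\bar F_n(t;b)$ with $\hat F_h(t;b)$, namely,

$$\hat R(b;\tau,h) = \int\rho_\tau(v)\,d\hat F_h(v;b) = (1-\tau)\int_{-\infty}^0\hat F_h(v;b)\,dv + \tau\int_0^{\infty}\left[1-\hat F_h(v;b)\right]dv.$$

The resulting smoothed quantile regression estimator then is

$$\hat\beta(\tau;h) = \arg\min_{b\in\mathbb{R}^d}\hat R(b;\tau,h). \qquad (3)$$

It is easier to appreciate the motivation for smoothing the objective function directly by looking at the first-order condition, i.e., the score function $\hat R^{(1)}(b;\tau,h) = \partial\hat R(b;\tau,h)/\partial b$. It follows from the definitions of $\hat F_h$ and $\hat f_h$ that

$$\hat R^{(1)}(b;\tau,h) = (1-\tau)\int_{-\infty}^0\frac{\partial\hat F_h(t;b)}{\partial b}\,dt - \tau\int_0^{\infty}\frac{\partial\hat F_h(t;b)}{\partial b}\,dt = \frac{1}{nh}\sum_{i=1}^n X_i\left[(1-\tau)\int_{-\infty}^0K\left(\frac{t-V_i(b)}{h}\right)dt - \tau\int_0^\infty K\left(\frac{t-V_i(b)}{h}\right)dt\right]$$
$$= \frac{1}{n}\sum_{i=1}^n X_i\left\{(1-\tau)\mathcal{K}\left(\frac{X_i'b-Y_i}{h}\right) - \tau\left[1-\mathcal{K}\left(\frac{X_i'b-Y_i}{h}\right)\right]\right\} = \frac{1}{n}\sum_{i=1}^n X_i\left[\mathcal{K}\left(\frac{X_i'b-Y_i}{h}\right) - \tau\right].$$

If $d = 1$ and $X_i = 1$, solving $\hat R^{(1)}(b;\tau,h) = 0$ reduces to

$$(1-\tau)\hat F_h(b) - \tau\left[1-\hat F_h(b)\right] = 0, \qquad (4)$$

with $\hat F_h(b) = \frac{1}{n}\sum_{i=1}^n\mathcal{K}\left(\frac{b-Y_i}{h}\right)$, or, equivalently, $\hat F_h(b) = \tau$. This means that the solution of (4) is the well-known smoothed quantile estimator of Nadaraya (1964). This also illustrates the fact that $\hat R^{(1)}(b;\tau,h)$ is analogous to a smoothed cumulative distribution function. This is in contrast with the smoothing procedure put forth by Horowitz (1998), whose first-order condition involves a quantity analogous to a kernel estimator of the probability density function. As a result, the second-order derivative $\hat R^{(2)}(b;\tau,h)$ of our smoothed objective function is similar to a kernel density estimator, whereas that of Horowitz (1998) involves terms that are similar to a kernel estimator of the derivative of a probability density function. This ensures that our estimator has better higher-order properties than that of Horowitz (1998), given that the kernel density estimator converges at a faster rate than the kernel estimator of the density derivative.

Our main assumptions are as follows. Denote the conditional probability density function of $Y$ given $X = x$ by

$$f(y|x) = \left[\sum_{j=1}^d x_j\beta_j^{(1)}\left(Q^{-1}(y|x)\right)\right]^{-1}, \qquad Q^{-1}(y|x) = F(y|x),$$

and let $f^{(j)}(y|x) = \partial^jf(y|x)/\partial y^j$ denote its $j$th derivative. Let also $\lfloor s\rfloor$ denote the lower integer part of any positive real number $s$, that is to say, the unique integer satisfying $\lfloor s\rfloor < s\le\lfloor s\rfloor+1$. Define $\mathcal{X}$ and $\mathcal{Y}\subset\mathbb{R}$ as the supports of $X$ and $Y$.

Assumption Q. The conditional probability density function $f(y|x)$ is continuous over $\mathcal{X}\times\mathcal{Y}$, with $f(y|x)>0$ for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$. The conditional quantile function $\tau\mapsto Q(\tau|x)$ is strictly increasing for all $x\in\mathcal{X}$.

Assumption D. There exist some $s, L>0$ such that $\sup_{x,y}|f^{(j)}(y|x)|\le L$, with $\lim_{y\to\pm\infty}f^{(j)}(y|x) = 0$ for $j = 0,\dots,\lfloor s\rfloor$, and $|f^{(\lfloor s\rfloor)}(y|x) - f^{(\lfloor s\rfloor)}(y'|x)|\le L|y-y'|^{s-\lfloor s\rfloor}$ for all $x\in\mathcal{X}$ and $y, y'\in\mathcal{Y}$.

Assumption X. The covariates $X = (X_1,\dots,X_d)'$ are almost surely positive, bounded and such that $\mathbb{D} = E[XX']$ has full rank.

Assumption K. The kernel $K$ is symmetric, continuous and twice differentiable with $\int|K(y)|\,dy<\infty$, $\int K(y)\,dy = 1$, $\int|K'(y)|\,dy<\infty$, $\int K^2(y)\,dy<\infty$, and $0<\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy<\infty$. For $s$ as in Assumption D, $\int|y|^{s+1}|K(y)|\,dy<\infty$ and $\int y^kK(y)\,dy = 0$ for $k = 1,\dots,\lfloor s\rfloor+1$.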

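Assumption H. The bandwidth $h$ lies in the interval $[\underline h_n,\bar h_n]$, with $1/\underline h_n = o(n/\ln n)$ and $\bar h_n = o(1)$.

Remarks. (a) Comments on Assumption D and the Laplace distribution $\frac{1}{2}\exp(-|x|)$, for which $s = 1$ even though its density has no derivative at the origin. (b) Any symmetric nonnegative kernel automatically meets the condition $\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy>0$ in Assumption K. The latter is otherwise fairly general, requiring only that the kernel function is moderately large in the tails so as to ensure that $\mathcal{K}(y)$ remains below one.

Before establishing the main results, it is necessary to introduce some further notation. Let $R(b;\tau,h) = E[\hat R(b;\tau,h)]$ and denote the candidate limit of $\hat R^{(2)}(b;\tau,h) = \partial^2\hat R(b;\tau,h)/\partial b\,\partial b'$ by

$$D(\tau) = E\left[XX'f(X'\beta(\tau)|X)\right].$$

Assumptions Q, D and X then ensure that $D(\tau)^{-1}$ exists for all $\tau$. Let $\|\cdot\|$ denote the Euclidean norm of a vector or a matrix, namely, $\|V\| = [\operatorname{tr}(V'V)]^{1/2}$.

4. MAIN RESULTS

Although this paper aims to illustrate the positive aspects of smoothing, any reader familiar with nonparametric approaches is probably already aware that the benefits expected from these techniques come with potentially important drawbacks. Indeed, the smoothed quantile regression estimator $\hat\beta(\tau;h)$ should not be viewed as an estimator of $\beta(\tau)$, an approach which would amount to ignoring the impact of smoothing. It is more honest to interpret $\hat\beta(\tau;h)$ as an estimator of

$$\beta(\tau;h) = \arg\min_{b\in\mathbb{R}^d}E\left[\hat R(b;\tau,h)\right],$$

a point of view that acknowledges that $\hat\beta(\tau;h)$ can be a biased estimator of $\beta(\tau)$. The next result studies the order of the bias term $\beta(\tau;h)-\beta(\tau)$.

THEOREM 1 (BiasQ). Given Assumptions Q, D, X and K, $\beta(\tau;h)$ is uniquely defined for all $\tau\in[\underline\tau,\bar\tau]$ and satisfies, uniformly with respect to $(\tau,h)\in[\underline\tau,\bar\tau]\times(0,\bar h]$,

$$\beta(\tau;h) = \beta(\tau) + O\left(Lh^{s+1}\right).$$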

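Additionally, if $s$ is an integer and $y\mapsto f(y|x)$ is $s$ times continuously differentiable for all $x$, the following expansion holds:

$$\beta(\tau;h) = \beta(\tau) + h^{s+1}B(\tau)\{1+o(1)\}, \qquad (5)$$

where

$$B(\tau) = -\frac{\int z^{s+1}K(z)\,dz}{(s+1)!}\,D(\tau)^{-1}E\left[Xf^{(s)}(X'\beta(\tau)|X)\right].$$

Let us now turn to the positive improvements induced by smoothing. The next theorem deals with Bahadur representations for $\hat\beta(\tau;h)-\beta(\tau;h)$, that is, the approximation of this quantity by a standardized sum.

THEOREM 2 (Baha). Given Assumptions K, Q, D, X and H, $\hat\beta(\tau;h)$ is unique for all $(\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]$ with probability tending to 1 and satisfies

$$\hat\beta(\tau;h)-\beta(\tau;h) = -\hat R^{(2)}(\beta(\tau;h);\tau,h)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h) + O_P\left(\frac{\ln^{1/2}n}{(nh)^{3/2}} + \frac{1}{n^{(s+1)/2}}\right) \qquad (6)$$

$$= -D(\tau)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h) + O_P\left(\frac{\ln^{1/2}n}{(nh)^{3/2}} + \frac{1}{n^{(s+1)/2}} + \frac{1}{n^{1/2}}\left[\left(\frac{\ln n}{nh}\right)^{1/2} + Lh^s\right]\right) \qquad (7)$$

uniformly with respect to $(\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]$, where $\sup_{\tau,h}\|\hat R^{(2)}(\beta(\tau;h);\tau,h)^{-1}\|$ and $\sup_{\tau,h}n^{1/2}\|\hat R^{(1)}(\beta(\tau;h);\tau,h)\|$ are both of order $O_P(1)$.

These linear approximations for $\hat\beta(\tau;h)-\beta(\tau;h)$ differ only because they employ distinct standardizations. In (6), the standardization is the random matrix $\hat R^{(2)}(\beta(\tau;h);\tau,h)$, whereas (7) uses instead its limiting counterpart $D(\tau)$. As a consequence, the remainder term of (7) includes an additional term $(\ln n/(nh))^{1/2} + h^s$ relative to (6), corresponding to the order of $\sup_{\tau,h}\|\hat R^{(2)}(\beta(\tau;h);\tau,h) - D(\tau)\|$. As usual, there is a $\ln n$ factor because of the uniform nature of the result. Note that both approximations involve the term $\hat R^{(1)}(\beta(\tau;h);\tau,h)$, which is centered given that

$$E\left[\hat R^{(1)}(\beta(\tau;h);\tau,h)\right] = \left.\frac{\partial}{\partial b}E\left[\hat R(b;\tau,h)\right]\right|_{b=\beta(\tau;h)} = 0.$$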
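Representation (6) says that, to first order, $\hat\beta(\tau;h)-\beta(\tau;h)$ is one Newton step built from the score $\hat R^{(1)}$ and the Hessian $\hat R^{(2)}(b;\tau,h) = (nh)^{-1}\sum_i X_iX_i'K((X_i'b-Y_i)/h)$, which has the form of a kernel density estimator. Iterating that step is one natural way to compute the estimator (3). The sketch below is not the authors' code: it assumes a Gaussian kernel, a user-supplied starting value (for instance a Koenker and Bassett or least squares fit) and a simple step-halving rule, and all names are hypothetical. With $X_i = 1$ it returns the solution of (4), i.e. Nadaraya's smoothed quantile.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def objective(b, X, Y, tau, h):
    """Smoothed objective: average Gaussian-smoothed check loss of V_i(b) = Y_i - X_i'b."""
    total = 0.0
    for x, y in zip(X, Y):
        u = y - sum(xj * bj for xj, bj in zip(x, b))
        total += u * (tau - Phi(-u / h)) + h * phi(u / h)
    return total / len(Y)


def score_and_hessian(b, X, Y, tau, h):
    """Score (1/n) sum X_i [Phi((X_i'b - Y_i)/h) - tau] and Hessian (1/(nh)) sum X_i X_i' phi(.)."""
    n, d = len(Y), len(b)
    g = [0.0] * d
    H = [[0.0] * d for _ in range(d)]
    for x, y in zip(X, Y):
        z = (sum(xj * bj for xj, bj in zip(x, b)) - y) / h
        c, w = Phi(z) - tau, phi(z) / h
        for j in range(d):
            g[j] += x[j] * c / n
            for k in range(d):
                H[j][k] += x[j] * x[k] * w / n
    return g, H


def solve(A, r):
    """Gaussian elimination with partial pivoting for a small dense system A s = r."""
    d = len(r)
    M = [list(row) + [ri] for row, ri in zip(A, r)]
    for col in range(d):
        piv = max(range(col, d), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, d):
            f = M[i][col] / M[col][col]
            for k in range(col, d + 1):
                M[i][k] -= f * M[col][k]
    s = [0.0] * d
    for i in reversed(range(d)):
        s[i] = (M[i][d] - sum(M[i][k] * s[k] for k in range(i + 1, d))) / M[i][i]
    return s


def smoothed_qr(X, Y, tau, h, b0, tol=1e-10, max_iter=100):
    """Damped Newton iterations for the minimiser of the smoothed objective, started at b0."""
    b = list(b0)
    for _ in range(max_iter):
        g, H = score_and_hessian(b, X, Y, tau, h)
        if max(abs(gj) for gj in g) < tol:
            break
        step = solve(H, g)
        f0, t = objective(b, X, Y, tau, h), 1.0
        cand = [bj - sj for bj, sj in zip(b, step)]
        # Halve the step while the objective increases (small slack absorbs rounding).
        while objective(cand, X, Y, tau, h) > f0 + 1e-14 and t > 1e-8:
            t *= 0.5
            cand = [bj - t * sj for bj, sj in zip(b, step)]
        b = cand
    return b
```

Because the Hessian is a positive kernel-weighted Gram matrix, the smoothed objective is strictly convex whenever the design has full rank, so Newton steps with step halving are a reasonable default. A very small $h$ makes the Hessian nearly singular far from the solution, which is why a sensible starting value matters.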

Koenker (2005) discusses similar approximations for the regression quantile estimator $\hat\beta(\tau)-\beta(\tau)$; see page 23 as well as references therein. It is well known that the best possible error for the corresponding remainder term is $n^{-3/4}$ up to inessential logarithmic factors. This contrasts with the remainder term of (6), which is at worst of order $n^{-3/4}$ provided that $s>1/2$ and $h$ remains larger than $(\ln n)^{1/3}/n^{1/2}$. Indeed, the remainder term of (6) achieves the order $n^{-1}$, which is typical of estimators that minimize a smooth criterion function, provided $s\ge1$ and assuming that the order of $h$ remains larger than or equal to $(\ln n/n)^{1/3}$. The study of the remainder term of (7) is more complex due to the additional term $(\ln n/(nh))^{1/2} + h^s$. Choosing a bandwidth proportional to $n^{-1/(2s+1)}$, which is optimal in view of the criterion considered in Theorem 3 below, gives that the order of $(\ln n/(nh))^{1/2} + h^s$ is $(\ln n/n)^{s/(2s+1)}$, which is always larger than $n^{-1/2}$ but is smaller than $n^{-1/4}$ provided $s>1/2$.

Theorem 2 is also a crucial step for the study of the asymptotic properties of the smoothed quantile regression estimator. The linear approximation (7) shows that the asymptotic properties of $\hat\beta(\tau;h)$ are given by the leading term $\beta(\tau;h) - D(\tau)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h)$. In what follows, we focus on the optimal choice of the bandwidth when estimating a linear combination $\lambda'\beta(\tau)$. This includes in particular the estimation of each of the coefficients $\beta_j(\tau)$ by a proper choice of the vector $\lambda$. Define the Asymptotic Mean Square Error of $\lambda'\hat\beta(\tau;h)$ as

$$\mathrm{AMSE}_\lambda(\hat\beta;\tau) = E\left[\left(\lambda'\left\{\beta(\tau;h) - D(\tau)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h) - \beta(\tau)\right\}\right)^2\right].$$

This quantity is a proxy for the mean square error $\mathrm{MSE}_\lambda(\hat\beta;\tau) = E[(\lambda'\{\hat\beta(\tau;h)-\beta(\tau)\})^2]$. Studying the AMSE instead of the MSE amounts to neglecting the remainder term of (7), which shrinks to zero. The next result describes an optimal bandwidth choice with respect to the AMSE criterion.

For the quantile estimator (4), Ralescu (1996) gives a linear approximation with a remainder of order $n^{-3/4}$ up to a logarithmic term without restriction on $s>0$, but provided that the bandwidth is larger than $(\ln\ln n/n)^{1/4}$, suggesting that the order in (6) is too pessimistic for small $s$. Indeed, our proof techniques are centered on larger values of $s$, for which a linear approximation can hold with a better order than $n^{-3/4}$.
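The rate comparisons above are exponent arithmetic. The sketch below, with hypothetical function names, ignores logarithmic factors and sets $h = n^{-a}$. It assumes that the remainder of (6) is of order $(nh)^{-3/2} + n^{-(s+1)/2}$, as in the displays of Theorem 2, and that the extra term in (7) is $n^{-1/2}[(nh)^{-1/2} + h^s]$. Exact fractions avoid rounding at the boundary cases $s = 1/2$ and $a = 1/3$.

```python
from fractions import Fraction


def remainder_exponent_6(s, a):
    """Exponent e such that (nh)^{-3/2} + n^{-(s+1)/2} is of order n^{-e} when h = n^{-a}."""
    s, a = Fraction(s), Fraction(a)
    return min(Fraction(3, 2) * (1 - a), (s + 1) / 2)


def extra_exponent_7(s, a):
    """Exponent of the extra term n^{-1/2} [(nh)^{-1/2} + h^s] of (7) when h = n^{-a}."""
    s, a = Fraction(s), Fraction(a)
    return Fraction(1, 2) + min((1 - a) / 2, a * s)
```

For example, `remainder_exponent_6(1, Fraction(1, 3))` equals 1, the order $n^{-1}$, while `remainder_exponent_6(Fraction(1, 2), Fraction(1, 2))` equals 3/4. With $a = 1/(2s+1)$, `extra_exponent_7` exceeds 3/4 exactly when $s>1/2$.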

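THEOREM 3 (VarQ). Given Assumptions Q, D, X, K and H, we have, uniformly with respect to $(\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]$,

$$n\operatorname{var}\left[D(\tau)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h)\right] = \tau(1-\tau)D(\tau)^{-1}\mathbb{D}D(\tau)^{-1} - hD(\tau)^{-1}\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy + O\left(h^{1+s}\right). \qquad (8)$$

If the expansion in (5) holds, $\mathrm{AMSE}_\lambda(\hat\beta;\tau)$ is asymptotically minimal if $h = h_{opt}(1+o(1))$, where

$$h_{opt} = \left[\frac{\lambda'D(\tau)^{-1}\lambda\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy}{2(s+1)\,n\,(\lambda'B(\tau))^2}\right]^{1/(2s+1)},$$

as long as $\lambda'B(\tau)\neq0$. In this case,

$$\mathrm{AMSE}_\lambda(\hat\beta;\tau) = \frac{\lambda'D(\tau)^{-1}\mathbb{D}D(\tau)^{-1}\lambda\,\tau(1-\tau)}{n} - \frac{2s+1}{2s+2}\,\frac{h_{opt}}{n}\,\lambda'D(\tau)^{-1}\lambda\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy + o\left(n^{-\frac{2s+2}{2s+1}}\right).$$

A key point in Theorem 3 is the fact that the variance expansion in (8) includes a negative term, namely, $-hD(\tau)^{-1}\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy$. This implies that the asymptotic variance of the smoothed $\hat\beta(\tau;h)$ is smaller than that of the quantile regression estimator $\hat\beta(\tau)$. As already noticed by Azzalini (1981) for standard quantile estimation, it then follows that the AMSE of $\lambda'\hat\beta(\tau;h)$ can be made smaller than that of $\lambda'\hat\beta(\tau)$. In other words, smoothing the loss function $\rho_\tau$ as in (3) improves the standard regression quantile estimator. In the case of quantile estimation, a fundamental reason for such an improvement is that the smooth estimator of the cumulative distribution function dominates the sample cumulative distribution function, as shown in Reiss (1981). Under Assumptions Q and D, the optimal bandwidth is of order $n^{-1/(2s+1)}$, as is the one which would be used to estimate the unconditional probability density function of $Y$. The order of the second-order variance term with $h\propto n^{-1/(2s+1)}$ is $n^{-1/(2s+1)}/n = n^{-(2s+2)/(2s+1)}$, which is close to the order of the first-order variance term $\lambda'D(\tau)^{-1}\mathbb{D}D(\tau)^{-1}\lambda\,\tau(1-\tau)/n$ when $s$ is reasonably large, being for instance $n^{-6/5}$ if $s = 2$. Theorem 3 gives a more detailed expansion in the case where $y\mapsto f(y|x)$ is $s$ times continuously differentiable. The expression of the optimal bandwidth $h_{opt}$ resembles the one given in Lio and Padgett (1991) for estimation of the cumulative distribution function; see also Reiss (1981).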
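The optimal bandwidth comes from minimizing a one-dimensional function of $h$. In the sketch below, with hypothetical names, `v0` stands for $\tau(1-\tau)\lambda'D(\tau)^{-1}\mathbb{D}D(\tau)^{-1}\lambda$, `c` for $\lambda'D(\tau)^{-1}\lambda\int\mathcal{K}(1-\mathcal{K})$ and `bias` for $\lambda'B(\tau)$. These quantities are unknown in practice and must be supplied. Setting the derivative of $-hc/n + h^{2s+2}\,\mathrm{bias}^2$ to zero gives $h_{opt}$. Substituting back yields the $-(2s+1)/(2s+2)$ factor in the AMSE expansion.

```python
def amse(h, n, v0, c, bias, s):
    """Second-order AMSE: v0/n - (h/n) c + h^(2s+2) bias^2."""
    return v0 / n - h * c / n + h ** (2 * s + 2) * bias ** 2


def h_opt(n, c, bias, s):
    """Minimiser of amse in h: [c / ((2s+2) n bias^2)]^(1/(2s+1)), assuming bias != 0."""
    return (c / ((2 * s + 2) * n * bias ** 2)) ** (1.0 / (2 * s + 1))
```

At $h_{opt}$ the AMSE falls below the first-order variance $v_0/n$ of the Koenker and Bassett estimator by $\frac{2s+1}{2s+2}h_{opt}c/n$. This is the quantitative form of the efficiency gain discussed above.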

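5. A DATA-DRIVEN BANDWIDTH CHOICE

Define $\hat\beta_\lambda(\tau;h) = \lambda'\hat\beta(\tau;h)$ and $\hat B_\lambda(h) = \hat\beta_\lambda(\tau;h) - \hat\beta_\lambda(\tau;h_0)$ for $\lambda\in\mathbb{R}^d$. For a bandwidth $b_n$, let $\hat{\mathbb{D}} = n^{-1}\sum_{i=1}^nX_iX_i'$, $\hat D = \hat R^{(2)}(\hat\beta(\tau;b_n);\tau,b_n)$, $\hat D(\tau) = \tau(1-\tau)\hat D^{-1}\hat{\mathbb{D}}\hat D^{-1}$ and $\hat D_\lambda = \lambda'\hat D^{-1}\lambda$. Let also $\mathcal{H} = \{h_0,h_1,\dots,h_K\}$, with $h_{k-1}<h_k$. Consider the data-driven bandwidth $\hat h = \arg\min_{h\in\mathcal{H}}\hat Q(h)$, with

$$\hat Q(h) = \hat B_\lambda(h)^2 - 4\hat D_\lambda\frac{h}{n}\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy, \qquad h\in\mathcal{H}.$$

We propose to compute the bandwidth by $\tilde h = \hat h^{+}$, leading to the fully data-driven estimator $\hat\beta_\lambda(\tau) = \hat\beta_\lambda(\tau;\tilde h)$. Let $h^* = \arg\min_{h\in\mathcal{H}}Q(h)$, where

$$Q(h) = B_\lambda(h)^2 - 2D_\lambda(\tau)\frac{h}{n}\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy,$$

with $D_\lambda(\tau) = \lambda'D(\tau)^{-1}\lambda$, $\beta_\lambda(\tau;h) = \lambda'\beta(\tau;h)$ and $B_\lambda(h) = \beta_\lambda(\tau;h) - \beta_\lambda(\tau;h_0)$.

Assumption B. The bandwidth $b_n$ is such that $b_n = o(1)$ and $1/b_n = o(n/\ln n)$, whereas the bandwidths $h_0$ and $h_K$ respectively satisfy $h_0\ge C\ln n/n^{1/2}$ and $h_K = o(1)$, with $K = O(\ln n)$. In addition, a further rate condition involving $\ln n$ and $\ln\ln n$ links these quantities. There also exist unknown $s>1/2$ and $L>\underline L>0$ such that the following conditions hold.

B1. $\underline Lh^{s+1}\le|B_\lambda(h)|\le Lh^{s+1}$, and $h^*$ is uniquely defined in the limit with $h^*\asymp n^{-1/(2s+1)}$.

B2. For all $\epsilon>0$, $\liminf_{n\to\infty}n^{(2s+2)/(2s+1)}\min_{h\in\mathcal{H};\,|h/h^*-1|\ge\epsilon}\{Q(h)-Q(h^*)\}>0$.

THEOREM 4 (DD). Under Assumptions B and K, the data-driven smoothed quantile regression estimator is such that, for $\tilde h = \hat h^{+}$,

$$\hat\beta(\tau)-\beta(\tau) = Z_n + o_P\left(\left(\frac{h^{*+}}{n}\right)^{1/2}\right),$$

where $E Z_n = 0$ and $n\operatorname{var}Z_n = \tau(1-\tau)D(\tau)^{-1}\mathbb{D}D(\tau)^{-1} - h^{*+}D(\tau)^{-1}\int\mathcal{K}(y)[1-\mathcal{K}(y)]\,dy$. Note that there is no bias term above because $\beta(\tau;\tilde h)-\beta(\tau) = o_P((h^{*+}/n)^{1/2})$.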
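Several kernel constants recur in the paper. The variance-reduction constant $\int\mathcal{K}(1-\mathcal{K})$ appears in Assumption K, Section 2.2, Theorem 3 and the bandwidth criteria. The identity $\int 2zK(z)\mathcal{K}(z)\,dz = \int\mathcal{K}(1-\mathcal{K})$ is used in Section 2.2 and in the proof of Theorem 3. The half-line integrals $\int_{-\infty}^0\mathcal{K}^2$ and $\int_0^\infty(1-\mathcal{K})^2$ appear in the variance of the bias-estimate noise in Section 7. The sketch below assumes a Gaussian kernel (an illustrative choice). For that kernel $\int\Phi(1-\Phi) = 1/\sqrt{\pi}$, and by symmetry the three integrals add up to $2\int_0^\infty(1-\Phi) = \sqrt{2/\pi}$. Composite Simpson quadrature checks these values.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simpson(f, a, b, m=4000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    step = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def kernel_constants(width=12.0):
    """Gaussian-kernel constants: int K(1-K), int 2 z K(z) K(z) dz, and the two half-line integrals.

    Here K(.) inside the integrals means the integrated kernel Phi; K(z) in the second
    integral is the density phi. The range [-width, width] truncates negligible tails.
    """
    c_var = simpson(lambda z: Phi(z) * (1.0 - Phi(z)), -width, width)
    c_cov = simpson(lambda z: 2.0 * z * phi(z) * Phi(z), -width, width)
    c_neg = simpson(lambda z: Phi(z) ** 2, -width, 0.0)
    c_pos = simpson(lambda z: (1.0 - Phi(z)) ** 2, 0.0, width)
    return c_var, c_cov, c_neg, c_pos
```

For other kernels the same quadrature applies once `Phi` and `phi` are replaced by the integrated kernel and the kernel. Symmetry of $K$ is what makes the two half-line integrals coincide.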

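6. PROOFS

Letting $R(b;\tau,h) = E[\hat R(b;\tau,h)]$ yields

$$R(b;\tau) = \lim_{h\to0}R(b;\tau,h) = E\left[\rho_\tau(Y-X'b)\right] = \int\left\{(1-\tau)\int_{-\infty}^{x'b}F(v|x)\,dv + \tau\int_{x'b}^{\infty}[1-F(v|x)]\,dv\right\}f(x)\,dx,$$

whose first- and second-order derivatives are respectively

$$R^{(1)}(b;\tau) = \int\left[F(x'b|x)-\tau\right]x\,f(x)\,dx, \qquad R^{(2)}(b;\tau) = \int f(x'b|x)\,xx'f(x)\,dx.$$

Let also $\mathcal{S}$ denote the set $\mathbb{R}^d\times(0,1)\times(0,\bar h]$ to which $(b,\tau,h)$ belongs.

6.1. Preliminary results.

LEMMA 1 (BiasRF). Assumptions K, D and X ensure that

$$\sup_{(b,\tau,h)\in\mathcal{S}}\frac{\|R^{(2)}(b;\tau,h)-R^{(2)}(b;\tau)\|}{Lh^s} = O(1), \qquad \sup_{(b,\tau,h)\in\mathcal{S}}\frac{\|R^{(1)}(b;\tau,h)-R^{(1)}(b;\tau)\|}{Lh^{s+1}} = O(1),$$

$$\sup_{(b,\tau,h)\in\mathcal{S}}\frac{|R(b;\tau,h)-R(b;\tau)|}{\max(L,1)h^{s+1}} = O(1), \qquad \max_{1\le j_1,j_2\le d}\ \sup_{(b,\tau,h)\in\mathcal{S},\,\delta\ne0}\frac{|R^{(2)}_{j_1j_2}(b+\delta;\tau,h)-R^{(2)}_{j_1j_2}(b;\tau,h)|}{L\|\delta\|^s} = O(1).$$

Proof of Lemma 1. We have

$$R(b;\tau,h) = (1-\tau)\int_{-\infty}^0E\left[\int_{-\infty}^v\frac{1}{h}K\left(\frac{t-(Y-X'b)}{h}\right)dt\right]dv + \tau\int_0^\infty\left\{1-E\left[\int_{-\infty}^v\frac{1}{h}K\left(\frac{t-(Y-X'b)}{h}\right)dt\right]\right\}dv$$
$$= (1-\tau)\int_{-\infty}^0E\left\{E\left[\int_{-\infty}^{v+X'b}\frac{1}{h}K\left(\frac{t-Y}{h}\right)dt\,\Big|\,X\right]\right\}dv + \tau\int_0^\infty\left\{1-E\left[\int_{-\infty}^{v+X'b}\frac{1}{h}K\left(\frac{t-Y}{h}\right)dt\right]\right\}dv.$$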

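It then follows from the Lebesgue Dominated Convergence theorem that

$$R^{(1)}(b;\tau,h) = E\left[(1-\tau)X\int_{-\infty}^0\frac{1}{h}K\left(\frac{v+X'b-Y}{h}\right)dv - \tau X\int_0^\infty\frac{1}{h}K\left(\frac{v+X'b-Y}{h}\right)dv\right] = E\left[X\left\{\mathcal{K}\left(\frac{X'b-Y}{h}\right)-\tau\right\}\right],$$

$$R^{(2)}(b;\tau,h) = (1-\tau)E\left[XX'\frac{1}{h}K\left(\frac{X'b-Y}{h}\right)\right] + \tau E\left[XX'\frac{1}{h}K\left(\frac{X'b-Y}{h}\right)\right] = E\left[XX'\frac{1}{h}K\left(\frac{X'b-Y}{h}\right)\right].$$

Assume from now on that $\lfloor s\rfloor\ge1$, given that the case $\lfloor s\rfloor = 0$ is trivial. Under Assumption D, a Taylor expansion with integral remainder gives

$$f(v+z|x) = \sum_{l=0}^{\lfloor s\rfloor}f^{(l)}(v|x)\frac{z^l}{l!} + \frac{z^{\lfloor s\rfloor}}{(\lfloor s\rfloor-1)!}\int_0^1\left[f^{(\lfloor s\rfloor)}(v+wz|x)-f^{(\lfloor s\rfloor)}(v|x)\right](1-w)^{\lfloor s\rfloor-1}dw. \qquad (9)$$

Using the change of variables $y = t+x'b+hz$ yields, under Assumption K,

$$E\left[\frac{1}{h}K\left(\frac{t-(Y-X'b)}{h}\right)\Big|X=x\right] - f(t+x'b|x) = \int K(z)\left[f(t+x'b+hz|x)-f(t+x'b|x)\right]dz$$
$$= \frac{h^{\lfloor s\rfloor}}{(\lfloor s\rfloor-1)!}\int z^{\lfloor s\rfloor}K(z)\int_0^1\left[f^{(\lfloor s\rfloor)}(t+x'b+whz|x)-f^{(\lfloor s\rfloor)}(t+x'b|x)\right](1-w)^{\lfloor s\rfloor-1}dw\,dz.$$

It then follows from Assumption D that

$$\left\|E\left[XX'\frac{1}{h}K\left(\frac{X'b-Y}{h}\right)\right] - \int f(x'b|x)xx'f(x)\,dx\right\|\le CLh^s\int|z|^s|K(z)|\,dz.$$

This yields the first stated result. We now turn our attention to the stated results for $R^{(1)}$ and $R$. Integrating (9) with respect to $t$ up to $x'b$ yields

$$\left|\int_{-\infty}^{x'b}\left\{E\left[\frac{1}{h}K\left(\frac{t-Y}{h}\right)\Big|X=x\right]-f(t|x)\right\}dt\right|$$
$$= \left|\frac{h^{\lfloor s\rfloor+1}}{(\lfloor s\rfloor-1)!}\int z^{\lfloor s\rfloor+1}K(z)\int_0^1\int_0^1\left[f^{(\lfloor s\rfloor)}(x'b+vwhz|x)-f^{(\lfloor s\rfloor)}(x'b|x)\right]dv\,w(1-w)^{\lfloor s\rfloor-1}dw\,dz\right|\le CLh^{s+1},$$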

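whereas, integrating once more with respect to $v$,

$$\left|\int_0^\infty\left\{\int_{-\infty}^{v+x'b}E\left[\frac{1}{h}K\left(\frac{t-Y}{h}\right)\Big|X=x\right]dt - F(v+x'b|x)\right\}dv\right|\le C\max(L,1)h^{s+1}.$$

This implies that

$$\left\|R^{(1)}(b;\tau,h) - \int\left\{(1-\tau)F(x'b|x) - \tau[1-F(x'b|x)]\right\}x\,f(x)\,dx\right\|\le CLh^{s+1},$$

$$\left|R(b;\tau,h) - \int\left\{(1-\tau)\int_{-\infty}^0F(v+x'b|x)\,dv + \tau\int_0^\infty[1-F(v+x'b|x)]\,dv\right\}f(x)\,dx\right|\le C\max(L,1)h^{s+1}$$

for all $\tau\in(0,1)$. Observe now that

$$R^{(2)}_{j_1j_2}(b;\tau,h) = E\left[X_{j_1}X_{j_2}\frac{1}{h}K\left(\frac{X'b-Y}{h}\right)\right] = \int\int x_{j_1}x_{j_2}K(z)f(x'b+hz|x)f(x)\,dx\,dz.$$

It then follows that

$$\left|R^{(2)}_{j_1j_2}(b+\delta;\tau,h)-R^{(2)}_{j_1j_2}(b;\tau,h)\right|\le\int\int|x_{j_1}x_{j_2}||K(z)|\left|f(x'b+x'\delta+hz|x)-f(x'b+hz|x)\right|f(x)\,dx\,dz$$
$$\le L\int\int|x_{j_1}x_{j_2}||K(z)||x'\delta|^sf(x)\,dx\,dz\le CL\|\delta\|^s,$$

uniformly in $b$, $h$, $\delta$, $\tau$, $j_1$ and $j_2$.

LEMMA 2 (StocR). Given Assumptions K and X,

$$\sup_{(b,\tau,h)\in\mathbb{R}^d\times(0,1)\times(0,\bar h]}n^{1/2}\left\|\hat R^{(1)}(b;\tau,h)-R^{(1)}(b;\tau,h)\right\| = O_P(1),$$

$$\sup_{(b,\tau,h)\in\mathbb{R}^d\times(0,1)\times[\underline h,\bar h]}\left(\frac{nh}{\ln n}\right)^{1/2}\left\|\hat R^{(2)}(b;\tau,h)-R^{(2)}(b;\tau,h)\right\| = O_P(1),$$

$$\max_{1\le j_1,j_2\le d}\ \sup_{(b,\tau,h)\in\mathbb{R}^d\times(0,1)\times[\underline h,\bar h],\,\delta\ne0}\left(\frac{nh^3}{\|\delta\|^2\ln n}\right)^{1/2}\left|\Delta_{j_1j_2}(b,\delta;\tau,h)\right| = O_P(1),$$

where, for the latter,

$$\Delta_{j_1j_2}(b,\delta;\tau,h) = \left[\hat R^{(2)}_{j_1j_2}(b+\delta;\tau,h)-\hat R^{(2)}_{j_1j_2}(b;\tau,h)\right] - \left[R^{(2)}_{j_1j_2}(b+\delta;\tau,h)-R^{(2)}_{j_1j_2}(b;\tau,h)\right].$$

Proof of Lemma 2. Recall that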

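$$\hat R^{(1)}(b;\tau,h) = \frac{1}{n}\sum_{i=1}^nX_i\left\{(1-\tau)\int_{-\infty}^0\frac{1}{h}K\left(\frac{t-Y_i+X_i'b}{h}\right)dt - \tau\int_0^\infty\frac{1}{h}K\left(\frac{t-Y_i+X_i'b}{h}\right)dt\right\} = \frac{1}{n}\sum_{i=1}^nX_i\left[\mathcal{K}\left(\frac{X_i'b-Y_i}{h}\right)-\tau\right].$$

Consider the classes of functions

$$\mathcal{F}_j = \left\{(x,y)\mapsto x_j\mathcal{K}\left(\frac{x'b-y}{h}\right);\ b\in\mathbb{R}^d,\ h>0\right\}, \qquad j = 1,\dots,d.$$

Since $\|\mathcal{K}\|_\infty<\infty$ and $X_j$ is bounded, Lemma 22(ii) in Nolan and Pollard (1987) ensures that $\mathcal{F}_j$ is Euclidean with a square integrable envelope $C|X_j|$. Then the remark below Definition 8 in Nolan and Pollard (1987), and Proposition 1 and the inequality on p. 39 in Einmahl and Mason (2005) imply that

$$\sup_{b\in\mathbb{R}^d,\,h>0}\left|\frac{1}{n}\sum_{i=1}^nX_{ji}\mathcal{K}\left(\frac{X_i'b-Y_i}{h}\right) - E\left[X_j\mathcal{K}\left(\frac{X'b-Y}{h}\right)\right]\right| = O_P\left(n^{-1/2}\right).$$

Then the expression of $\hat R^{(1)}(b;\tau,h)$ given above shows that

$$\sup_{(b,\tau,h)\in\mathbb{R}^d\times(0,1)\times(0,\bar h]}\left\|\hat R^{(1)}(b;\tau,h)-R^{(1)}(b;\tau,h)\right\| = O_P\left(n^{-1/2}\right).$$

The proof of the result for $\hat R^{(2)}$ follows the same steps as the proof of Theorem 1 in Einmahl and Mason (2005), using $\|K\|_\infty<\infty$ and Assumption K. The proof of the result for the increments $\Delta_{j_1j_2}$ follows the same steps, using $\|K'\|_\infty<\infty$ and

$$X_{j_1}X_{j_2}\left[K\left(\frac{X'(b+\delta)-Y}{h}\right)-K\left(\frac{X'b-Y}{h}\right)\right] = X_{j_1}X_{j_2}\frac{X'\delta}{h}\int_0^1K'\left(\frac{X'(b+w\delta)-Y}{h}\right)dw,$$

whose absolute value is bounded by $C\frac{\|\delta\|}{h}\int_0^1\left|K'\left(\frac{X'(b+w\delta)-Y}{h}\right)\right|dw$, to bound the summands in $\Delta_{j_1j_2}$ and to compute their variance.

6.2. Proof of Theorem 1. Let $\beta(\tau;h)$ be any minimizer of $R(b;\tau,h)$. Since $\beta(\tau) = \arg\min_bR(b;\tau)$, Lemma 1 shows that

$$0\le R(\beta(\tau;h);\tau)-R(\beta(\tau);\tau) = \inf_bR(b;\tau,h) - \min_bR(b;\tau) + R(\beta(\tau;h);\tau)-R(\beta(\tau;h);\tau,h) = O\left(\max(L,1)h^{s+1}\right), \qquad (10)$$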

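uniformly in $h$ and $\tau$. Since $R^{(1)}(\beta(\tau);\tau) = 0$, the Taylor formula with integral remainder gives

$$R(\beta(\tau;h);\tau)-R(\beta(\tau);\tau) = (\beta(\tau;h)-\beta(\tau))'\int_0^1R^{(2)}\left(\beta(\tau)+w(\beta(\tau;h)-\beta(\tau));\tau\right)(1-w)\,dw\,(\beta(\tau;h)-\beta(\tau)).$$

Since $R^{(2)}(b;\tau) = \int f(x'b|x)xx'f(x)\,dx$, Assumptions Q and X ensure that the minimal eigenvalue of $\int_0^1R^{(2)}(\beta(\tau)+w(\beta(\tau;h)-\beta(\tau));\tau)(1-w)\,dw$ is larger than $C>0$ for all $\tau\in[\underline\tau,\bar\tau]$ and $h\in(0,\bar h]$. Hence substituting in the inequalities above gives

$$\sup_{(\tau,h)\in[\underline\tau,\bar\tau]\times(0,\bar h]}\frac{\|\beta(\tau;h)-\beta(\tau)\|}{\left(\max(L,1)h^{s+1}\right)^{1/2}} = O(1). \qquad (11)$$

We now show that $\beta(\tau;h)$ is unique, that is, $R(b;\tau,h)$ has a unique minimizer, provided that $\bar h$ is small enough. Consider the compact set

$$\mathcal{C} = \mathcal{C}_\epsilon = \left\{b\in\mathbb{R}^d;\ b_j\in[\beta_j(\underline\tau)-\epsilon,\beta_j(\bar\tau)+\epsilon],\ j = 1,\dots,d\right\}, \qquad \epsilon>0,$$

recalling that $\beta_j$ increases with $\tau$ under Assumption Q. Taking $\epsilon$ small enough ensures via Assumptions Q and X that the minimal eigenvalue of the positive definite matrix $R^{(2)}(b;\tau)$ is larger than some $C>0$ for all $b\in\mathcal{C}$ and $\tau\in[\underline\tau,\bar\tau]$. Hence Lemma 1 ensures that the maps $b\in\mathcal{C}\mapsto R(b;\tau,h)$, $\tau\in[\underline\tau,\bar\tau]$, $h\in(0,\bar h]$, are all strictly convex provided $\bar h$ is small enough. Since (11) shows that all the candidate minimizers $\beta(\tau;h)$, $\tau\in[\underline\tau,\bar\tau]$, $h\in(0,\bar h]$, are in $\mathcal{C}$ for $\bar h$ small enough, the strict convexity of the corresponding $R(b;\tau,h)$ ensures that all these $\beta(\tau;h)$ are uniquely defined.

We now improve (11). Since $R^{(1)}(\beta(\tau;h);\tau,h) = 0$ and $R^{(1)}(\beta(\tau);\tau) = 0$ for all $\tau$ and $h$, Lemma 1 gives

$$\sup_{\tau,h}\frac{\|R^{(1)}(\beta(\tau;h);\tau)-R^{(1)}(\beta(\tau);\tau)\|}{Lh^{s+1}} = \sup_{\tau,h}\frac{\|R^{(1)}(\beta(\tau;h);\tau)-R^{(1)}(\beta(\tau;h);\tau,h)\|}{Lh^{s+1}} = O(1).$$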

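Now the Taylor formula, the expression of $R^{(2)}$, Assumption Q and (11) give

$$R^{(1)}(\beta(\tau;h);\tau)-R^{(1)}(\beta(\tau);\tau) = \int_0^1R^{(2)}\left(\beta(\tau)+w(\beta(\tau;h)-\beta(\tau));\tau\right)dw\,(\beta(\tau;h)-\beta(\tau))$$
$$= \left\{R^{(2)}(\beta(\tau);\tau)+O\left(L\|\beta(\tau;h)-\beta(\tau)\|^s\right)\right\}(\beta(\tau;h)-\beta(\tau)) = \left\{R^{(2)}(\beta(\tau);\tau)+O\left(L\left(\max(L,1)h^{s+1}\right)^{s/2}\right)\right\}(\beta(\tau;h)-\beta(\tau))$$

uniformly in $(\tau,h)$. Since the left-hand side is $O(Lh^{s+1})$, this gives, uniformly in $(\tau,h)$ and provided that $\bar h$ is small enough, $\beta(\tau;h) = \beta(\tau)+O(Lh^{s+1})$, as claimed. The proof of (5) similarly uses the expansions

$$0 = R^{(1)}(\beta(\tau;h);\tau,h) = R^{(1)}(\beta(\tau);\tau,h) + \left\{R^{(2)}(\beta(\tau);\tau)+o(1)\right\}(\beta(\tau;h)-\beta(\tau)),$$

and, since $s$ is an integer and $y\mapsto f(y|x)$ is $s$ times continuously differentiable,

$$R^{(1)}(\beta(\tau);\tau,h)-R^{(1)}(\beta(\tau);\tau) = h^{s+1}\left\{\frac{\int z^{s+1}K(z)\,dz}{(s+1)!}\int f^{(s)}(x'\beta(\tau)|x)\,x\,f(x)\,dx+o(1)\right\},$$

so that (5) holds.

6.3. Proof of Theorem 2. Lemmas 1 and 2 give, since $1/\underline h = o(n/\ln n)$,

$$\sup_{(b,\tau,h)}\left|\hat R(b;\tau,h)-R(b;\tau)\right| = o_P(1).$$

Since $\hat R(\hat\beta(\tau;h);\tau,h)\le\hat R(\beta(\tau);\tau,h)$,

$$\sup_{\tau,h}\left\{R(\hat\beta(\tau;h);\tau)-R(\beta(\tau);\tau)\right\}\le\sup_{\tau,h}\left\{R(\hat\beta(\tau;h);\tau)-\hat R(\hat\beta(\tau;h);\tau,h)+\hat R(\beta(\tau);\tau,h)-R(\beta(\tau);\tau)\right\}\le2\sup_{(b,\tau,h)}\left|\hat R(b;\tau,h)-R(b;\tau)\right| = o_P(1),$$

while

$$R(\hat\beta(\tau;h);\tau)-R(\beta(\tau);\tau) = (\hat\beta(\tau;h)-\beta(\tau))'\int_0^1R^{(2)}\left(\beta(\tau)+w(\hat\beta(\tau;h)-\beta(\tau));\tau\right)(1-w)\,dw\,(\hat\beta(\tau;h)-\beta(\tau)),$$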

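and because the minimal eigenvalue of $\int_0^1R^{(2)}(\beta(\tau)+w(b-\beta(\tau));\tau)(1-w)\,dw$ is larger than $C>0$ for all $b$ and $\tau\in[\underline\tau,\bar\tau]$ under Assumptions Q and X, this gives that $\sup_{\tau,h}\|\hat\beta(\tau;h)-\beta(\tau)\| = o_P(1)$ and, by Theorem 1,

$$\sup_{\tau,h}\left\|\hat\beta(\tau;h)-\beta(\tau;h)\right\| = o_P(1). \qquad (12)$$

Note that the former shows that it is possible to find a compact set $\mathcal{B}$ such that $\{\hat\beta(\tau;h),\ (\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]\}\subset\mathcal{B}$ with probability tending to 1. Because Lemmas 1 and 2 imply that all the functions $b\in\mathcal{B}\mapsto\hat R(b;\tau,h)$ are strictly convex with probability tending to 1, it follows that all the $\hat\beta(\tau;h)$, $(\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]$, are uniquely defined with probability going to 1. Since $\hat R^{(1)}(\hat\beta(\tau;h);\tau,h) = 0$, Lemmas 1 and 2 and Theorem 1 give, uniformly in $(\tau,h)$,

$$\hat R^{(1)}(\beta(\tau;h);\tau,h) = \hat R^{(1)}(\beta(\tau;h);\tau,h)-\hat R^{(1)}(\hat\beta(\tau;h);\tau,h) = -\int_0^1\hat R^{(2)}\left(\beta(\tau;h)+w(\hat\beta(\tau;h)-\beta(\tau;h));\tau,h\right)dw\,(\hat\beta(\tau;h)-\beta(\tau;h))$$
$$= -\left\{\hat R^{(2)}(\beta(\tau;h);\tau,h)+O_P\left(\left(\frac{\|\hat\beta(\tau;h)-\beta(\tau;h)\|^2\ln n}{nh^3}\right)^{1/2}\right)+O\left(L\|\hat\beta(\tau;h)-\beta(\tau;h)\|^s\right)\right\}(\hat\beta(\tau;h)-\beta(\tau;h)), \qquad (13)$$

where the eigenvalues of $\hat R^{(2)}(\beta(\tau;h);\tau,h)$ are bounded away from 0 and infinity with probability tending to 1. Since $R^{(1)}(\beta(\tau;h);\tau,h) = 0$, Lemma 2 shows that $\hat R^{(1)}(\beta(\tau;h);\tau,h) = O_P(n^{-1/2})$ uniformly. Hence applying (13) and (12) gives, since $1/\underline h = o(n/\ln n)$,

$$\sup_{\tau,h}\left\|\hat\beta(\tau;h)-\beta(\tau;h)\right\| = O_P\left(n^{-1/2}\right),$$

and, reapplying (13),

$$\hat\beta(\tau;h)-\beta(\tau;h) = -\hat R^{(2)}(\beta(\tau;h);\tau,h)^{-1}\hat R^{(1)}(\beta(\tau;h);\tau,h)+O_P\left(\frac{\ln^{1/2}n}{(nh)^{3/2}}+\frac{1}{n^{(s+1)/2}}\right)$$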

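uniformly with respect to $(\tau,h)\in[\underline\tau,\bar\tau]\times[\underline h,\bar h]$. This gives the first Bahadur representation (6) in Theorem 2. The Bahadur representation (7) then follows from Lemmas 1 and 2.

6.4. Proof of Theorem 3. We have, since $E[\hat R^{(1)}(\beta(\tau;h);\tau,h)] = 0$,

$$n\operatorname{var}\hat R^{(1)}(\beta(\tau;h);\tau,h) = \operatorname{var}\left\{X\left[\mathcal{K}\left(\frac{X'\beta(\tau;h)-Y}{h}\right)-\tau\right]\right\}$$
$$= E\left[XX'\mathcal{K}^2\left(\frac{X'\beta(\tau;h)-Y}{h}\right)\right] - 2\tau E\left[XX'\mathcal{K}\left(\frac{X'\beta(\tau;h)-Y}{h}\right)\right] + \tau^2E[XX'].$$

Assumptions D and K give, integrating by parts and using Theorem 1,

$$E\left[\mathcal{K}\left(\frac{X'\beta(\tau;h)-Y}{h}\right)\Big|X=x\right] = \int\mathcal{K}\left(\frac{x'\beta(\tau;h)-y}{h}\right)f(y|x)\,dy = \int\frac{1}{h}K\left(\frac{x'\beta(\tau;h)-y}{h}\right)F(y|x)\,dy$$
$$= F(x'\beta(\tau;h)|x) + \int\left[F(x'\beta(\tau;h)-hz|x)-F(x'\beta(\tau;h)|x)\right]K(z)\,dz = F(x'\beta(\tau)|x)+O(h^{s+1})+O(h^{s+1}) = \tau+O(h^{s+1}),$$

since $x'\beta(\tau) = F^{-1}(\tau|x)$, and arguing as in Lemma 1. For the term involving $\mathcal{K}^2$, define $\bar{\mathcal{K}}(z) = 2K(z)\mathcal{K}(z) = \frac{d}{dz}\mathcal{K}(z)^2$, which is such that $\int\bar{\mathcal{K}}(z)\,dz = \lim_{z\to+\infty}\mathcal{K}(z)^2 = 1$.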

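Arguing as above now gives

$$E\left[\mathcal{K}^2\left(\frac{X'\beta(\tau;h)-Y}{h}\right)\Big|X=x\right] = \int\frac{1}{h}\bar{\mathcal{K}}\left(\frac{x'\beta(\tau;h)-y}{h}\right)F(y|x)\,dy = \tau+O(h^{s+1})+\int\left[F(x'\beta(\tau;h)-hz|x)-F(x'\beta(\tau;h)|x)\right]\bar{\mathcal{K}}(z)\,dz$$
$$= \tau - hf(x'\beta(\tau;h)|x)\int z\bar{\mathcal{K}}(z)\,dz+O(h^{1+s}) = \tau - hf(x'\beta(\tau)|x)\int z\bar{\mathcal{K}}(z)\,dz+O(h^{1+s}).$$

Substituting gives the variance expansion since, by $\mathcal{K}(-z) = 1-\mathcal{K}(z)$,

$$\int z\bar{\mathcal{K}}(z)\,dz = 2\int zK(z)\mathcal{K}(z)\,dz = \int_0^{+\infty}\left[1-\mathcal{K}(z)^2\right]dz - \int_{-\infty}^0\mathcal{K}(z)^2\,dz = \int_0^{+\infty}\left[1-\mathcal{K}(z)^2\right]dz - \int_0^{+\infty}\left[1-\mathcal{K}(z)\right]^2dz$$
$$= 2\int_0^{+\infty}\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz = \int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz.$$

The variance expansion and (5) then give

$$\mathrm{AMSE}_\lambda(\hat\beta;\tau) = \frac{\tau(1-\tau)\lambda'D(\tau)^{-1}\mathbb{D}D(\tau)^{-1}\lambda}{n} + g(h)\{1+o(1)\},$$

where

$$g(h) = -\frac{h}{n}\lambda'D(\tau)^{-1}\lambda\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + h^{2s+2}(\lambda'B(\tau))^2.$$

Since

$$g'(h) = -\frac{1}{n}\lambda'D(\tau)^{-1}\lambda\int\mathcal{K}(z)[1-\mathcal{K}(z)]\,dz + (2s+2)h^{2s+1}(\lambda'B(\tau))^2,$$

solving $g'(h) = 0$ gives the desired $h_{opt}$ and the corresponding AMSE expansion.

7. WORK IN PROGRESS

Define, for $\lambda\in\mathbb{R}^d$, $\hat\beta_\lambda(\tau;h) = \lambda'\hat\beta(\tau;h)$ and $\hat B_\lambda(h) = \hat\beta_\lambda(\tau;h)-\hat\beta_\lambda(\tau;h_0)$.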

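Consider a bandwidth $b_n$. Let $\hat{\mathbb{D}} = n^{-1}\sum_{i=1}^nX_iX_i'$, $\hat D = \hat R^{(2)}(\hat\beta(\tau;b_n);\tau,b_n)$, $\hat D(\tau) = \tau(1-\tau)\hat D^{-1}\hat{\mathbb{D}}\hat D^{-1}$ and $\hat D_\lambda = \lambda'\hat D^{-1}\lambda$. Let $\mathcal{H} = \{h_0,h_1,\dots,h_K\}$, where $h_{k-1}<h_k$. The proposed optimal bandwidth is $\hat h = \arg\min_{h\in\mathcal{H}}\hat Q(h)$, where

$$\hat Q(h) = \hat B_\lambda(h)^2 - 4\hat D_\lambda\frac{h}{n}\int_0^{+\infty}[1-\mathcal{K}(y)]\,dy,$$

and $\tilde h = \hat h^{+}$, leading to the fully data-driven estimator $\hat\beta_\lambda(\tau) = \hat\beta_\lambda(\tau;\tilde h)$.

7.1. Proof in progress. Set $\beta_\lambda(\tau;h) = \lambda'\beta(\tau;h)$, $B_\lambda(h) = \beta_\lambda(\tau;h)-\beta_\lambda(\tau;h_0)$,

$$Q(h) = B_\lambda(h)^2 - 4D_\lambda(\tau)\frac{h}{n}\int_0^{+\infty}[1-\mathcal{K}(y)]\,dy \qquad\text{and}\qquad h^* = \arg\min_{h\in\mathcal{H}}Q(h).$$

Assumption B. The bandwidths $b_n$, $h_0$ and $h_K$ satisfy $h_0\ge C(\ln n/n)^{1/2}$, $h_K = o(1)$ and $K = O(\ln n)$, together with a further rate condition involving $\ln n$ and $\ln\ln n$. There are some unknown $s>1/2$ and $L>\underline L>0$ such that:

B1: $\underline Lh^{s+1}\le|B_\lambda(h)|\le Lh^{s+1}$, and $h^*$ is uniquely defined asymptotically with $h^*\asymp n^{-1/(2s+1)}$;

B2: $\liminf_{n\to\infty}n^{(2s+2)/(2s+1)}\min_{h\in\mathcal{H};\,|h/h^*-1|\ge\epsilon}\{Q(h)-Q(h^*)\}>0$ for all $\epsilon>0$.

Routine modifications of Arcones (1996) give

$$\hat\beta_\lambda(\tau;h_0)-\beta_\lambda(\tau) = -\lambda'D(\tau)^{-1}\hat R^{(1)}(\beta(\tau);\tau) + O_P\left(\frac{\ln n}{n^{3/4}}\right),$$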

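see also He and Shao (1996, Remark 3.4). Hence (7) gives

$$\hat\beta_\lambda(\tau;h)-\hat\beta_\lambda(\tau;h_0) = B_\lambda(h) + \left(\frac{h}{n}\right)^{1/2}\mathbb{E}_\lambda(h) + O_P\left(\frac{\ln n}{n^{3/4}}+\frac{\ln^{1/2}n}{(nh)^{3/2}}+\frac{1}{n^{(s+1)/2}}+\frac{h^s}{n^{1/2}}\right),$$

where

$$\mathbb{E}_\lambda(h) = -\frac{1}{(nh)^{1/2}}\sum_{i=1}^n\lambda'D(\tau)^{-1}X_i\left\{\mathcal{K}\left(\frac{X_i'\beta(\tau;h)-Y_i}{h}\right) - I\left(Y_i\le X_i'\beta(\tau)\right)\right\}.$$

We have $E[\mathbb{E}_\lambda(h)] = 0$ and

$$E\left[\left\{\mathcal{K}\left(\frac{X'\beta(\tau;h)-Y}{h}\right)-I\left(Y\le X'\beta(\tau)\right)\right\}^2\Big|X=x\right] = hf(x'\beta(\tau)|x)\left\{\int_{-\infty}^0\mathcal{K}(u)^2\,du+\int_0^{+\infty}[1-\mathcal{K}(u)]^2\,du\right\}+o(h)$$

uniformly in $h$ and $x$ under Assumptions B and X. Hence, uniformly in $h$,

$$\operatorname{var}\mathbb{E}_\lambda(h) = \left\{\int_{-\infty}^0\mathcal{K}(u)^2\,du+\int_0^{+\infty}[1-\mathcal{K}(u)]^2\,du\right\}\lambda'D(\tau)^{-1}E\left[XX'f(X'\beta(\tau)|X)\right]D(\tau)^{-1}\lambda + o(1),$$

where $\lambda'D(\tau)^{-1}E[XX'f(X'\beta(\tau)|X)]D(\tau)^{-1}\lambda = \lambda'D(\tau)^{-1}\lambda$. Then arguing as in Einmahl and Mason (2005) gives

$$\max_{h\in\mathcal{H}}\frac{|\mathbb{E}_\lambda(h)|}{(\ln n)^{1/2}} = O_P(1).$$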

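It follows that, uniformly with respect to $h\in\mathcal{H}$,

$$\hat B_\lambda(h)-B_\lambda(h) = O_P\left(\left(\frac{h\ln n}{n}\right)^{1/2}\right) + O_P\left(\frac{\ln n}{n^{3/4}}+\frac{\ln^{1/2}n}{(nh)^{3/2}}\right).$$

Observe now that the conditions on $h_0$, $b_n$ and $K$ in Assumption B ensure that the last $O_P$ term is $o_P((h/n)^{1/2})$ uniformly in $h\in\mathcal{H}\setminus\{h_0\}$. This gives, uniformly with respect to $h\in\mathcal{H}$,

$$\hat B_\lambda(h) = B_\lambda(h) + \left(\frac{h}{n}\right)^{1/2}\left\{\mathbb{E}_\lambda(h)+o_P(1)\right\}.$$

Since $\hat D_\lambda = D_\lambda(\tau)+o_P(1)$, it follows that, uniformly in $h\in\mathcal{H}$,

$$\hat Q(h) = Q(h) + 2\left(\frac{h}{n}\right)^{1/2}B_\lambda(h)\left\{\mathbb{E}_\lambda(h)+o_P(1)\right\} + \frac{h}{n}\left\{\mathbb{E}_\lambda(h)^2+o_P(1)\right\}. \qquad (14)$$

Observe now that, using B1,

$$Q(h^*) = \min_{h\in\mathcal{H}}Q(h)\le\min_{h\in\mathcal{H}}\left\{L^2h^{2s+2}-C\frac{h}{n}\right\}\le-C\frac{h^*}{n}(1+o(1)).$$

This implies in particular that $\hat Q(h^*) = Q(h^*)(1+o_P(1))$. Consider now some $\epsilon\in(0,1)$. Under B1, we have $|Q(h)|\le L^2h^{2s+2}+Ch/n\le C|Q(h^*)|$ for all $h\le(1-\epsilon)h^*$, and (14) gives

$$\min_{h\in\mathcal{H};\,h\le(1-\epsilon)h^*}\hat Q(h) = \min_{h\in\mathcal{H};\,h\le(1-\epsilon)h^*}Q(h)+o_P\left(Q(h^*)\right).$$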

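It follows that, under B1 and B2, and since $\hat Q(h^*) = Q(h^*)(1+o_P(1))$,

$$\min_{h\in\mathcal{H};\,h\le(1-\epsilon)h^*}\hat Q(h) > \min_{h\in\mathcal{H};\,|h/h^*-1|<\epsilon}\hat Q(h)$$

with probability tending to 1. Consider now $h\ge(1+\epsilon)h^*$. Set $\kappa = 4D_\lambda(\tau)\int_0^{+\infty}[1-\mathcal{K}(y)]\,dy$, so that $Q(h) = B_\lambda(h)^2-\kappa h/n$. For those $h$ such that $Q(h)\ge\kappa h/(2n)$, (14) gives $\hat Q(h)\ge Q(h)\{1+o_P(1)\}$. If $Q(h)<\kappa h/(2n)$, then $B_\lambda(h)^2<3\kappa h/(2n)$, so that B1 gives $h\le(3\kappa/(2\underline L^2n))^{1/(2s+1)}$, implying that $|Q(h)|\le C|Q(h^*)|$. Hence (14) gives

$$\min_{h\in\mathcal{H};\,h\ge(1+\epsilon)h^*}\hat Q(h)\ge\min_{h\in\mathcal{H};\,h\ge(1+\epsilon)h^*}Q(h)+o_P\left(Q(h^*)\right),$$

which together with $\hat Q(h^*) = Q(h^*)(1+o_P(1))$ and B2 gives

$$\min_{h\in\mathcal{H};\,h\ge(1+\epsilon)h^*}\hat Q(h) > \min_{h\in\mathcal{H};\,|h/h^*-1|<\epsilon}\hat Q(h)$$

with probability tending to 1. Hence $\hat h/h^* = 1+o_P(1)$, which also shows that $\hat h^{+}/h^{*+} = 1+o_P(1)$.

We now turn to $\hat\beta(\tau) = \hat\beta(\tau;\tilde h)$, $\tilde h = \hat h^{+}$. Then (7) gives, for $\eta = \hat h^{+}/h^{*+}$,

$$\hat\beta_\lambda(\tau)-\hat\beta_\lambda(\tau;h^{*+}) = \beta_\lambda(\tau;\eta h^{*+})-\beta_\lambda(\tau;h^{*+}) + \left(\frac{h^{*+}}{n}\right)^{1/2}\mathbb{E}_\lambda(\eta) + O_P\left(\frac{\ln^{1/2}n}{(nh^{*+})^{3/2}}+\frac{1}{n^{(s+1)/2}}+\frac{(h^{*+})^s}{n^{1/2}}\right),$$

where

$$\mathbb{E}_\lambda(\eta) = -\frac{1}{(nh^{*+})^{1/2}}\sum_{i=1}^n\lambda'D(\tau)^{-1}X_i\left\{\mathcal{K}\left(\frac{X_i'\beta(\tau;\eta h^{*+})-Y_i}{\eta h^{*+}}\right)-\mathcal{K}\left(\frac{X_i'\beta(\tau;h^{*+})-Y_i}{h^{*+}}\right)\right\}.$$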

We first study the increments of $\mathbb{E}_\lambda(\cdot)$. Assumption B gives, for all $\eta_1,\eta_2\in[1/2,2]$,

$$E\left[\left\{\mathcal{K}\left(\frac{X'\beta(\tau;\eta_1h^{*+})-Y}{\eta_1h^{*+}}\right)-\mathcal{K}\left(\frac{X'\beta(\tau;\eta_2h^{*+})-Y}{\eta_2h^{*+}}\right)\right\}^2\right]\le Ch^{*+}(\eta_2-\eta_1)^2.$$

Hence van der Vaart and Wellner (1996, Corollary 2.2.5) gives that, for any $\epsilon>0$,

$$E\left[\sup_{\eta_1,\eta_2\in[1/2,2];\,|\eta_2-\eta_1|\le\epsilon}\left|\mathbb{E}_\lambda(\eta_2)-\mathbb{E}_\lambda(\eta_1)\right|\right]\le C\epsilon^{1/2}.$$

Since $\mathbb{E}_\lambda(1) = 0$ and $\eta = 1+o_P(1)$, this gives $\mathbb{E}_\lambda(\eta) = o_P(1)$ and, by Assumption B,

$$\hat\beta_\lambda(\tau)-\hat\beta_\lambda(\tau;h^{*+}) = o_P\left(\left(\frac{h^{*+}}{n}\right)^{1/2}\right).$$

This gives the desired result.

REFERENCES

AMEMIYA, T. (1982): "Two stage least absolute deviations estimators," Econometrica, 50, 689–711.

AZZALINI, A. (1981): "A note on the estimation of a distribution function and quantiles by a kernel method," Biometrika, 68, 326–328.

HOROWITZ, J. L. (1998): "Bootstrap methods for median regression models," Econometrica, 66, 1327–1351.

KOENKER, R. (2005): Quantile Regression. Cambridge University Press, Cambridge.

KOENKER, R., AND G. BASSETT (1978): "Regression quantiles," Econometrica, 46, 33–50.

LIO, Y. L., AND W. J. PADGETT (1991): "A note on the asymptotically optimal bandwidth for Nadaraya's quantile estimator," Statistics & Probability Letters, 11, 243–249.

NADARAYA, E. A. (1964): "Some new estimates for distribution functions," Theory of Probability and its Applications, 9, 497–502.

RALESCU, S. S. (1996): "A Bahadur–Kiefer law for the Nadaraya empiric-quantile processes," Theory of Probability and its Applications, 41, 296–306.

REISS, R.-D. (1981): "Nonparametric estimation of smooth distribution functions," Scandinavian Journal of Statistics, 8, 116–119.